Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer, which allow
BPF programs to perform socket lookups in a network namespace. This lets a
program determine early on in processing whether the stack is expecting to
receive the packet, and perform some action (e.g. drop, forward somewhere)
based on this information.
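For illustration, a minimal BPF C sketch of how a classifier might use the new
helpers. The includes, section name and tuple handling are assumptions for the
sketch, not part of this series:

  /* Drop packets that no local socket is expecting. */
  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include "bpf_helpers.h"	/* selftests helper header (assumed) */

  SEC("classifier")
  int drop_unexpected(struct __sk_buff *skb)
  {
  	struct bpf_sock_tuple tuple = {};
  	struct bpf_sock *sk;

  	/* Fill tuple.ipv4.{saddr,daddr,sport,dport} from the packet (omitted). */
  	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), 0, 0);
  	if (!sk)
  		return TC_ACT_SHOT;	/* stack is not expecting this packet */

  	/* The returned socket holds a reference and must be released. */
  	bpf_sk_release(sk);
  	return TC_ACT_OK;
  }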

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except that all the data is per-cpu. The main goal of the per-cpu variant is to
implement very fast counters (e.g. packet counters) that require neither
lookups nor atomic operations in the fast path.
An example of these hybrid counters is in selftests/bpf/netcnt_prog.c.
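A minimal sketch of such a counter, loosely modelled on that selftest (the map
definition details are assumptions for the sketch):

  #include <linux/bpf.h>
  #include "bpf_helpers.h"	/* selftests helper header (assumed) */

  /* Per-cpu cgroup storage: one private slot per (cgroup, attach type, CPU). */
  struct bpf_map_def SEC("maps") percpu_netcnt = {
  	.type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
  	.key_size = sizeof(struct bpf_cgroup_storage_key),
  	.value_size = sizeof(__u64),
  };

  SEC("cgroup/skb")
  int count_packets(struct __sk_buff *skb)
  {
  	/* No map lookup and no atomics: this CPU's slot is private. */
  	__u64 *cnt = bpf_get_local_storage(&percpu_netcnt, 0);

  	(*cnt)++;
  	return 1;	/* allow the packet */
  }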

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings a consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller 2018-10-08 23:42:44 -07:00
commit 071a234ad7
66 changed files with 4142 additions and 665 deletions


@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
and 3000 refers to the same chunk. and 3000 refers to the same chunk.
UMEM Completetion Ring UMEM Completion Ring
~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
The Completion Ring is used transfer ownership of UMEM frames from The Completion Ring is used transfer ownership of UMEM frames from
kernel-space to user-space. Just like the Fill ring, UMEM indicies are kernel-space to user-space. Just like the Fill ring, UMEM indicies are


@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for:
Instruction Addressing mode Description Instruction Addressing mode Description
ld 1, 2, 3, 4, 10 Load word into A ld 1, 2, 3, 4, 12 Load word into A
ldi 4 Load word into A ldi 4 Load word into A
ldh 1, 2 Load half-word into A ldh 1, 2 Load half-word into A
ldb 1, 2 Load byte into A ldb 1, 2 Load byte into A
ldx 3, 4, 5, 10 Load word into X ldx 3, 4, 5, 12 Load word into X
ldxi 4 Load word into X ldxi 4 Load word into X
ldxb 5 Load byte into X ldxb 5 Load byte into X
@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for:
jmp 6 Jump to label jmp 6 Jump to label
ja 6 Jump to label ja 6 Jump to label
jeq 7, 8 Jump on A == k jeq 7, 8, 9, 10 Jump on A == <x>
jneq 8 Jump on A != k jneq 9, 10 Jump on A != <x>
jne 8 Jump on A != k jne 9, 10 Jump on A != <x>
jlt 8 Jump on A < k jlt 9, 10 Jump on A < <x>
jle 8 Jump on A <= k jle 9, 10 Jump on A <= <x>
jgt 7, 8 Jump on A > k jgt 7, 8, 9, 10 Jump on A > <x>
jge 7, 8 Jump on A >= k jge 7, 8, 9, 10 Jump on A >= <x>
jset 7, 8 Jump on A & k jset 7, 8, 9, 10 Jump on A & <x>
add 0, 4 A + <x> add 0, 4 A + <x>
sub 0, 4 A - <x> sub 0, 4 A - <x>
@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for:
tax Copy A into X tax Copy A into X
txa Copy X into A txa Copy X into A
ret 4, 9 Return ret 4, 11 Return
The next table shows addressing formats from the 2nd column: The next table shows addressing formats from the 2nd column:
@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column:
5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet 5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet
6 L Jump label L 6 L Jump label L
7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf 7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf
8 #k,Lt Jump to Lt if predicate is true 8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf
9 a/%a Accumulator A 9 #k,Lt Jump to Lt if predicate is true
10 extension BPF extension 10 x/%x,Lt Jump to Lt if predicate is true
11 a/%a Accumulator A
12 extension BPF extension
The Linux kernel also has a couple of BPF extensions that are used along The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with with the class of load instructions by "overloading" the k argument with
@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows:
PTR_TO_STACK Frame pointer. PTR_TO_STACK Frame pointer.
PTR_TO_PACKET skb->data. PTR_TO_PACKET skb->data.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden. PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate offset'. The former is used when an exactly-known value (e.g. an immediate
@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2 pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe. that pointer are safe.
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding 'struct sock'. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and, in
the non-NULL case, pass the valid reference to the socket release function.
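For contrast with the failing examples below, a sketch of a lookup that the
verifier accepts: the returned pointer is NULL-checked and the reference is
released on the non-NULL path (same argument setup as in the examples that
follow):

  BPF_MOV64_IMM(BPF_REG_2, 0),
  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_MOV64_IMM(BPF_REG_3, 4),
  BPF_MOV64_IMM(BPF_REG_4, 0),
  BPF_MOV64_IMM(BPF_REG_5, 0),
  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),	/* if (r0 == NULL) skip release */
  BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
  BPF_EMIT_CALL(BPF_FUNC_sk_release),
  BPF_MOV64_IMM(BPF_REG_0, 0),
  BPF_EXIT_INSN(),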
Direct packet access Direct packet access
-------------------- --------------------
@ -1444,6 +1461,55 @@ Error:
8: (7a) *(u64 *)(r0 +0) = 1 8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm' R0 invalid mem access 'imm'
Program that performs a socket lookup then sets the pointer to NULL without
checking it:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (b7) r0 = 0
9: (95) exit
Unreleased reference id=1, alloc_insn=7
Program that performs a socket lookup but does not NULL-check the returned
value:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (95) exit
Unreleased reference id=1, alloc_insn=7
Testing Testing
------- -------


@ -89,15 +89,32 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size)
return skb; return skb;
} }
static struct sk_buff * static unsigned int
nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n) nfp_bpf_cmsg_map_req_size(struct nfp_app_bpf *bpf, unsigned int n)
{ {
unsigned int size; unsigned int size;
size = sizeof(struct cmsg_req_map_op); size = sizeof(struct cmsg_req_map_op);
size += sizeof(struct cmsg_key_value_pair) * n; size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n;
return nfp_bpf_cmsg_alloc(bpf, size); return size;
}
static struct sk_buff *
nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n)
{
return nfp_bpf_cmsg_alloc(bpf, nfp_bpf_cmsg_map_req_size(bpf, n));
}
static unsigned int
nfp_bpf_cmsg_map_reply_size(struct nfp_app_bpf *bpf, unsigned int n)
{
unsigned int size;
size = sizeof(struct cmsg_reply_map_op);
size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n;
return size;
} }
static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb) static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
@ -338,6 +355,34 @@ void nfp_bpf_ctrl_free_map(struct nfp_app_bpf *bpf, struct nfp_bpf_map *nfp_map)
dev_consume_skb_any(skb); dev_consume_skb_any(skb);
} }
static void *
nfp_bpf_ctrl_req_key(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req,
unsigned int n)
{
return &req->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n];
}
static void *
nfp_bpf_ctrl_req_val(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req,
unsigned int n)
{
return &req->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
}
static void *
nfp_bpf_ctrl_reply_key(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
unsigned int n)
{
return &reply->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n];
}
static void *
nfp_bpf_ctrl_reply_val(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
unsigned int n)
{
return &reply->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
}
static int static int
nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
enum nfp_bpf_cmsg_type op, enum nfp_bpf_cmsg_type op,
@ -366,12 +411,13 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
/* Copy inputs */ /* Copy inputs */
if (key) if (key)
memcpy(&req->elem[0].key, key, map->key_size); memcpy(nfp_bpf_ctrl_req_key(bpf, req, 0), key, map->key_size);
if (value) if (value)
memcpy(&req->elem[0].value, value, map->value_size); memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value,
map->value_size);
skb = nfp_bpf_cmsg_communicate(bpf, skb, op, skb = nfp_bpf_cmsg_communicate(bpf, skb, op,
sizeof(*reply) + sizeof(*reply->elem)); nfp_bpf_cmsg_map_reply_size(bpf, 1));
if (IS_ERR(skb)) if (IS_ERR(skb))
return PTR_ERR(skb); return PTR_ERR(skb);
@ -382,9 +428,11 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
/* Copy outputs */ /* Copy outputs */
if (out_key) if (out_key)
memcpy(out_key, &reply->elem[0].key, map->key_size); memcpy(out_key, nfp_bpf_ctrl_reply_key(bpf, reply, 0),
map->key_size);
if (out_value) if (out_value)
memcpy(out_value, &reply->elem[0].value, map->value_size); memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, 0),
map->value_size);
dev_consume_skb_any(skb); dev_consume_skb_any(skb);
@ -428,6 +476,13 @@ int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
key, NULL, 0, next_key, NULL); key, NULL, 0, next_key, NULL);
} }
unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf)
{
return max3((unsigned int)NFP_NET_DEFAULT_MTU,
nfp_bpf_cmsg_map_req_size(bpf, 1),
nfp_bpf_cmsg_map_reply_size(bpf, 1));
}
void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb) void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
{ {
struct nfp_app_bpf *bpf = app->priv; struct nfp_app_bpf *bpf = app->priv;


@ -52,6 +52,7 @@ enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_RANDOM = 4, NFP_BPF_CAP_TYPE_RANDOM = 4,
NFP_BPF_CAP_TYPE_QUEUE_SELECT = 5, NFP_BPF_CAP_TYPE_QUEUE_SELECT = 5,
NFP_BPF_CAP_TYPE_ADJUST_TAIL = 6, NFP_BPF_CAP_TYPE_ADJUST_TAIL = 6,
NFP_BPF_CAP_TYPE_ABI_VERSION = 7,
}; };
struct nfp_bpf_cap_tlv_func { struct nfp_bpf_cap_tlv_func {
@ -98,6 +99,7 @@ enum nfp_bpf_cmsg_type {
#define CMSG_TYPE_MAP_REPLY_BIT 7 #define CMSG_TYPE_MAP_REPLY_BIT 7
#define __CMSG_REPLY(req) (BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req)) #define __CMSG_REPLY(req) (BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req))
/* BPF ABIv2 fixed-length control message fields */
#define CMSG_MAP_KEY_LW 16 #define CMSG_MAP_KEY_LW 16
#define CMSG_MAP_VALUE_LW 16 #define CMSG_MAP_VALUE_LW 16
@ -147,24 +149,19 @@ struct cmsg_reply_map_free_tbl {
__be32 count; __be32 count;
}; };
struct cmsg_key_value_pair {
__be32 key[CMSG_MAP_KEY_LW];
__be32 value[CMSG_MAP_VALUE_LW];
};
struct cmsg_req_map_op { struct cmsg_req_map_op {
struct cmsg_hdr hdr; struct cmsg_hdr hdr;
__be32 tid; __be32 tid;
__be32 count; __be32 count;
__be32 flags; __be32 flags;
struct cmsg_key_value_pair elem[0]; u8 data[0];
}; };
struct cmsg_reply_map_op { struct cmsg_reply_map_op {
struct cmsg_reply_map_simple reply_hdr; struct cmsg_reply_map_simple reply_hdr;
__be32 count; __be32 count;
__be32 resv; __be32 resv;
struct cmsg_key_value_pair elem[0]; u8 data[0];
}; };
struct cmsg_bpf_event { struct cmsg_bpf_event {


@ -266,6 +266,38 @@ emit_br_bset(struct nfp_prog *nfp_prog, swreg src, u8 bit, u16 addr, u8 defer)
emit_br_bit_relo(nfp_prog, src, bit, addr, defer, true, RELO_BR_REL); emit_br_bit_relo(nfp_prog, src, bit, addr, defer, true, RELO_BR_REL);
} }
static void
__emit_br_alu(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
u8 defer, bool dst_lmextn, bool src_lmextn)
{
u64 insn;
insn = OP_BR_ALU_BASE |
FIELD_PREP(OP_BR_ALU_A_SRC, areg) |
FIELD_PREP(OP_BR_ALU_B_SRC, breg) |
FIELD_PREP(OP_BR_ALU_DEFBR, defer) |
FIELD_PREP(OP_BR_ALU_IMM_HI, imm_hi) |
FIELD_PREP(OP_BR_ALU_SRC_LMEXTN, src_lmextn) |
FIELD_PREP(OP_BR_ALU_DST_LMEXTN, dst_lmextn);
nfp_prog_push(nfp_prog, insn);
}
static void emit_rtn(struct nfp_prog *nfp_prog, swreg base, u8 defer)
{
struct nfp_insn_ur_regs reg;
int err;
err = swreg_to_unrestricted(reg_none(), base, reg_imm(0), &reg);
if (err) {
nfp_prog->error = err;
return;
}
__emit_br_alu(nfp_prog, reg.areg, reg.breg, 0, defer, reg.dst_lmextn,
reg.src_lmextn);
}
static void static void
__emit_immed(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi, __emit_immed(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
enum immed_width width, bool invert, enum immed_width width, bool invert,
@ -1137,7 +1169,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
unsigned int size, unsigned int ptr_off, u8 gpr, u8 ptr_gpr, unsigned int size, unsigned int ptr_off, u8 gpr, u8 ptr_gpr,
bool clr_gpr, lmem_step step) bool clr_gpr, lmem_step step)
{ {
s32 off = nfp_prog->stack_depth + meta->insn.off + ptr_off; s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off;
bool first = true, last; bool first = true, last;
bool needs_inc = false; bool needs_inc = false;
swreg stack_off_reg; swreg stack_off_reg;
@ -1146,7 +1178,8 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
bool lm3 = true; bool lm3 = true;
int ret; int ret;
if (meta->ptr_not_const) { if (meta->ptr_not_const ||
meta->flags & FLAG_INSN_PTR_CALLER_STACK_FRAME) {
/* Use of the last encountered ptr_off is OK, they all have /* Use of the last encountered ptr_off is OK, they all have
* the same alignment. Depend on low bits of value being * the same alignment. Depend on low bits of value being
* discarded when written to LMaddr register. * discarded when written to LMaddr register.
@ -1695,7 +1728,7 @@ map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
s64 lm_off; s64 lm_off;
/* We only have to reload LM0 if the key is not at start of stack */ /* We only have to reload LM0 if the key is not at start of stack */
lm_off = nfp_prog->stack_depth; lm_off = nfp_prog->stack_frame_depth;
lm_off += meta->arg2.reg.var_off.value + meta->arg2.reg.off; lm_off += meta->arg2.reg.var_off.value + meta->arg2.reg.off;
load_lm_ptr = meta->arg2.var_off || lm_off; load_lm_ptr = meta->arg2.var_off || lm_off;
@ -1808,10 +1841,10 @@ static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
swreg stack_depth_reg; swreg stack_depth_reg;
stack_depth_reg = ur_load_imm_any(nfp_prog, stack_depth_reg = ur_load_imm_any(nfp_prog,
nfp_prog->stack_depth, nfp_prog->stack_frame_depth,
stack_imm(nfp_prog)); stack_imm(nfp_prog));
emit_alu(nfp_prog, reg_both(dst), emit_alu(nfp_prog, reg_both(dst), stack_reg(nfp_prog),
stack_reg(nfp_prog), ALU_OP_ADD, stack_depth_reg); ALU_OP_ADD, stack_depth_reg);
wrp_immed(nfp_prog, reg_both(dst + 1), 0); wrp_immed(nfp_prog, reg_both(dst + 1), 0);
} else { } else {
wrp_reg_mov(nfp_prog, dst, src); wrp_reg_mov(nfp_prog, dst, src);
@ -3081,7 +3114,93 @@ static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
return wrp_test_reg(nfp_prog, meta, ALU_OP_XOR, BR_BNE); return wrp_test_reg(nfp_prog, meta, ALU_OP_XOR, BR_BNE);
} }
static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) static int
bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
u32 ret_tgt, stack_depth, offset_br;
swreg tmp_reg;
stack_depth = round_up(nfp_prog->stack_frame_depth, STACK_FRAME_ALIGN);
/* Space for saving the return address is accounted for by the callee,
* so stack_depth can be zero for the main function.
*/
if (stack_depth) {
tmp_reg = ur_load_imm_any(nfp_prog, stack_depth,
stack_imm(nfp_prog));
emit_alu(nfp_prog, stack_reg(nfp_prog),
stack_reg(nfp_prog), ALU_OP_ADD, tmp_reg);
emit_csr_wr(nfp_prog, stack_reg(nfp_prog),
NFP_CSR_ACT_LM_ADDR0);
}
/* Two cases for jumping to the callee:
*
* - If callee uses and needs to save R6~R9 then:
* 1. Put the start offset of the callee into imm_b(). This will
* require a fixup step, as we do not necessarily know this
* address yet.
* 2. Put the return address from the callee to the caller into
* register ret_reg().
* 3. (After defer slots are consumed) Jump to the subroutine that
* pushes the registers to the stack.
* The subroutine acts as a trampoline, and returns to the address in
* imm_b(), i.e. jumps to the callee.
*
* - If callee does not need to save R6~R9 then just load return
* address to the caller in ret_reg(), and jump to the callee
* directly.
*
* Using ret_reg() to pass the return address to the callee is set here
* as a convention. The callee can then push this address onto its
* stack frame in its prologue. The advantages of passing the return
* address through ret_reg(), instead of pushing it to the stack right
* here, are the following:
* - It looks cleaner.
* - If the called function is called multiple time, we get a lower
* program size.
* - We save two no-op instructions that should be added just before
* the emit_br() when stack depth is not null otherwise.
* - If we ever find a register to hold the return address during whole
* execution of the callee, we will not have to push the return
* address to the stack for leaf functions.
*/
if (!meta->jmp_dst) {
pr_err("BUG: BPF-to-BPF call has no destination recorded\n");
return -ELOOP;
}
if (nfp_prog->subprog[meta->jmp_dst->subprog_idx].needs_reg_push) {
ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2,
RELO_BR_GO_CALL_PUSH_REGS);
offset_br = nfp_prog_current_offset(nfp_prog);
wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL);
} else {
ret_tgt = nfp_prog_current_offset(nfp_prog) + 2;
emit_br(nfp_prog, BR_UNC, meta->n + 1 + meta->insn.imm, 1);
offset_br = nfp_prog_current_offset(nfp_prog);
}
wrp_immed_relo(nfp_prog, ret_reg(nfp_prog), ret_tgt, RELO_IMMED_REL);
if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
return -EINVAL;
if (stack_depth) {
tmp_reg = ur_load_imm_any(nfp_prog, stack_depth,
stack_imm(nfp_prog));
emit_alu(nfp_prog, stack_reg(nfp_prog),
stack_reg(nfp_prog), ALU_OP_SUB, tmp_reg);
emit_csr_wr(nfp_prog, stack_reg(nfp_prog),
NFP_CSR_ACT_LM_ADDR0);
wrp_nops(nfp_prog, 3);
}
meta->num_insns_after_br = nfp_prog_current_offset(nfp_prog);
meta->num_insns_after_br -= offset_br;
return 0;
}
static int helper_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{ {
switch (meta->insn.imm) { switch (meta->insn.imm) {
case BPF_FUNC_xdp_adjust_head: case BPF_FUNC_xdp_adjust_head:
@ -3102,6 +3221,19 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
} }
} }
static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
if (is_mbpf_pseudo_call(meta))
return bpf_to_bpf_call(nfp_prog, meta);
else
return helper_call(nfp_prog, meta);
}
static bool nfp_is_main_function(struct nfp_insn_meta *meta)
{
return meta->subprog_idx == 0;
}
static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{ {
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 0, RELO_BR_GO_OUT); emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 0, RELO_BR_GO_OUT);
@ -3109,6 +3241,39 @@ static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
return 0; return 0;
} }
static int
nfp_subprog_epilogue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
if (nfp_prog->subprog[meta->subprog_idx].needs_reg_push) {
/* Pop R6~R9 to the stack via related subroutine.
* We loaded the return address to the caller into ret_reg().
* This means that the subroutine does not come back here, we
* make it jump back to the subprogram caller directly!
*/
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 1,
RELO_BR_GO_CALL_POP_REGS);
/* Pop return address from the stack. */
wrp_mov(nfp_prog, ret_reg(nfp_prog), reg_lm(0, 0));
} else {
/* Pop return address from the stack. */
wrp_mov(nfp_prog, ret_reg(nfp_prog), reg_lm(0, 0));
/* Jump back to caller if no callee-saved registers were used
* by the subprogram.
*/
emit_rtn(nfp_prog, ret_reg(nfp_prog), 0);
}
return 0;
}
static int jmp_exit(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
if (nfp_is_main_function(meta))
return goto_out(nfp_prog, meta);
else
return nfp_subprog_epilogue(nfp_prog, meta);
}
static const instr_cb_t instr_cb[256] = { static const instr_cb_t instr_cb[256] = {
[BPF_ALU64 | BPF_MOV | BPF_X] = mov_reg64, [BPF_ALU64 | BPF_MOV | BPF_X] = mov_reg64,
[BPF_ALU64 | BPF_MOV | BPF_K] = mov_imm64, [BPF_ALU64 | BPF_MOV | BPF_K] = mov_imm64,
@ -3197,36 +3362,66 @@ static const instr_cb_t instr_cb[256] = {
[BPF_JMP | BPF_JSET | BPF_X] = jset_reg, [BPF_JMP | BPF_JSET | BPF_X] = jset_reg,
[BPF_JMP | BPF_JNE | BPF_X] = jne_reg, [BPF_JMP | BPF_JNE | BPF_X] = jne_reg,
[BPF_JMP | BPF_CALL] = call, [BPF_JMP | BPF_CALL] = call,
[BPF_JMP | BPF_EXIT] = goto_out, [BPF_JMP | BPF_EXIT] = jmp_exit,
}; };
/* --- Assembler logic --- */ /* --- Assembler logic --- */
static int
nfp_fixup_immed_relo(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
struct nfp_insn_meta *jmp_dst, u32 br_idx)
{
if (immed_get_value(nfp_prog->prog[br_idx + 1])) {
pr_err("BUG: failed to fix up callee register saving\n");
return -EINVAL;
}
immed_set_value(&nfp_prog->prog[br_idx + 1], jmp_dst->off);
return 0;
}
static int nfp_fixup_branches(struct nfp_prog *nfp_prog) static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
{ {
struct nfp_insn_meta *meta, *jmp_dst; struct nfp_insn_meta *meta, *jmp_dst;
u32 idx, br_idx; u32 idx, br_idx;
int err;
list_for_each_entry(meta, &nfp_prog->insns, l) { list_for_each_entry(meta, &nfp_prog->insns, l) {
if (meta->skip) if (meta->skip)
continue; continue;
if (meta->insn.code == (BPF_JMP | BPF_CALL))
continue;
if (BPF_CLASS(meta->insn.code) != BPF_JMP) if (BPF_CLASS(meta->insn.code) != BPF_JMP)
continue; continue;
if (meta->insn.code == (BPF_JMP | BPF_EXIT) &&
!nfp_is_main_function(meta))
continue;
if (is_mbpf_helper_call(meta))
continue;
if (list_is_last(&meta->l, &nfp_prog->insns)) if (list_is_last(&meta->l, &nfp_prog->insns))
br_idx = nfp_prog->last_bpf_off; br_idx = nfp_prog->last_bpf_off;
else else
br_idx = list_next_entry(meta, l)->off - 1; br_idx = list_next_entry(meta, l)->off - 1;
/* For BPF-to-BPF function call, a stack adjustment sequence is
* generated after the return instruction. Therefore, we must
* withdraw the length of this sequence to have br_idx pointing
* to where the "branch" NFP instruction is expected to be.
*/
if (is_mbpf_pseudo_call(meta))
br_idx -= meta->num_insns_after_br;
if (!nfp_is_br(nfp_prog->prog[br_idx])) { if (!nfp_is_br(nfp_prog->prog[br_idx])) {
pr_err("Fixup found block not ending in branch %d %02x %016llx!!\n", pr_err("Fixup found block not ending in branch %d %02x %016llx!!\n",
br_idx, meta->insn.code, nfp_prog->prog[br_idx]); br_idx, meta->insn.code, nfp_prog->prog[br_idx]);
return -ELOOP; return -ELOOP;
} }
if (meta->insn.code == (BPF_JMP | BPF_EXIT))
continue;
/* Leave special branches for later */ /* Leave special branches for later */
if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) != if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
RELO_BR_REL) RELO_BR_REL && !is_mbpf_pseudo_call(meta))
continue; continue;
if (!meta->jmp_dst) { if (!meta->jmp_dst) {
@ -3241,6 +3436,18 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
return -ELOOP; return -ELOOP;
} }
if (is_mbpf_pseudo_call(meta) &&
nfp_prog->subprog[jmp_dst->subprog_idx].needs_reg_push) {
err = nfp_fixup_immed_relo(nfp_prog, meta,
jmp_dst, br_idx);
if (err)
return err;
}
if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
RELO_BR_REL)
continue;
for (idx = meta->off; idx <= br_idx; idx++) { for (idx = meta->off; idx <= br_idx; idx++) {
if (!nfp_is_br(nfp_prog->prog[idx])) if (!nfp_is_br(nfp_prog->prog[idx]))
continue; continue;
@ -3258,6 +3465,27 @@ static void nfp_intro(struct nfp_prog *nfp_prog)
plen_reg(nfp_prog), ALU_OP_AND, pv_len(nfp_prog)); plen_reg(nfp_prog), ALU_OP_AND, pv_len(nfp_prog));
} }
static void
nfp_subprog_prologue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
/* Save return address into the stack. */
wrp_mov(nfp_prog, reg_lm(0, 0), ret_reg(nfp_prog));
}
static void
nfp_start_subprog(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
unsigned int depth = nfp_prog->subprog[meta->subprog_idx].stack_depth;
nfp_prog->stack_frame_depth = round_up(depth, 4);
nfp_subprog_prologue(nfp_prog, meta);
}
bool nfp_is_subprog_start(struct nfp_insn_meta *meta)
{
return meta->flags & FLAG_INSN_IS_SUBPROG_START;
}
static void nfp_outro_tc_da(struct nfp_prog *nfp_prog) static void nfp_outro_tc_da(struct nfp_prog *nfp_prog)
{ {
/* TC direct-action mode: /* TC direct-action mode:
@ -3348,6 +3576,67 @@ static void nfp_outro_xdp(struct nfp_prog *nfp_prog)
emit_ld_field(nfp_prog, reg_a(0), 0xc, reg_b(2), SHF_SC_L_SHF, 16); emit_ld_field(nfp_prog, reg_a(0), 0xc, reg_b(2), SHF_SC_L_SHF, 16);
} }
static bool nfp_prog_needs_callee_reg_save(struct nfp_prog *nfp_prog)
{
unsigned int idx;
for (idx = 1; idx < nfp_prog->subprog_cnt; idx++)
if (nfp_prog->subprog[idx].needs_reg_push)
return true;
return false;
}
static void nfp_push_callee_registers(struct nfp_prog *nfp_prog)
{
u8 reg;
/* Subroutine: Save all callee saved registers (R6 ~ R9).
* imm_b() holds the return address.
*/
nfp_prog->tgt_call_push_regs = nfp_prog_current_offset(nfp_prog);
for (reg = BPF_REG_6; reg <= BPF_REG_9; reg++) {
u8 adj = (reg - BPF_REG_0) * 2;
u8 idx = (reg - BPF_REG_6) * 2;
/* The first slot in the stack frame is used to push the return
* address in bpf_to_bpf_call(), start just after.
*/
wrp_mov(nfp_prog, reg_lm(0, 1 + idx), reg_b(adj));
if (reg == BPF_REG_8)
/* Prepare to jump back, last 3 insns use defer slots */
emit_rtn(nfp_prog, imm_b(nfp_prog), 3);
wrp_mov(nfp_prog, reg_lm(0, 1 + idx + 1), reg_b(adj + 1));
}
}
static void nfp_pop_callee_registers(struct nfp_prog *nfp_prog)
{
u8 reg;
/* Subroutine: Restore all callee saved registers (R6 ~ R9).
* ret_reg() holds the return address.
*/
nfp_prog->tgt_call_pop_regs = nfp_prog_current_offset(nfp_prog);
for (reg = BPF_REG_6; reg <= BPF_REG_9; reg++) {
u8 adj = (reg - BPF_REG_0) * 2;
u8 idx = (reg - BPF_REG_6) * 2;
/* The first slot in the stack frame holds the return address,
* start popping just after that.
*/
wrp_mov(nfp_prog, reg_both(adj), reg_lm(0, 1 + idx));
if (reg == BPF_REG_8)
/* Prepare to jump back, last 3 insns use defer slots */
emit_rtn(nfp_prog, ret_reg(nfp_prog), 3);
wrp_mov(nfp_prog, reg_both(adj + 1), reg_lm(0, 1 + idx + 1));
}
}
static void nfp_outro(struct nfp_prog *nfp_prog) static void nfp_outro(struct nfp_prog *nfp_prog)
{ {
switch (nfp_prog->type) { switch (nfp_prog->type) {
@ -3360,13 +3649,23 @@ static void nfp_outro(struct nfp_prog *nfp_prog)
default: default:
WARN_ON(1); WARN_ON(1);
} }
if (!nfp_prog_needs_callee_reg_save(nfp_prog))
return;
nfp_push_callee_registers(nfp_prog);
nfp_pop_callee_registers(nfp_prog);
} }
static int nfp_translate(struct nfp_prog *nfp_prog) static int nfp_translate(struct nfp_prog *nfp_prog)
{ {
struct nfp_insn_meta *meta; struct nfp_insn_meta *meta;
unsigned int depth;
int err; int err;
depth = nfp_prog->subprog[0].stack_depth;
nfp_prog->stack_frame_depth = round_up(depth, 4);
nfp_intro(nfp_prog); nfp_intro(nfp_prog);
if (nfp_prog->error) if (nfp_prog->error)
return nfp_prog->error; return nfp_prog->error;
@ -3376,6 +3675,12 @@ static int nfp_translate(struct nfp_prog *nfp_prog)
meta->off = nfp_prog_current_offset(nfp_prog); meta->off = nfp_prog_current_offset(nfp_prog);
if (nfp_is_subprog_start(meta)) {
nfp_start_subprog(nfp_prog, meta);
if (nfp_prog->error)
return nfp_prog->error;
}
if (meta->skip) { if (meta->skip) {
nfp_prog->n_translated++; nfp_prog->n_translated++;
continue; continue;
@ -4018,20 +4323,35 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt)
/* Another pass to record jump information. */ /* Another pass to record jump information. */
list_for_each_entry(meta, &nfp_prog->insns, l) { list_for_each_entry(meta, &nfp_prog->insns, l) {
struct nfp_insn_meta *dst_meta;
u64 code = meta->insn.code; u64 code = meta->insn.code;
unsigned int dst_idx;
bool pseudo_call;
if (BPF_CLASS(code) == BPF_JMP && BPF_OP(code) != BPF_EXIT && if (BPF_CLASS(code) != BPF_JMP)
BPF_OP(code) != BPF_CALL) { continue;
struct nfp_insn_meta *dst_meta; if (BPF_OP(code) == BPF_EXIT)
unsigned short dst_indx; continue;
if (is_mbpf_helper_call(meta))
continue;
dst_indx = meta->n + 1 + meta->insn.off; /* If opcode is BPF_CALL at this point, this can only be a
dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_indx, * BPF-to-BPF call (a.k.a pseudo call).
cnt); */
pseudo_call = BPF_OP(code) == BPF_CALL;
meta->jmp_dst = dst_meta; if (pseudo_call)
dst_meta->flags |= FLAG_INSN_IS_JUMP_DST; dst_idx = meta->n + 1 + meta->insn.imm;
} else
dst_idx = meta->n + 1 + meta->insn.off;
dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_idx, cnt);
if (pseudo_call)
dst_meta->flags |= FLAG_INSN_IS_SUBPROG_START;
dst_meta->flags |= FLAG_INSN_IS_JUMP_DST;
meta->jmp_dst = dst_meta;
} }
} }
@ -4054,6 +4374,7 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
for (i = 0; i < nfp_prog->prog_len; i++) { for (i = 0; i < nfp_prog->prog_len; i++) {
enum nfp_relo_type special; enum nfp_relo_type special;
u32 val; u32 val;
u16 off;
special = FIELD_GET(OP_RELO_TYPE, prog[i]); special = FIELD_GET(OP_RELO_TYPE, prog[i]);
switch (special) { switch (special) {
@ -4070,6 +4391,24 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
br_set_offset(&prog[i], br_set_offset(&prog[i],
nfp_prog->tgt_abort + bv->start_off); nfp_prog->tgt_abort + bv->start_off);
break; break;
case RELO_BR_GO_CALL_PUSH_REGS:
if (!nfp_prog->tgt_call_push_regs) {
pr_err("BUG: failed to detect subprogram registers needs\n");
err = -EINVAL;
goto err_free_prog;
}
off = nfp_prog->tgt_call_push_regs + bv->start_off;
br_set_offset(&prog[i], off);
break;
case RELO_BR_GO_CALL_POP_REGS:
if (!nfp_prog->tgt_call_pop_regs) {
pr_err("BUG: failed to detect subprogram registers needs\n");
err = -EINVAL;
goto err_free_prog;
}
off = nfp_prog->tgt_call_pop_regs + bv->start_off;
br_set_offset(&prog[i], off);
break;
case RELO_BR_NEXT_PKT: case RELO_BR_NEXT_PKT:
br_set_offset(&prog[i], bv->tgt_done); br_set_offset(&prog[i], bv->tgt_done);
break; break;


@ -54,11 +54,14 @@ const struct rhashtable_params nfp_bpf_maps_neutral_params = {
static bool nfp_net_ebpf_capable(struct nfp_net *nn) static bool nfp_net_ebpf_capable(struct nfp_net *nn)
{ {
#ifdef __LITTLE_ENDIAN #ifdef __LITTLE_ENDIAN
if (nn->cap & NFP_NET_CFG_CTRL_BPF && struct nfp_app_bpf *bpf = nn->app->priv;
nn_readb(nn, NFP_NET_CFG_BPF_ABI) == NFP_NET_BPF_ABI)
return true; return nn->cap & NFP_NET_CFG_CTRL_BPF &&
#endif bpf->abi_version &&
nn_readb(nn, NFP_NET_CFG_BPF_ABI) == bpf->abi_version;
#else
return false; return false;
#endif
} }
static int static int
@ -342,6 +345,26 @@ nfp_bpf_parse_cap_adjust_tail(struct nfp_app_bpf *bpf, void __iomem *value,
return 0; return 0;
} }
static int
nfp_bpf_parse_cap_abi_version(struct nfp_app_bpf *bpf, void __iomem *value,
u32 length)
{
if (length < 4) {
nfp_err(bpf->app->cpp, "truncated ABI version TLV: %d\n",
length);
return -EINVAL;
}
bpf->abi_version = readl(value);
if (bpf->abi_version < 2 || bpf->abi_version > 3) {
nfp_warn(bpf->app->cpp, "unsupported BPF ABI version: %d\n",
bpf->abi_version);
bpf->abi_version = 0;
}
return 0;
}
static int nfp_bpf_parse_capabilities(struct nfp_app *app) static int nfp_bpf_parse_capabilities(struct nfp_app *app)
{ {
struct nfp_cpp *cpp = app->pf->cpp; struct nfp_cpp *cpp = app->pf->cpp;
@ -393,6 +416,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
length)) length))
goto err_release_free; goto err_release_free;
break; break;
case NFP_BPF_CAP_TYPE_ABI_VERSION:
if (nfp_bpf_parse_cap_abi_version(app->priv, value,
length))
goto err_release_free;
break;
default: default:
nfp_dbg(cpp, "unknown BPF capability: %d\n", type); nfp_dbg(cpp, "unknown BPF capability: %d\n", type);
break; break;
@ -414,6 +442,11 @@ err_release_free:
return -EINVAL; return -EINVAL;
} }
static void nfp_bpf_init_capabilities(struct nfp_app_bpf *bpf)
{
bpf->abi_version = 2; /* Original BPF ABI version */
}
static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev) static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev)
{ {
struct nfp_app_bpf *bpf = app->priv; struct nfp_app_bpf *bpf = app->priv;
@ -447,10 +480,21 @@ static int nfp_bpf_init(struct nfp_app *app)
if (err) if (err)
goto err_free_bpf; goto err_free_bpf;
nfp_bpf_init_capabilities(bpf);
err = nfp_bpf_parse_capabilities(app); err = nfp_bpf_parse_capabilities(app);
if (err) if (err)
goto err_free_neutral_maps; goto err_free_neutral_maps;
if (bpf->abi_version < 3) {
bpf->cmsg_key_sz = CMSG_MAP_KEY_LW * 4;
bpf->cmsg_val_sz = CMSG_MAP_VALUE_LW * 4;
} else {
bpf->cmsg_key_sz = bpf->maps.max_key_sz;
bpf->cmsg_val_sz = bpf->maps.max_val_sz;
app->ctrl_mtu = nfp_bpf_ctrl_cmsg_mtu(bpf);
}
bpf->bpf_dev = bpf_offload_dev_create(); bpf->bpf_dev = bpf_offload_dev_create();
err = PTR_ERR_OR_ZERO(bpf->bpf_dev); err = PTR_ERR_OR_ZERO(bpf->bpf_dev);
if (err) if (err)


@ -61,6 +61,8 @@ enum nfp_relo_type {
/* internal jumps to parts of the outro */ /* internal jumps to parts of the outro */
RELO_BR_GO_OUT, RELO_BR_GO_OUT,
RELO_BR_GO_ABORT, RELO_BR_GO_ABORT,
RELO_BR_GO_CALL_PUSH_REGS,
RELO_BR_GO_CALL_POP_REGS,
/* external jumps to fixed addresses */ /* external jumps to fixed addresses */
RELO_BR_NEXT_PKT, RELO_BR_NEXT_PKT,
RELO_BR_HELPER, RELO_BR_HELPER,
@ -104,6 +106,7 @@ enum pkt_vec {
#define imma_a(np) reg_a(STATIC_REG_IMMA) #define imma_a(np) reg_a(STATIC_REG_IMMA)
#define imma_b(np) reg_b(STATIC_REG_IMMA) #define imma_b(np) reg_b(STATIC_REG_IMMA)
#define imm_both(np) reg_both(STATIC_REG_IMM) #define imm_both(np) reg_both(STATIC_REG_IMM)
#define ret_reg(np) imm_a(np)
#define NFP_BPF_ABI_FLAGS reg_imm(0) #define NFP_BPF_ABI_FLAGS reg_imm(0)
#define NFP_BPF_ABI_FLAG_MARK 1 #define NFP_BPF_ABI_FLAG_MARK 1
@ -121,12 +124,17 @@ enum pkt_vec {
* @cmsg_replies: received cmsg replies waiting to be consumed * @cmsg_replies: received cmsg replies waiting to be consumed
* @cmsg_wq: work queue for waiting for cmsg replies * @cmsg_wq: work queue for waiting for cmsg replies
* *
* @cmsg_key_sz: size of key in cmsg element array
* @cmsg_val_sz: size of value in cmsg element array
*
* @map_list: list of offloaded maps * @map_list: list of offloaded maps
* @maps_in_use: number of currently offloaded maps * @maps_in_use: number of currently offloaded maps
* @map_elems_in_use: number of elements allocated to offloaded maps * @map_elems_in_use: number of elements allocated to offloaded maps
* *
* @maps_neutral: hash table of offload-neutral maps (on pointer) * @maps_neutral: hash table of offload-neutral maps (on pointer)
* *
* @abi_version: global BPF ABI version
*
* @adjust_head: adjust head capability * @adjust_head: adjust head capability
* @adjust_head.flags: extra flags for adjust head * @adjust_head.flags: extra flags for adjust head
* @adjust_head.off_min: minimal packet offset within buffer required * @adjust_head.off_min: minimal packet offset within buffer required
@ -164,12 +172,17 @@ struct nfp_app_bpf {
struct sk_buff_head cmsg_replies; struct sk_buff_head cmsg_replies;
struct wait_queue_head cmsg_wq; struct wait_queue_head cmsg_wq;
unsigned int cmsg_key_sz;
unsigned int cmsg_val_sz;
struct list_head map_list; struct list_head map_list;
unsigned int maps_in_use; unsigned int maps_in_use;
unsigned int map_elems_in_use; unsigned int map_elems_in_use;
struct rhashtable maps_neutral; struct rhashtable maps_neutral;
u32 abi_version;
struct nfp_bpf_cap_adjust_head { struct nfp_bpf_cap_adjust_head {
u32 flags; u32 flags;
int off_min; int off_min;
@ -252,7 +265,9 @@ struct nfp_bpf_reg_state {
bool var_off; bool var_off;
}; };
#define FLAG_INSN_IS_JUMP_DST BIT(0) #define FLAG_INSN_IS_JUMP_DST BIT(0)
#define FLAG_INSN_IS_SUBPROG_START BIT(1)
#define FLAG_INSN_PTR_CALLER_STACK_FRAME BIT(2)
/** /**
* struct nfp_insn_meta - BPF instruction wrapper * struct nfp_insn_meta - BPF instruction wrapper
@ -269,6 +284,7 @@ struct nfp_bpf_reg_state {
* @xadd_maybe_16bit: 16bit immediate is possible * @xadd_maybe_16bit: 16bit immediate is possible
* @jmp_dst: destination info for jump instructions * @jmp_dst: destination info for jump instructions
* @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB * @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB
* @num_insns_after_br: number of insns following a branch jump, used for fixup
* @func_id: function id for call instructions * @func_id: function id for call instructions
* @arg1: arg1 for call instructions * @arg1: arg1 for call instructions
* @arg2: arg2 for call instructions * @arg2: arg2 for call instructions
@ -279,6 +295,7 @@ struct nfp_bpf_reg_state {
* @off: index of first generated machine instruction (in nfp_prog.prog) * @off: index of first generated machine instruction (in nfp_prog.prog)
* @n: eBPF instruction number * @n: eBPF instruction number
* @flags: eBPF instruction extra optimization flags * @flags: eBPF instruction extra optimization flags
* @subprog_idx: index of subprogram to which the instruction belongs
* @skip: skip this instruction (optimized out) * @skip: skip this instruction (optimized out)
* @double_cb: callback for second part of the instruction * @double_cb: callback for second part of the instruction
* @l: link on nfp_prog->insns list * @l: link on nfp_prog->insns list
@ -304,6 +321,7 @@ struct nfp_insn_meta {
struct { struct {
struct nfp_insn_meta *jmp_dst; struct nfp_insn_meta *jmp_dst;
bool jump_neg_op; bool jump_neg_op;
u32 num_insns_after_br; /* only for BPF-to-BPF calls */
}; };
/* function calls */ /* function calls */
struct { struct {
@ -325,6 +343,7 @@ struct nfp_insn_meta {
unsigned int off; unsigned int off;
unsigned short n; unsigned short n;
unsigned short flags; unsigned short flags;
unsigned short subprog_idx;
bool skip; bool skip;
instr_cb_t double_cb; instr_cb_t double_cb;
@ -413,6 +432,34 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV; return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV;
} }
static inline bool is_mbpf_helper_call(const struct nfp_insn_meta *meta)
{
struct bpf_insn insn = meta->insn;
return insn.code == (BPF_JMP | BPF_CALL) &&
insn.src_reg != BPF_PSEUDO_CALL;
}
static inline bool is_mbpf_pseudo_call(const struct nfp_insn_meta *meta)
{
struct bpf_insn insn = meta->insn;
return insn.code == (BPF_JMP | BPF_CALL) &&
insn.src_reg == BPF_PSEUDO_CALL;
}
#define STACK_FRAME_ALIGN 64
/**
* struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
* @stack_depth: maximum stack depth used by this sub-program
* @needs_reg_push: whether sub-program uses callee-saved registers
*/
struct nfp_bpf_subprog_info {
u16 stack_depth;
u8 needs_reg_push : 1;
};
/** /**
* struct nfp_prog - nfp BPF program * struct nfp_prog - nfp BPF program
* @bpf: backpointer to the bpf app priv structure * @bpf: backpointer to the bpf app priv structure
@ -424,12 +471,16 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
* @last_bpf_off: address of the last instruction translated from BPF * @last_bpf_off: address of the last instruction translated from BPF
* @tgt_out: jump target for normal exit * @tgt_out: jump target for normal exit
* @tgt_abort: jump target for abort (e.g. access outside of packet buffer) * @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
* @tgt_call_push_regs: jump target for subroutine for saving R6~R9 to stack
* @tgt_call_pop_regs: jump target for subroutine used for restoring R6~R9
* @n_translated: number of successfully translated instructions (for errors) * @n_translated: number of successfully translated instructions (for errors)
* @error: error code if something went wrong * @error: error code if something went wrong
* @stack_depth: max stack depth from the verifier * @stack_frame_depth: max stack depth for current frame
* @adjust_head_location: if program has single adjust head call - the insn no. * @adjust_head_location: if program has single adjust head call - the insn no.
* @map_records_cnt: the number of map pointers recorded for this prog * @map_records_cnt: the number of map pointers recorded for this prog
* @subprog_cnt: number of sub-programs, including main function
* @map_records: the map record pointers from bpf->maps_neutral * @map_records: the map record pointers from bpf->maps_neutral
* @subprog: pointer to an array of objects holding info about sub-programs
* @insns: list of BPF instruction wrappers (struct nfp_insn_meta) * @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
*/ */
struct nfp_prog { struct nfp_prog {
@ -446,15 +497,19 @@ struct nfp_prog {
unsigned int last_bpf_off; unsigned int last_bpf_off;
unsigned int tgt_out; unsigned int tgt_out;
unsigned int tgt_abort; unsigned int tgt_abort;
unsigned int tgt_call_push_regs;
unsigned int tgt_call_pop_regs;
unsigned int n_translated; unsigned int n_translated;
int error; int error;
unsigned int stack_depth; unsigned int stack_frame_depth;
unsigned int adjust_head_location; unsigned int adjust_head_location;
unsigned int map_records_cnt; unsigned int map_records_cnt;
unsigned int subprog_cnt;
struct nfp_bpf_neutral_map **map_records; struct nfp_bpf_neutral_map **map_records;
struct nfp_bpf_subprog_info *subprog;
struct list_head insns; struct list_head insns;
}; };
@ -471,6 +526,7 @@ struct nfp_bpf_vnic {
unsigned int tgt_done; unsigned int tgt_done;
}; };
bool nfp_is_subprog_start(struct nfp_insn_meta *meta);
void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt); void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt);
int nfp_bpf_jit(struct nfp_prog *prog); int nfp_bpf_jit(struct nfp_prog *prog);
bool nfp_bpf_supported_opcode(u8 code); bool nfp_bpf_supported_opcode(u8 code);
@ -492,6 +548,7 @@ nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv); void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv);
unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf);
long long int long long int
nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map); nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map);
void void


@ -208,6 +208,8 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
{ {
struct nfp_insn_meta *meta, *tmp; struct nfp_insn_meta *meta, *tmp;
kfree(nfp_prog->subprog);
list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) { list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) {
list_del(&meta->l); list_del(&meta->l);
kfree(meta); kfree(meta);
@ -250,18 +252,9 @@ err_free:
static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog) static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
{ {
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv; struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
unsigned int stack_size;
unsigned int max_instr; unsigned int max_instr;
int err; int err;
stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
if (prog->aux->stack_depth > stack_size) {
nn_info(nn, "stack too large: program %dB > FW stack %dB\n",
prog->aux->stack_depth, stack_size);
return -EOPNOTSUPP;
}
nfp_prog->stack_depth = round_up(prog->aux->stack_depth, 4);
max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN); max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
nfp_prog->__prog_alloc_len = max_instr * sizeof(u64); nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);


@ -34,10 +34,12 @@
#include <linux/bpf.h> #include <linux/bpf.h>
#include <linux/bpf_verifier.h> #include <linux/bpf_verifier.h>
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/pkt_cls.h> #include <linux/pkt_cls.h>
#include "../nfp_app.h" #include "../nfp_app.h"
#include "../nfp_main.h" #include "../nfp_main.h"
#include "../nfp_net.h"
#include "fw.h" #include "fw.h"
#include "main.h" #include "main.h"
@ -155,8 +157,9 @@ nfp_bpf_map_call_ok(const char *fname, struct bpf_verifier_env *env,
} }
static int static int
nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env, nfp_bpf_check_helper_call(struct nfp_prog *nfp_prog,
struct nfp_insn_meta *meta) struct bpf_verifier_env *env,
struct nfp_insn_meta *meta)
{ {
const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1; const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1;
const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2; const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2;
@ -333,6 +336,9 @@ nfp_bpf_check_stack_access(struct nfp_prog *nfp_prog,
{ {
s32 old_off, new_off; s32 old_off, new_off;
if (reg->frameno != env->cur_state->curframe)
meta->flags |= FLAG_INSN_PTR_CALLER_STACK_FRAME;
if (!tnum_is_const(reg->var_off)) { if (!tnum_is_const(reg->var_off)) {
pr_vlog(env, "variable ptr stack access\n"); pr_vlog(env, "variable ptr stack access\n");
return -EINVAL; return -EINVAL;
@ -620,8 +626,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return -EINVAL; return -EINVAL;
} }
if (meta->insn.code == (BPF_JMP | BPF_CALL)) if (is_mbpf_helper_call(meta))
return nfp_bpf_check_call(nfp_prog, env, meta); return nfp_bpf_check_helper_call(nfp_prog, env, meta);
if (meta->insn.code == (BPF_JMP | BPF_EXIT)) if (meta->insn.code == (BPF_JMP | BPF_EXIT))
return nfp_bpf_check_exit(nfp_prog, env); return nfp_bpf_check_exit(nfp_prog, env);
@ -640,6 +646,131 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return 0; return 0;
} }
static int
nfp_assign_subprog_idx_and_regs(struct bpf_verifier_env *env,
struct nfp_prog *nfp_prog)
{
struct nfp_insn_meta *meta;
int index = 0;
list_for_each_entry(meta, &nfp_prog->insns, l) {
if (nfp_is_subprog_start(meta))
index++;
meta->subprog_idx = index;
if (meta->insn.dst_reg >= BPF_REG_6 &&
meta->insn.dst_reg <= BPF_REG_9)
nfp_prog->subprog[index].needs_reg_push = 1;
}
if (index + 1 != nfp_prog->subprog_cnt) {
pr_vlog(env, "BUG: number of processed BPF functions is not consistent (processed %d, expected %d)\n",
index + 1, nfp_prog->subprog_cnt);
return -EFAULT;
}
return 0;
}
static unsigned int
nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog, unsigned int cnt)
{
struct nfp_insn_meta *meta = nfp_prog_first_meta(nfp_prog);
unsigned int max_depth = 0, depth = 0, frame = 0;
struct nfp_insn_meta *ret_insn[MAX_CALL_FRAMES];
unsigned short frame_depths[MAX_CALL_FRAMES];
unsigned short ret_prog[MAX_CALL_FRAMES];
unsigned short idx = meta->subprog_idx;
/* Inspired from check_max_stack_depth() from kernel verifier.
* Starting from main subprogram, walk all instructions and recursively
* walk all callees that given subprogram can call. Since recursion is
* prevented by the kernel verifier, this algorithm only needs a local
* stack of MAX_CALL_FRAMES to remember callsites.
*/
process_subprog:
frame_depths[frame] = nfp_prog->subprog[idx].stack_depth;
frame_depths[frame] = round_up(frame_depths[frame], STACK_FRAME_ALIGN);
depth += frame_depths[frame];
max_depth = max(max_depth, depth);
continue_subprog:
for (; meta != nfp_prog_last_meta(nfp_prog) && meta->subprog_idx == idx;
meta = nfp_meta_next(meta)) {
if (!is_mbpf_pseudo_call(meta))
continue;
/* We found a call to a subprogram. Remember instruction to
* return to and subprog id.
*/
ret_insn[frame] = nfp_meta_next(meta);
ret_prog[frame] = idx;
/* Find the callee and start processing it. */
meta = nfp_bpf_goto_meta(nfp_prog, meta,
meta->n + 1 + meta->insn.imm, cnt);
idx = meta->subprog_idx;
frame++;
goto process_subprog;
}
/* End of for() loop means the last instruction of the subprog was
* reached. If we popped all stack frames, return; otherwise, go on
* processing remaining instructions from the caller.
*/
if (frame == 0)
return max_depth;
depth -= frame_depths[frame];
frame--;
meta = ret_insn[frame];
idx = ret_prog[frame];
goto continue_subprog;
}
static int nfp_bpf_finalize(struct bpf_verifier_env *env)
{
unsigned int stack_size, stack_needed;
struct bpf_subprog_info *info;
struct nfp_prog *nfp_prog;
struct nfp_net *nn;
int i;
nfp_prog = env->prog->aux->offload->dev_priv;
nfp_prog->subprog_cnt = env->subprog_cnt;
nfp_prog->subprog = kcalloc(nfp_prog->subprog_cnt,
sizeof(nfp_prog->subprog[0]), GFP_KERNEL);
if (!nfp_prog->subprog)
return -ENOMEM;
nfp_assign_subprog_idx_and_regs(env, nfp_prog);
info = env->subprog_info;
for (i = 0; i < nfp_prog->subprog_cnt; i++) {
nfp_prog->subprog[i].stack_depth = info[i].stack_depth;
if (i == 0)
continue;
/* Account for size of return address. */
nfp_prog->subprog[i].stack_depth += REG_WIDTH;
/* Account for size of saved registers, if necessary. */
if (nfp_prog->subprog[i].needs_reg_push)
nfp_prog->subprog[i].stack_depth += BPF_REG_SIZE * 4;
}
nn = netdev_priv(env->prog->aux->offload->netdev);
stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
stack_needed = nfp_bpf_get_stack_usage(nfp_prog, env->prog->len);
if (stack_needed > stack_size) {
pr_vlog(env, "stack too large: program %dB > FW stack %dB\n",
stack_needed, stack_size);
return -EOPNOTSUPP;
}
return 0;
}
const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = { const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
.insn_hook = nfp_verify_insn, .insn_hook = nfp_verify_insn,
.finalize = nfp_bpf_finalize,
}; };


@ -40,6 +40,8 @@
#include "nfp_net_repr.h" #include "nfp_net_repr.h"
#define NFP_APP_CTRL_MTU_MAX U32_MAX
struct bpf_prog; struct bpf_prog;
struct net_device; struct net_device;
struct netdev_bpf; struct netdev_bpf;
@ -178,6 +180,7 @@ struct nfp_app_type {
* @ctrl: pointer to ctrl vNIC struct * @ctrl: pointer to ctrl vNIC struct
* @reprs: array of pointers to representors * @reprs: array of pointers to representors
* @type: pointer to const application ops and info * @type: pointer to const application ops and info
* @ctrl_mtu: MTU to set on the control vNIC (set in .init())
* @priv: app-specific priv data * @priv: app-specific priv data
*/ */
struct nfp_app { struct nfp_app {
@ -189,6 +192,7 @@ struct nfp_app {
struct nfp_reprs __rcu *reprs[NFP_REPR_TYPE_MAX + 1]; struct nfp_reprs __rcu *reprs[NFP_REPR_TYPE_MAX + 1];
const struct nfp_app_type *type; const struct nfp_app_type *type;
unsigned int ctrl_mtu;
void *priv; void *priv;
}; };


@ -82,6 +82,15 @@
#define OP_BR_BIT_ADDR_LO OP_BR_ADDR_LO #define OP_BR_BIT_ADDR_LO OP_BR_ADDR_LO
#define OP_BR_BIT_ADDR_HI OP_BR_ADDR_HI #define OP_BR_BIT_ADDR_HI OP_BR_ADDR_HI
#define OP_BR_ALU_BASE 0x0e800000000ULL
#define OP_BR_ALU_BASE_MASK 0x0ff80000000ULL
#define OP_BR_ALU_A_SRC 0x000000003ffULL
#define OP_BR_ALU_B_SRC 0x000000ffc00ULL
#define OP_BR_ALU_DEFBR 0x00000300000ULL
#define OP_BR_ALU_IMM_HI 0x0007fc00000ULL
#define OP_BR_ALU_SRC_LMEXTN 0x40000000000ULL
#define OP_BR_ALU_DST_LMEXTN 0x80000000000ULL
static inline bool nfp_is_br(u64 insn) static inline bool nfp_is_br(u64 insn)
{ {
return (insn & OP_BR_BASE_MASK) == OP_BR_BASE || return (insn & OP_BR_BASE_MASK) == OP_BR_BASE ||


@ -3884,10 +3884,20 @@ int nfp_net_init(struct nfp_net *nn)
return err; return err;
/* Set default MTU and Freelist buffer size */ /* Set default MTU and Freelist buffer size */
if (nn->max_mtu < NFP_NET_DEFAULT_MTU) if (!nfp_net_is_data_vnic(nn) && nn->app->ctrl_mtu) {
if (nn->app->ctrl_mtu <= nn->max_mtu) {
nn->dp.mtu = nn->app->ctrl_mtu;
} else {
if (nn->app->ctrl_mtu != NFP_APP_CTRL_MTU_MAX)
nn_warn(nn, "app requested MTU above max supported %u > %u\n",
nn->app->ctrl_mtu, nn->max_mtu);
nn->dp.mtu = nn->max_mtu;
}
} else if (nn->max_mtu < NFP_NET_DEFAULT_MTU) {
nn->dp.mtu = nn->max_mtu; nn->dp.mtu = nn->max_mtu;
else } else {
nn->dp.mtu = NFP_NET_DEFAULT_MTU; nn->dp.mtu = NFP_NET_DEFAULT_MTU;
}
nn->dp.fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp); nn->dp.fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp);
if (nfp_app_ctrl_uses_data_vnics(nn->app)) if (nfp_app_ctrl_uses_data_vnics(nn->app))


@ -264,7 +264,6 @@
* %NFP_NET_CFG_BPF_ADDR: DMA address of the buffer with JITed BPF code * %NFP_NET_CFG_BPF_ADDR: DMA address of the buffer with JITed BPF code
*/ */
#define NFP_NET_CFG_BPF_ABI 0x0080 #define NFP_NET_CFG_BPF_ABI 0x0080
#define NFP_NET_BPF_ABI 2
#define NFP_NET_CFG_BPF_CAP 0x0081 #define NFP_NET_CFG_BPF_CAP 0x0081
#define NFP_NET_BPF_CAP_RELO (1 << 0) /* seamless reload */ #define NFP_NET_BPF_CAP_RELO (1 << 0) /* seamless reload */
#define NFP_NET_CFG_BPF_MAX_LEN 0x0082 #define NFP_NET_CFG_BPF_MAX_LEN 0x0082


@ -86,8 +86,14 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn)
return 0; return 0;
} }
static int nsim_bpf_finalize(struct bpf_verifier_env *env)
{
return 0;
}
static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = { static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
.insn_hook = nsim_bpf_verify_insn, .insn_hook = nsim_bpf_verify_insn,
.finalize = nsim_bpf_finalize,
}; };
static bool nsim_xdp_offload_active(struct netdevsim *ns) static bool nsim_xdp_offload_active(struct netdevsim *ns)


@ -2,6 +2,7 @@
#ifndef _BPF_CGROUP_H #ifndef _BPF_CGROUP_H
#define _BPF_CGROUP_H #define _BPF_CGROUP_H
#include <linux/bpf.h>
#include <linux/errno.h> #include <linux/errno.h>
#include <linux/jump_label.h> #include <linux/jump_label.h>
#include <linux/percpu.h> #include <linux/percpu.h>
@ -22,7 +23,11 @@ struct bpf_cgroup_storage;
extern struct static_key_false cgroup_bpf_enabled_key; extern struct static_key_false cgroup_bpf_enabled_key;
#define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key) #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
DECLARE_PER_CPU(void*, bpf_cgroup_storage); DECLARE_PER_CPU(struct bpf_cgroup_storage*,
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
#define for_each_cgroup_storage_type(stype) \
for (stype = 0; stype < MAX_BPF_CGROUP_STORAGE_TYPE; stype++)
struct bpf_cgroup_storage_map; struct bpf_cgroup_storage_map;
@ -32,7 +37,10 @@ struct bpf_storage_buffer {
}; };
struct bpf_cgroup_storage { struct bpf_cgroup_storage {
struct bpf_storage_buffer *buf; union {
struct bpf_storage_buffer *buf;
void __percpu *percpu_buf;
};
struct bpf_cgroup_storage_map *map; struct bpf_cgroup_storage_map *map;
struct bpf_cgroup_storage_key key; struct bpf_cgroup_storage_key key;
struct list_head list; struct list_head list;
@ -43,7 +51,7 @@ struct bpf_cgroup_storage {
struct bpf_prog_list { struct bpf_prog_list {
struct list_head node; struct list_head node;
struct bpf_prog *prog; struct bpf_prog *prog;
struct bpf_cgroup_storage *storage; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
}; };
struct bpf_prog_array; struct bpf_prog_array;
@ -101,18 +109,26 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor, int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
short access, enum bpf_attach_type type); short access, enum bpf_attach_type type);
static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage) static inline enum bpf_cgroup_storage_type cgroup_storage_type(
struct bpf_map *map)
{ {
struct bpf_storage_buffer *buf; if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
return BPF_CGROUP_STORAGE_PERCPU;
if (!storage) return BPF_CGROUP_STORAGE_SHARED;
return;
buf = READ_ONCE(storage->buf);
this_cpu_write(bpf_cgroup_storage, &buf->data[0]);
} }
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog); static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage
*storage[MAX_BPF_CGROUP_STORAGE_TYPE])
{
enum bpf_cgroup_storage_type stype;
for_each_cgroup_storage_type(stype)
this_cpu_write(bpf_cgroup_storage[stype], storage[stype]);
}
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
enum bpf_cgroup_storage_type stype);
void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage); void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage, void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
struct cgroup *cgroup, struct cgroup *cgroup,
@ -121,6 +137,10 @@ void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *map); int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *map);
void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *map); void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *map);
int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
void *value, u64 flags);
/* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */ /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb) \ #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb) \
({ \ ({ \
@ -265,15 +285,24 @@ static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
return -EINVAL; return -EINVAL;
} }
static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage) {} static inline void bpf_cgroup_storage_set(
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {}
static inline int bpf_cgroup_storage_assign(struct bpf_prog *prog, static inline int bpf_cgroup_storage_assign(struct bpf_prog *prog,
struct bpf_map *map) { return 0; } struct bpf_map *map) { return 0; }
static inline void bpf_cgroup_storage_release(struct bpf_prog *prog, static inline void bpf_cgroup_storage_release(struct bpf_prog *prog,
struct bpf_map *map) {} struct bpf_map *map) {}
static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc( static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
struct bpf_prog *prog) { return 0; } struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return 0; }
static inline void bpf_cgroup_storage_free( static inline void bpf_cgroup_storage_free(
struct bpf_cgroup_storage *storage) {} struct bpf_cgroup_storage *storage) {}
static inline int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key,
void *value) {
return 0;
}
static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
void *key, void *value, u64 flags) {
return 0;
}
#define cgroup_bpf_enabled (0) #define cgroup_bpf_enabled (0)
#define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0) #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
@ -293,6 +322,8 @@ static inline void bpf_cgroup_storage_free(
#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; }) #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
#define for_each_cgroup_storage_type(stype) for (; false; )
#endif /* CONFIG_CGROUP_BPF */ #endif /* CONFIG_CGROUP_BPF */
#endif /* _BPF_CGROUP_H */ #endif /* _BPF_CGROUP_H */

View File

@ -154,6 +154,7 @@ enum bpf_arg_type {
ARG_PTR_TO_CTX, /* pointer to context */ ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING, /* any (initialized) argument is ok */ ARG_ANYTHING, /* any (initialized) argument is ok */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock */
}; };
/* type of values returned from helper functions */ /* type of values returned from helper functions */
@ -162,6 +163,7 @@ enum bpf_return_type {
RET_VOID, /* function doesn't return anything */ RET_VOID, /* function doesn't return anything */
RET_PTR_TO_MAP_VALUE, /* returns a pointer to map elem value */ RET_PTR_TO_MAP_VALUE, /* returns a pointer to map elem value */
RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */ RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
}; };
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@ -213,6 +215,8 @@ enum bpf_reg_type {
PTR_TO_PACKET, /* reg points to skb->data */ PTR_TO_PACKET, /* reg points to skb->data */
PTR_TO_PACKET_END, /* skb->data + headlen */ PTR_TO_PACKET_END, /* skb->data + headlen */
PTR_TO_FLOW_KEYS, /* reg points to bpf_flow_keys */ PTR_TO_FLOW_KEYS, /* reg points to bpf_flow_keys */
PTR_TO_SOCKET, /* reg points to struct bpf_sock */
PTR_TO_SOCKET_OR_NULL, /* reg points to struct bpf_sock or NULL */
}; };
/* The information passed from prog-specific *_is_valid_access /* The information passed from prog-specific *_is_valid_access
@ -259,6 +263,7 @@ struct bpf_verifier_ops {
struct bpf_prog_offload_ops { struct bpf_prog_offload_ops {
int (*insn_hook)(struct bpf_verifier_env *env, int (*insn_hook)(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx); int insn_idx, int prev_insn_idx);
int (*finalize)(struct bpf_verifier_env *env);
}; };
struct bpf_prog_offload { struct bpf_prog_offload {
@ -272,6 +277,14 @@ struct bpf_prog_offload {
u32 jited_len; u32 jited_len;
}; };
enum bpf_cgroup_storage_type {
BPF_CGROUP_STORAGE_SHARED,
BPF_CGROUP_STORAGE_PERCPU,
__BPF_CGROUP_STORAGE_MAX
};
#define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
struct bpf_prog_aux { struct bpf_prog_aux {
atomic_t refcnt; atomic_t refcnt;
u32 used_map_cnt; u32 used_map_cnt;
@ -289,7 +302,7 @@ struct bpf_prog_aux {
struct bpf_prog *prog; struct bpf_prog *prog;
struct user_struct *user; struct user_struct *user;
u64 load_time; /* ns since boottime */ u64 load_time; /* ns since boottime */
struct bpf_map *cgroup_storage; struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
char name[BPF_OBJ_NAME_LEN]; char name[BPF_OBJ_NAME_LEN];
#ifdef CONFIG_SECURITY #ifdef CONFIG_SECURITY
void *security; void *security;
@ -335,6 +348,11 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src, typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
unsigned long off, unsigned long len); unsigned long off, unsigned long len);
typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
const struct bpf_insn *src,
struct bpf_insn *dst,
struct bpf_prog *prog,
u32 *target_size);
u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy); void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
@ -358,7 +376,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
*/ */
struct bpf_prog_array_item { struct bpf_prog_array_item {
struct bpf_prog *prog; struct bpf_prog *prog;
struct bpf_cgroup_storage *cgroup_storage; struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
}; };
struct bpf_prog_array { struct bpf_prog_array {
@ -828,4 +846,29 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
void bpf_user_rnd_init_once(void); void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
#if defined(CONFIG_NET)
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info);
u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size);
#else
static inline bool bpf_sock_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
return false;
}
static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size)
{
return 0;
}
#endif
#endif /* _LINUX_BPF_H */ #endif /* _LINUX_BPF_H */

View File

@ -43,6 +43,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
#endif #endif
#ifdef CONFIG_CGROUP_BPF #ifdef CONFIG_CGROUP_BPF
BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, cgroup_storage_map_ops)
#endif #endif
BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops)

View File

@ -58,6 +58,8 @@ struct bpf_reg_state {
* offset, so they can share range knowledge. * offset, so they can share range knowledge.
* For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
* came from, when one is tested for != NULL. * came from, when one is tested for != NULL.
* For PTR_TO_SOCKET this is used to share which pointers retain the
* same reference to the socket, to determine proper reference freeing.
*/ */
u32 id; u32 id;
/* For scalar types (SCALAR_VALUE), this represents our knowledge of /* For scalar types (SCALAR_VALUE), this represents our knowledge of
@ -102,6 +104,17 @@ struct bpf_stack_state {
u8 slot_type[BPF_REG_SIZE]; u8 slot_type[BPF_REG_SIZE];
}; };
struct bpf_reference_state {
/* Track each reference created with a unique id, even if the same
* instruction creates the reference multiple times (eg, via CALL).
*/
int id;
/* Instruction where the allocation of this reference occurred. This
* is used purely to inform the user of a reference leak.
*/
int insn_idx;
};
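To illustrate the reference tracking described above, here is a minimal, hypothetical sketch (not taken from this series) of a classifier the verifier would now reject: it acquires a socket reference via the sk_lookup helpers added later in this merge and never releases it, so verification fails and the error points at the instruction recorded in insn_idx. SEC(), the tuple struct and the helper wrappers are assumed to come from the kernel uapi headers and the selftests' bpf_helpers.h.

#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("classifier")
int sk_lookup_leak(struct __sk_buff *skb)
{
	struct bpf_sock_tuple tuple = {};

	/* acquires a reference when the result is non-NULL, but the
	 * pointer is discarded without bpf_sk_release(), so loading the
	 * program fails with an "unreleased reference" style error
	 */
	bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), 0, 0);
	return 0;
}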
/* state of the program: /* state of the program:
* type of all registers and stack info * type of all registers and stack info
*/ */
@ -119,7 +132,9 @@ struct bpf_func_state {
*/ */
u32 subprogno; u32 subprogno;
/* should be second to last. See copy_func_state() */ /* The following fields should be last. See copy_func_state() */
int acquired_refs;
struct bpf_reference_state *refs;
int allocated_stack; int allocated_stack;
struct bpf_stack_state *stack; struct bpf_stack_state *stack;
}; };
@ -131,6 +146,17 @@ struct bpf_verifier_state {
u32 curframe; u32 curframe;
}; };
#define bpf_get_spilled_reg(slot, frame) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
(frame->stack[slot].slot_type[0] == STACK_SPILL)) \
? &frame->stack[slot].spilled_ptr : NULL)
/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
#define bpf_for_each_spilled_reg(iter, frame, reg) \
for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \
iter < frame->allocated_stack / BPF_REG_SIZE; \
iter++, reg = bpf_get_spilled_reg(iter, frame))
/* linked list of verifier states used to prune search */ /* linked list of verifier states used to prune search */
struct bpf_verifier_state_list { struct bpf_verifier_state_list {
struct bpf_verifier_state state; struct bpf_verifier_state state;
@ -204,15 +230,21 @@ __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env, __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
const char *fmt, ...); const char *fmt, ...);
static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env) static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
{ {
struct bpf_verifier_state *cur = env->cur_state; struct bpf_verifier_state *cur = env->cur_state;
return cur->frame[cur->curframe]->regs; return cur->frame[cur->curframe];
}
static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
{
return cur_func(env)->regs;
} }
int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env); int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env, int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx); int insn_idx, int prev_insn_idx);
int bpf_prog_offload_finalize(struct bpf_verifier_env *env);
#endif /* _LINUX_BPF_VERIFIER_H */ #endif /* _LINUX_BPF_VERIFIER_H */

View File

@ -609,6 +609,9 @@ struct netdev_queue {
/* Subordinate device that the queue has been assigned to */ /* Subordinate device that the queue has been assigned to */
struct net_device *sb_dev; struct net_device *sb_dev;
#ifdef CONFIG_XDP_SOCKETS
struct xdp_umem *umem;
#endif
/* /*
* write-mostly part * write-mostly part
*/ */
@ -738,6 +741,9 @@ struct netdev_rx_queue {
struct kobject kobj; struct kobject kobj;
struct net_device *dev; struct net_device *dev;
struct xdp_rxq_info xdp_rxq; struct xdp_rxq_info xdp_rxq;
#ifdef CONFIG_XDP_SOCKETS
struct xdp_umem *umem;
#endif
} ____cacheline_aligned_in_smp; } ____cacheline_aligned_in_smp;
/* /*

View File

@ -86,6 +86,7 @@ struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem, struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
struct xdp_umem_fq_reuse *newq); struct xdp_umem_fq_reuse *newq);
void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq); void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr) static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
{ {
@ -183,6 +184,12 @@ static inline void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
{ {
} }
static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
u16 queue_id)
{
return NULL;
}
static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr) static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
{ {
return NULL; return NULL;

View File

@ -127,6 +127,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_SOCKHASH,
BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_CGROUP_STORAGE,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
}; };
enum bpf_prog_type { enum bpf_prog_type {
@ -2143,6 +2144,77 @@ union bpf_attr {
* request in the skb. * request in the skb.
* Return * Return
* 0 on success, or a negative error in case of failure. * 0 on success, or a negative error in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for TCP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this is the netns of the device in the skb. For socket hooks,
* this is the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for UDP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this is the netns of the device in the skb. For socket hooks,
* this is the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* int bpf_sk_release(struct bpf_sock *sk)
* Description
* Release the reference held by *sk*. *sk* must be a non-NULL
* pointer that was returned from bpf_sk_lookup_xxx\ ().
* Return
* 0 on success, or a negative error in case of failure.
*/ */
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
@ -2228,7 +2300,10 @@ union bpf_attr {
FN(get_current_cgroup_id), \ FN(get_current_cgroup_id), \
FN(get_local_storage), \ FN(get_local_storage), \
FN(sk_select_reuseport), \ FN(sk_select_reuseport), \
FN(skb_ancestor_cgroup_id), FN(skb_ancestor_cgroup_id), \
FN(sk_lookup_tcp), \
FN(sk_lookup_udp), \
FN(sk_release),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call * function eBPF program intends to call
@ -2398,6 +2473,23 @@ struct bpf_sock {
*/ */
}; };
struct bpf_sock_tuple {
union {
struct {
__be32 saddr;
__be32 daddr;
__be16 sport;
__be16 dport;
} ipv4;
struct {
__be32 saddr[4];
__be32 daddr[4];
__be16 sport;
__be16 dport;
} ipv6;
};
};
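A rough usage sketch of the helpers documented above together with the tuple layout just defined (illustrative only, not part of this series): a TC classifier that checks whether the stack has a TCP socket for a hard-coded IPv4 flow and drops the packet otherwise. TC_ACT_* come from linux/pkt_cls.h; SEC(), bpf_htons()/bpf_htonl() and the helper wrappers are assumed to come from the selftests' bpf_helpers.h and bpf_endian.h.

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"

SEC("classifier")
int expect_tcp_v4(struct __sk_buff *skb)
{
	struct bpf_sock_tuple tuple = {};
	struct bpf_sock *sk;

	/* placeholder values; a real program would parse them from the packet */
	tuple.ipv4.saddr = bpf_htonl(0xc0a80001);	/* 192.168.0.1 */
	tuple.ipv4.daddr = bpf_htonl(0xc0a80002);	/* 192.168.0.2 */
	tuple.ipv4.sport = bpf_htons(40000);
	tuple.ipv4.dport = bpf_htons(80);

	/* netns == 0: use the netns of skb->dev; flags must be zero */
	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), 0, 0);
	if (!sk)
		return TC_ACT_SHOT;

	/* every non-NULL result must be released before returning */
	bpf_sk_release(sk);
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";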
#define XDP_PACKET_HEADROOM 256 #define XDP_PACKET_HEADROOM 256
/* User return codes for XDP prog type. /* User return codes for XDP prog type.

View File

@ -25,6 +25,7 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key);
*/ */
void cgroup_bpf_put(struct cgroup *cgrp) void cgroup_bpf_put(struct cgroup *cgrp)
{ {
enum bpf_cgroup_storage_type stype;
unsigned int type; unsigned int type;
for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) { for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
@ -34,8 +35,10 @@ void cgroup_bpf_put(struct cgroup *cgrp)
list_for_each_entry_safe(pl, tmp, progs, node) { list_for_each_entry_safe(pl, tmp, progs, node) {
list_del(&pl->node); list_del(&pl->node);
bpf_prog_put(pl->prog); bpf_prog_put(pl->prog);
bpf_cgroup_storage_unlink(pl->storage); for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_free(pl->storage); bpf_cgroup_storage_unlink(pl->storage[stype]);
bpf_cgroup_storage_free(pl->storage[stype]);
}
kfree(pl); kfree(pl);
static_branch_dec(&cgroup_bpf_enabled_key); static_branch_dec(&cgroup_bpf_enabled_key);
} }
@ -97,6 +100,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
enum bpf_attach_type type, enum bpf_attach_type type,
struct bpf_prog_array __rcu **array) struct bpf_prog_array __rcu **array)
{ {
enum bpf_cgroup_storage_type stype;
struct bpf_prog_array *progs; struct bpf_prog_array *progs;
struct bpf_prog_list *pl; struct bpf_prog_list *pl;
struct cgroup *p = cgrp; struct cgroup *p = cgrp;
@ -125,7 +129,9 @@ static int compute_effective_progs(struct cgroup *cgrp,
continue; continue;
progs->items[cnt].prog = pl->prog; progs->items[cnt].prog = pl->prog;
progs->items[cnt].cgroup_storage = pl->storage; for_each_cgroup_storage_type(stype)
progs->items[cnt].cgroup_storage[stype] =
pl->storage[stype];
cnt++; cnt++;
} }
} while ((p = cgroup_parent(p))); } while ((p = cgroup_parent(p)));
@ -232,7 +238,9 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
{ {
struct list_head *progs = &cgrp->bpf.progs[type]; struct list_head *progs = &cgrp->bpf.progs[type];
struct bpf_prog *old_prog = NULL; struct bpf_prog *old_prog = NULL;
struct bpf_cgroup_storage *storage, *old_storage = NULL; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
enum bpf_cgroup_storage_type stype;
struct bpf_prog_list *pl; struct bpf_prog_list *pl;
bool pl_was_allocated; bool pl_was_allocated;
int err; int err;
@ -254,34 +262,44 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS) if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
return -E2BIG; return -E2BIG;
storage = bpf_cgroup_storage_alloc(prog); for_each_cgroup_storage_type(stype) {
if (IS_ERR(storage)) storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
return -ENOMEM; if (IS_ERR(storage[stype])) {
storage[stype] = NULL;
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM;
}
}
if (flags & BPF_F_ALLOW_MULTI) { if (flags & BPF_F_ALLOW_MULTI) {
list_for_each_entry(pl, progs, node) { list_for_each_entry(pl, progs, node) {
if (pl->prog == prog) { if (pl->prog == prog) {
/* disallow attaching the same prog twice */ /* disallow attaching the same prog twice */
bpf_cgroup_storage_free(storage); for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -EINVAL; return -EINVAL;
} }
} }
pl = kmalloc(sizeof(*pl), GFP_KERNEL); pl = kmalloc(sizeof(*pl), GFP_KERNEL);
if (!pl) { if (!pl) {
bpf_cgroup_storage_free(storage); for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM; return -ENOMEM;
} }
pl_was_allocated = true; pl_was_allocated = true;
pl->prog = prog; pl->prog = prog;
pl->storage = storage; for_each_cgroup_storage_type(stype)
pl->storage[stype] = storage[stype];
list_add_tail(&pl->node, progs); list_add_tail(&pl->node, progs);
} else { } else {
if (list_empty(progs)) { if (list_empty(progs)) {
pl = kmalloc(sizeof(*pl), GFP_KERNEL); pl = kmalloc(sizeof(*pl), GFP_KERNEL);
if (!pl) { if (!pl) {
bpf_cgroup_storage_free(storage); for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM; return -ENOMEM;
} }
pl_was_allocated = true; pl_was_allocated = true;
@ -289,12 +307,15 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
} else { } else {
pl = list_first_entry(progs, typeof(*pl), node); pl = list_first_entry(progs, typeof(*pl), node);
old_prog = pl->prog; old_prog = pl->prog;
old_storage = pl->storage; for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_unlink(old_storage); old_storage[stype] = pl->storage[stype];
bpf_cgroup_storage_unlink(old_storage[stype]);
}
pl_was_allocated = false; pl_was_allocated = false;
} }
pl->prog = prog; pl->prog = prog;
pl->storage = storage; for_each_cgroup_storage_type(stype)
pl->storage[stype] = storage[stype];
} }
cgrp->bpf.flags[type] = flags; cgrp->bpf.flags[type] = flags;
@ -304,21 +325,27 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
goto cleanup; goto cleanup;
static_branch_inc(&cgroup_bpf_enabled_key); static_branch_inc(&cgroup_bpf_enabled_key);
if (old_storage) for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_free(old_storage); if (!old_storage[stype])
continue;
bpf_cgroup_storage_free(old_storage[stype]);
}
if (old_prog) { if (old_prog) {
bpf_prog_put(old_prog); bpf_prog_put(old_prog);
static_branch_dec(&cgroup_bpf_enabled_key); static_branch_dec(&cgroup_bpf_enabled_key);
} }
bpf_cgroup_storage_link(storage, cgrp, type); for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_link(storage[stype], cgrp, type);
return 0; return 0;
cleanup: cleanup:
/* and cleanup the prog list */ /* and cleanup the prog list */
pl->prog = old_prog; pl->prog = old_prog;
bpf_cgroup_storage_free(pl->storage); for_each_cgroup_storage_type(stype) {
pl->storage = old_storage; bpf_cgroup_storage_free(pl->storage[stype]);
bpf_cgroup_storage_link(old_storage, cgrp, type); pl->storage[stype] = old_storage[stype];
bpf_cgroup_storage_link(old_storage[stype], cgrp, type);
}
if (pl_was_allocated) { if (pl_was_allocated) {
list_del(&pl->node); list_del(&pl->node);
kfree(pl); kfree(pl);
@ -339,6 +366,7 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
enum bpf_attach_type type, u32 unused_flags) enum bpf_attach_type type, u32 unused_flags)
{ {
struct list_head *progs = &cgrp->bpf.progs[type]; struct list_head *progs = &cgrp->bpf.progs[type];
enum bpf_cgroup_storage_type stype;
u32 flags = cgrp->bpf.flags[type]; u32 flags = cgrp->bpf.flags[type];
struct bpf_prog *old_prog = NULL; struct bpf_prog *old_prog = NULL;
struct bpf_prog_list *pl; struct bpf_prog_list *pl;
@ -385,8 +413,10 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
/* now can actually delete it from this cgroup list */ /* now can actually delete it from this cgroup list */
list_del(&pl->node); list_del(&pl->node);
bpf_cgroup_storage_unlink(pl->storage); for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_free(pl->storage); bpf_cgroup_storage_unlink(pl->storage[stype]);
bpf_cgroup_storage_free(pl->storage[stype]);
}
kfree(pl); kfree(pl);
if (list_empty(progs)) if (list_empty(progs))
/* last program was detached, reset flags to zero */ /* last program was detached, reset flags to zero */
@ -677,6 +707,8 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_current_uid_gid_proto; return &bpf_get_current_uid_gid_proto;
case BPF_FUNC_get_local_storage: case BPF_FUNC_get_local_storage:
return &bpf_get_local_storage_proto; return &bpf_get_local_storage_proto;
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_trace_printk: case BPF_FUNC_trace_printk:
if (capable(CAP_SYS_ADMIN)) if (capable(CAP_SYS_ADMIN))
return bpf_get_trace_printk_proto(); return bpf_get_trace_printk_proto();

View File

@ -194,16 +194,28 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
.ret_type = RET_INTEGER, .ret_type = RET_INTEGER,
}; };
DECLARE_PER_CPU(void*, bpf_cgroup_storage); #ifdef CONFIG_CGROUP_BPF
DECLARE_PER_CPU(struct bpf_cgroup_storage*,
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags) BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
{ {
/* map and flags arguments are not used now, /* flags argument is not used now,
* but provide an ability to extend the API * but provides an ability to extend the API.
* for other types of local storages. * verifier checks that its value is correct.
* verifier checks that their values are correct.
*/ */
return (unsigned long) this_cpu_read(bpf_cgroup_storage); enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
struct bpf_cgroup_storage *storage;
void *ptr;
storage = this_cpu_read(bpf_cgroup_storage[stype]);
if (stype == BPF_CGROUP_STORAGE_SHARED)
ptr = &READ_ONCE(storage->buf)->data[0];
else
ptr = this_cpu_ptr(storage->percpu_buf);
return (unsigned long)ptr;
} }
const struct bpf_func_proto bpf_get_local_storage_proto = { const struct bpf_func_proto bpf_get_local_storage_proto = {
@ -214,3 +226,4 @@ const struct bpf_func_proto bpf_get_local_storage_proto = {
.arg2_type = ARG_ANYTHING, .arg2_type = ARG_ANYTHING,
}; };
#endif #endif
#endif
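A minimal sketch of how a cgroup-skb program might use the per-cpu flavour through bpf_get_local_storage() (an assumed example, not from this series): because the helper returns a pointer into the current CPU's copy, the counter is bumped with a plain increment, with no lookups or atomics on the fast path. The map declaration style and wrappers are assumed from the selftests' bpf_helpers.h.

#include <linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") percpu_cnt = {
	.type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
	.key_size = sizeof(struct bpf_cgroup_storage_key),
	.value_size = sizeof(__u64),
};

SEC("cgroup/skb")
int count_pkts(struct __sk_buff *skb)
{
	/* per-cpu storage: pointer into this CPU's copy, plain increment is safe */
	__u64 *cnt = bpf_get_local_storage(&percpu_cnt, 0);

	(*cnt)++;
	return 1;	/* 1 == allow the packet */
}

char _license[] SEC("license") = "GPL";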

View File

@ -7,7 +7,8 @@
#include <linux/rbtree.h> #include <linux/rbtree.h>
#include <linux/slab.h> #include <linux/slab.h>
DEFINE_PER_CPU(void*, bpf_cgroup_storage); DEFINE_PER_CPU(struct bpf_cgroup_storage*,
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
#ifdef CONFIG_CGROUP_BPF #ifdef CONFIG_CGROUP_BPF
@ -151,6 +152,71 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key,
return 0; return 0;
} }
int bpf_percpu_cgroup_storage_copy(struct bpf_map *_map, void *_key,
void *value)
{
struct bpf_cgroup_storage_map *map = map_to_storage(_map);
struct bpf_cgroup_storage_key *key = _key;
struct bpf_cgroup_storage *storage;
int cpu, off = 0;
u32 size;
rcu_read_lock();
storage = cgroup_storage_lookup(map, key, false);
if (!storage) {
rcu_read_unlock();
return -ENOENT;
}
/* per_cpu areas are zero-filled and bpf programs can only
* access 'value_size' of them, so copying rounded areas
* will not leak any kernel data
*/
size = round_up(_map->value_size, 8);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(value + off,
per_cpu_ptr(storage->percpu_buf, cpu), size);
off += size;
}
rcu_read_unlock();
return 0;
}
int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *_key,
void *value, u64 map_flags)
{
struct bpf_cgroup_storage_map *map = map_to_storage(_map);
struct bpf_cgroup_storage_key *key = _key;
struct bpf_cgroup_storage *storage;
int cpu, off = 0;
u32 size;
if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
return -EINVAL;
rcu_read_lock();
storage = cgroup_storage_lookup(map, key, false);
if (!storage) {
rcu_read_unlock();
return -ENOENT;
}
/* the user space will provide round_up(value_size, 8) bytes that
* will be copied into per-cpu area. bpf programs can only access
* value_size of it. During lookup the same extra bytes will be
* returned or zeros which were zero-filled by percpu_alloc,
* so no kernel data leaks possible
*/
size = round_up(_map->value_size, 8);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu),
value + off, size);
off += size;
}
rcu_read_unlock();
return 0;
}
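For the user-space side, a hedged sketch of reading such a map: as the copy routine above shows, lookup fills one round_up(value_size, 8) slot per possible CPU, so the caller provides a large enough buffer and sums the slots. bpf_map_lookup_elem() is the libbpf syscall wrapper; the map fd, cgroup inode id, CPU count and the BPF_CGROUP_INET_INGRESS attach type are assumptions about the rest of the program.

#include <stdio.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>		/* libbpf syscall wrappers, install path may differ */

#define MAX_CPUS	128	/* upper bound on possible CPUs for this sketch */

static int print_percpu_counter(int map_fd, __u64 cgroup_inode_id, int nr_cpus)
{
	struct bpf_cgroup_storage_key key = {
		.cgroup_inode_id = cgroup_inode_id,
		.attach_type	 = BPF_CGROUP_INET_INGRESS,
	};
	__u64 values[MAX_CPUS] = {};	/* value_size == 8, so one u64 per CPU */
	__u64 sum = 0;
	int cpu;

	/* nr_cpus must cover all *possible* CPUs (see the rounding above) */
	if (nr_cpus > MAX_CPUS || bpf_map_lookup_elem(map_fd, &key, values))
		return -1;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		sum += values[cpu];

	printf("packets: %llu\n", (unsigned long long)sum);
	return 0;
}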
static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key, static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key,
void *_next_key) void *_next_key)
{ {
@ -254,6 +320,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = {
int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map) int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
{ {
enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
struct bpf_cgroup_storage_map *map = map_to_storage(_map); struct bpf_cgroup_storage_map *map = map_to_storage(_map);
int ret = -EBUSY; int ret = -EBUSY;
@ -261,11 +328,12 @@ int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
if (map->prog && map->prog != prog) if (map->prog && map->prog != prog)
goto unlock; goto unlock;
if (prog->aux->cgroup_storage && prog->aux->cgroup_storage != _map) if (prog->aux->cgroup_storage[stype] &&
prog->aux->cgroup_storage[stype] != _map)
goto unlock; goto unlock;
map->prog = prog; map->prog = prog;
prog->aux->cgroup_storage = _map; prog->aux->cgroup_storage[stype] = _map;
ret = 0; ret = 0;
unlock: unlock:
spin_unlock_bh(&map->lock); spin_unlock_bh(&map->lock);
@ -275,70 +343,117 @@ unlock:
void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *_map) void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *_map)
{ {
enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
struct bpf_cgroup_storage_map *map = map_to_storage(_map); struct bpf_cgroup_storage_map *map = map_to_storage(_map);
spin_lock_bh(&map->lock); spin_lock_bh(&map->lock);
if (map->prog == prog) { if (map->prog == prog) {
WARN_ON(prog->aux->cgroup_storage != _map); WARN_ON(prog->aux->cgroup_storage[stype] != _map);
map->prog = NULL; map->prog = NULL;
prog->aux->cgroup_storage = NULL; prog->aux->cgroup_storage[stype] = NULL;
} }
spin_unlock_bh(&map->lock); spin_unlock_bh(&map->lock);
} }
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog) static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages)
{
size_t size;
if (cgroup_storage_type(map) == BPF_CGROUP_STORAGE_SHARED) {
size = sizeof(struct bpf_storage_buffer) + map->value_size;
*pages = round_up(sizeof(struct bpf_cgroup_storage) + size,
PAGE_SIZE) >> PAGE_SHIFT;
} else {
size = map->value_size;
*pages = round_up(round_up(size, 8) * num_possible_cpus(),
PAGE_SIZE) >> PAGE_SHIFT;
}
return size;
}
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
enum bpf_cgroup_storage_type stype)
{ {
struct bpf_cgroup_storage *storage; struct bpf_cgroup_storage *storage;
struct bpf_map *map; struct bpf_map *map;
gfp_t flags;
size_t size;
u32 pages; u32 pages;
map = prog->aux->cgroup_storage; map = prog->aux->cgroup_storage[stype];
if (!map) if (!map)
return NULL; return NULL;
pages = round_up(sizeof(struct bpf_cgroup_storage) + size = bpf_cgroup_storage_calculate_size(map, &pages);
sizeof(struct bpf_storage_buffer) +
map->value_size, PAGE_SIZE) >> PAGE_SHIFT;
if (bpf_map_charge_memlock(map, pages)) if (bpf_map_charge_memlock(map, pages))
return ERR_PTR(-EPERM); return ERR_PTR(-EPERM);
storage = kmalloc_node(sizeof(struct bpf_cgroup_storage), storage = kmalloc_node(sizeof(struct bpf_cgroup_storage),
__GFP_ZERO | GFP_USER, map->numa_node); __GFP_ZERO | GFP_USER, map->numa_node);
if (!storage) { if (!storage)
bpf_map_uncharge_memlock(map, pages); goto enomem;
return ERR_PTR(-ENOMEM);
}
storage->buf = kmalloc_node(sizeof(struct bpf_storage_buffer) + flags = __GFP_ZERO | GFP_USER;
map->value_size, __GFP_ZERO | GFP_USER,
map->numa_node); if (stype == BPF_CGROUP_STORAGE_SHARED) {
if (!storage->buf) { storage->buf = kmalloc_node(size, flags, map->numa_node);
bpf_map_uncharge_memlock(map, pages); if (!storage->buf)
kfree(storage); goto enomem;
return ERR_PTR(-ENOMEM); } else {
storage->percpu_buf = __alloc_percpu_gfp(size, 8, flags);
if (!storage->percpu_buf)
goto enomem;
} }
storage->map = (struct bpf_cgroup_storage_map *)map; storage->map = (struct bpf_cgroup_storage_map *)map;
return storage; return storage;
enomem:
bpf_map_uncharge_memlock(map, pages);
kfree(storage);
return ERR_PTR(-ENOMEM);
}
static void free_shared_cgroup_storage_rcu(struct rcu_head *rcu)
{
struct bpf_cgroup_storage *storage =
container_of(rcu, struct bpf_cgroup_storage, rcu);
kfree(storage->buf);
kfree(storage);
}
static void free_percpu_cgroup_storage_rcu(struct rcu_head *rcu)
{
struct bpf_cgroup_storage *storage =
container_of(rcu, struct bpf_cgroup_storage, rcu);
free_percpu(storage->percpu_buf);
kfree(storage);
} }
void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage) void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage)
{ {
u32 pages; enum bpf_cgroup_storage_type stype;
struct bpf_map *map; struct bpf_map *map;
u32 pages;
if (!storage) if (!storage)
return; return;
map = &storage->map->map; map = &storage->map->map;
pages = round_up(sizeof(struct bpf_cgroup_storage) +
sizeof(struct bpf_storage_buffer) + bpf_cgroup_storage_calculate_size(map, &pages);
map->value_size, PAGE_SIZE) >> PAGE_SHIFT;
bpf_map_uncharge_memlock(map, pages); bpf_map_uncharge_memlock(map, pages);
kfree_rcu(storage->buf, rcu); stype = cgroup_storage_type(map);
kfree_rcu(storage, rcu); if (stype == BPF_CGROUP_STORAGE_SHARED)
call_rcu(&storage->rcu, free_shared_cgroup_storage_rcu);
else
call_rcu(&storage->rcu, free_percpu_cgroup_storage_rcu);
} }
void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage, void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,

View File

@ -24,7 +24,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
* in the verifier is not enough. * in the verifier is not enough.
*/ */
if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY || if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE) { inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE ||
inner_map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
fdput(f); fdput(f);
return ERR_PTR(-ENOTSUPP); return ERR_PTR(-ENOTSUPP);
} }

View File

@ -172,6 +172,24 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
return ret; return ret;
} }
int bpf_prog_offload_finalize(struct bpf_verifier_env *env)
{
struct bpf_prog_offload *offload;
int ret = -ENODEV;
down_read(&bpf_devs_lock);
offload = env->prog->aux->offload;
if (offload) {
if (offload->dev_ops->finalize)
ret = offload->dev_ops->finalize(env);
else
ret = 0;
}
up_read(&bpf_devs_lock);
return ret;
}
static void __bpf_prog_offload_destroy(struct bpf_prog *prog) static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
{ {
struct bpf_prog_offload *offload = prog->aux->offload; struct bpf_prog_offload *offload = prog->aux->offload;

View File

@ -686,7 +686,8 @@ static int map_lookup_elem(union bpf_attr *attr)
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
value_size = round_up(map->value_size, 8) * num_possible_cpus(); value_size = round_up(map->value_size, 8) * num_possible_cpus();
else if (IS_FD_MAP(map)) else if (IS_FD_MAP(map))
value_size = sizeof(u32); value_size = sizeof(u32);
@ -705,6 +706,8 @@ static int map_lookup_elem(union bpf_attr *attr)
err = bpf_percpu_hash_copy(map, key, value); err = bpf_percpu_hash_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_copy(map, key, value); err = bpf_percpu_array_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) { } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
err = bpf_stackmap_copy(map, key, value); err = bpf_stackmap_copy(map, key, value);
} else if (IS_FD_ARRAY(map)) { } else if (IS_FD_ARRAY(map)) {
@ -774,7 +777,8 @@ static int map_update_elem(union bpf_attr *attr)
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
value_size = round_up(map->value_size, 8) * num_possible_cpus(); value_size = round_up(map->value_size, 8) * num_possible_cpus();
else else
value_size = map->value_size; value_size = map->value_size;
@ -809,6 +813,9 @@ static int map_update_elem(union bpf_attr *attr)
err = bpf_percpu_hash_update(map, key, value, attr->flags); err = bpf_percpu_hash_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags); err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_update(map, key, value,
attr->flags);
} else if (IS_FD_ARRAY(map)) { } else if (IS_FD_ARRAY(map)) {
rcu_read_lock(); rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value, err = bpf_fd_array_map_update_elem(map, f.file, key, value,
@ -988,10 +995,15 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
/* drop refcnt on maps used by eBPF program and free auxiliary data */ /* drop refcnt on maps used by eBPF program and free auxiliary data */
static void free_used_maps(struct bpf_prog_aux *aux) static void free_used_maps(struct bpf_prog_aux *aux)
{ {
enum bpf_cgroup_storage_type stype;
int i; int i;
if (aux->cgroup_storage) for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_release(aux->prog, aux->cgroup_storage); if (!aux->cgroup_storage[stype])
continue;
bpf_cgroup_storage_release(aux->prog,
aux->cgroup_storage[stype]);
}
for (i = 0; i < aux->used_map_cnt; i++) for (i = 0; i < aux->used_map_cnt; i++)
bpf_map_put(aux->used_maps[i]); bpf_map_put(aux->used_maps[i]);

File diff suppressed because it is too large

View File

@ -6494,6 +6494,7 @@ static struct sk_buff *populate_skb(char *buf, int size)
skb->queue_mapping = SKB_QUEUE_MAP; skb->queue_mapping = SKB_QUEUE_MAP;
skb->vlan_tci = SKB_VLAN_TCI; skb->vlan_tci = SKB_VLAN_TCI;
skb->vlan_proto = htons(ETH_P_IP); skb->vlan_proto = htons(ETH_P_IP);
dev_net_set(&dev, &init_net);
skb->dev = &dev; skb->dev = &dev;
skb->dev->ifindex = SKB_DEV_IFINDEX; skb->dev->ifindex = SKB_DEV_IFINDEX;
skb->dev->type = SKB_DEV_TYPE; skb->dev->type = SKB_DEV_TYPE;

View File

@ -12,7 +12,7 @@
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx, static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
struct bpf_cgroup_storage *storage) struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
{ {
u32 ret; u32 ret;
@ -28,13 +28,20 @@ static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time) static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
{ {
struct bpf_cgroup_storage *storage = NULL; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
enum bpf_cgroup_storage_type stype;
u64 time_start, time_spent = 0; u64 time_start, time_spent = 0;
u32 ret = 0, i; u32 ret = 0, i;
storage = bpf_cgroup_storage_alloc(prog); for_each_cgroup_storage_type(stype) {
if (IS_ERR(storage)) storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
return PTR_ERR(storage); if (IS_ERR(storage[stype])) {
storage[stype] = NULL;
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM;
}
}
if (!repeat) if (!repeat)
repeat = 1; repeat = 1;
@ -53,7 +60,8 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
do_div(time_spent, repeat); do_div(time_spent, repeat);
*time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent; *time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
bpf_cgroup_storage_free(storage); for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return ret; return ret;
} }

View File

@ -27,6 +27,7 @@
#include <linux/rtnetlink.h> #include <linux/rtnetlink.h>
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
#include <linux/net.h> #include <linux/net.h>
#include <net/xdp_sock.h>
/* /*
* Some useful ethtool_ops methods that're device independent. * Some useful ethtool_ops methods that're device independent.
@ -1662,8 +1663,10 @@ static noinline_for_stack int ethtool_get_channels(struct net_device *dev,
static noinline_for_stack int ethtool_set_channels(struct net_device *dev, static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
void __user *useraddr) void __user *useraddr)
{ {
struct ethtool_channels channels, max = { .cmd = ETHTOOL_GCHANNELS }; struct ethtool_channels channels, curr = { .cmd = ETHTOOL_GCHANNELS };
u16 from_channel, to_channel;
u32 max_rx_in_use = 0; u32 max_rx_in_use = 0;
unsigned int i;
if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels) if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP; return -EOPNOTSUPP;
@ -1671,13 +1674,13 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
if (copy_from_user(&channels, useraddr, sizeof(channels))) if (copy_from_user(&channels, useraddr, sizeof(channels)))
return -EFAULT; return -EFAULT;
dev->ethtool_ops->get_channels(dev, &max); dev->ethtool_ops->get_channels(dev, &curr);
/* ensure new counts are within the maximums */ /* ensure new counts are within the maximums */
if ((channels.rx_count > max.max_rx) || if (channels.rx_count > curr.max_rx ||
(channels.tx_count > max.max_tx) || channels.tx_count > curr.max_tx ||
(channels.combined_count > max.max_combined) || channels.combined_count > curr.max_combined ||
(channels.other_count > max.max_other)) channels.other_count > curr.max_other)
return -EINVAL; return -EINVAL;
/* ensure the new Rx count fits within the configured Rx flow /* ensure the new Rx count fits within the configured Rx flow
@ -1687,6 +1690,14 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
(channels.combined_count + channels.rx_count) <= max_rx_in_use) (channels.combined_count + channels.rx_count) <= max_rx_in_use)
return -EINVAL; return -EINVAL;
/* Disabling channels, query zero-copy AF_XDP sockets */
from_channel = channels.combined_count +
min(channels.rx_count, channels.tx_count);
to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count);
for (i = from_channel; i < to_channel; i++)
if (xdp_get_umem_from_qid(dev, i))
return -EINVAL;
return dev->ethtool_ops->set_channels(dev, &channels); return dev->ethtool_ops->set_channels(dev, &channels);
} }

View File

@ -58,13 +58,17 @@
#include <net/busy_poll.h> #include <net/busy_poll.h>
#include <net/tcp.h> #include <net/tcp.h>
#include <net/xfrm.h> #include <net/xfrm.h>
#include <net/udp.h>
#include <linux/bpf_trace.h> #include <linux/bpf_trace.h>
#include <net/xdp_sock.h> #include <net/xdp_sock.h>
#include <linux/inetdevice.h> #include <linux/inetdevice.h>
#include <net/inet_hashtables.h>
#include <net/inet6_hashtables.h>
#include <net/ip_fib.h> #include <net/ip_fib.h>
#include <net/flow.h> #include <net/flow.h>
#include <net/arp.h> #include <net/arp.h>
#include <net/ipv6.h> #include <net/ipv6.h>
#include <net/net_namespace.h>
#include <linux/seg6_local.h> #include <linux/seg6_local.h>
#include <net/seg6.h> #include <net/seg6.h>
#include <net/seg6_local.h> #include <net/seg6_local.h>
@ -4813,6 +4817,143 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
}; };
#endif /* CONFIG_IPV6_SEG6_BPF */ #endif /* CONFIG_IPV6_SEG6_BPF */
#ifdef CONFIG_INET
static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
struct sk_buff *skb, u8 family, u8 proto)
{
int dif = skb->dev->ifindex;
bool refcounted = false;
struct sock *sk = NULL;
if (family == AF_INET) {
__be32 src4 = tuple->ipv4.saddr;
__be32 dst4 = tuple->ipv4.daddr;
int sdif = inet_sdif(skb);
if (proto == IPPROTO_TCP)
sk = __inet_lookup(net, &tcp_hashinfo, skb, 0,
src4, tuple->ipv4.sport,
dst4, tuple->ipv4.dport,
dif, sdif, &refcounted);
else
sk = __udp4_lib_lookup(net, src4, tuple->ipv4.sport,
dst4, tuple->ipv4.dport,
dif, sdif, &udp_table, skb);
#if IS_REACHABLE(CONFIG_IPV6)
} else {
struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr;
struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr;
int sdif = inet6_sdif(skb);
if (proto == IPPROTO_TCP)
sk = __inet6_lookup(net, &tcp_hashinfo, skb, 0,
src6, tuple->ipv6.sport,
dst6, tuple->ipv6.dport,
dif, sdif, &refcounted);
else
sk = __udp6_lib_lookup(net, src6, tuple->ipv6.sport,
dst6, tuple->ipv6.dport,
dif, sdif, &udp_table, skb);
#endif
}
if (unlikely(sk && !refcounted && !sock_flag(sk, SOCK_RCU_FREE))) {
WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
sk = NULL;
}
return sk;
}
/* bpf_sk_lookup performs the core lookup for different types of sockets,
* taking a reference on the socket if it doesn't have the flag SOCK_RCU_FREE.
* Returns the socket as an 'unsigned long' to simplify the casting in the
* callers to satisfy BPF_CALL declarations.
*/
static unsigned long
bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
u8 proto, u64 netns_id, u64 flags)
{
struct net *caller_net;
struct sock *sk = NULL;
u8 family = AF_UNSPEC;
struct net *net;
family = len == sizeof(tuple->ipv4) ? AF_INET : AF_INET6;
if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags))
goto out;
if (skb->dev)
caller_net = dev_net(skb->dev);
else
caller_net = sock_net(skb->sk);
if (netns_id) {
net = get_net_ns_by_id(caller_net, netns_id);
if (unlikely(!net))
goto out;
sk = sk_lookup(net, tuple, skb, family, proto);
put_net(net);
} else {
net = caller_net;
sk = sk_lookup(net, tuple, skb, family, proto);
}
if (sk)
sk = sk_to_full_sk(sk);
out:
return (unsigned long) sk;
}
BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
return bpf_sk_lookup(skb, tuple, len, IPPROTO_TCP, netns_id, flags);
}
static const struct bpf_func_proto bpf_sk_lookup_tcp_proto = {
.func = bpf_sk_lookup_tcp,
.gpl_only = false,
.pkt_access = true,
.ret_type = RET_PTR_TO_SOCKET_OR_NULL,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_PTR_TO_MEM,
.arg3_type = ARG_CONST_SIZE,
.arg4_type = ARG_ANYTHING,
.arg5_type = ARG_ANYTHING,
};
BPF_CALL_5(bpf_sk_lookup_udp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
return bpf_sk_lookup(skb, tuple, len, IPPROTO_UDP, netns_id, flags);
}
static const struct bpf_func_proto bpf_sk_lookup_udp_proto = {
.func = bpf_sk_lookup_udp,
.gpl_only = false,
.pkt_access = true,
.ret_type = RET_PTR_TO_SOCKET_OR_NULL,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_PTR_TO_MEM,
.arg3_type = ARG_CONST_SIZE,
.arg4_type = ARG_ANYTHING,
.arg5_type = ARG_ANYTHING,
};
BPF_CALL_1(bpf_sk_release, struct sock *, sk)
{
if (!sock_flag(sk, SOCK_RCU_FREE))
sock_gen_put(sk);
return 0;
}
static const struct bpf_func_proto bpf_sk_release_proto = {
.func = bpf_sk_release,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_SOCKET,
};
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func) bool bpf_helper_changes_pkt_data(void *func)
{ {
if (func == bpf_skb_vlan_push || if (func == bpf_skb_vlan_push ||
@ -5018,6 +5159,14 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_skb_cgroup_id_proto; return &bpf_skb_cgroup_id_proto;
case BPF_FUNC_skb_ancestor_cgroup_id: case BPF_FUNC_skb_ancestor_cgroup_id:
return &bpf_skb_ancestor_cgroup_id_proto; return &bpf_skb_ancestor_cgroup_id_proto;
#endif
#ifdef CONFIG_INET
case BPF_FUNC_sk_lookup_tcp:
return &bpf_sk_lookup_tcp_proto;
case BPF_FUNC_sk_lookup_udp:
return &bpf_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
#endif #endif
default: default:
return bpf_base_func_proto(func_id); return bpf_base_func_proto(func_id);
@ -5119,6 +5268,14 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_sk_redirect_hash_proto; return &bpf_sk_redirect_hash_proto;
case BPF_FUNC_get_local_storage: case BPF_FUNC_get_local_storage:
return &bpf_get_local_storage_proto; return &bpf_get_local_storage_proto;
#ifdef CONFIG_INET
case BPF_FUNC_sk_lookup_tcp:
return &bpf_sk_lookup_tcp_proto;
case BPF_FUNC_sk_lookup_udp:
return &bpf_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
#endif
default: default:
return bpf_base_func_proto(func_id); return bpf_base_func_proto(func_id);
} }
@ -5394,23 +5551,29 @@ static bool __sock_filter_check_size(int off, int size,
return size == size_default; return size == size_default;
} }
static bool sock_filter_is_valid_access(int off, int size, bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
enum bpf_access_type type, struct bpf_insn_access_aux *info)
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{ {
if (off < 0 || off >= sizeof(struct bpf_sock)) if (off < 0 || off >= sizeof(struct bpf_sock))
return false; return false;
if (off % size != 0) if (off % size != 0)
return false; return false;
if (!__sock_filter_check_attach_type(off, type,
prog->expected_attach_type))
return false;
if (!__sock_filter_check_size(off, size, info)) if (!__sock_filter_check_size(off, size, info))
return false; return false;
return true; return true;
} }
static bool sock_filter_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
if (!bpf_sock_is_valid_access(off, size, type, info))
return false;
return __sock_filter_check_attach_type(off, type,
prog->expected_attach_type);
}
static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write, static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog, int drop_verdict) const struct bpf_prog *prog, int drop_verdict)
{ {
@ -6122,10 +6285,10 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
return insn - insn_buf; return insn - insn_buf;
} }
static u32 sock_filter_convert_ctx_access(enum bpf_access_type type, u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si, const struct bpf_insn *si,
struct bpf_insn *insn_buf, struct bpf_insn *insn_buf,
struct bpf_prog *prog, u32 *target_size) struct bpf_prog *prog, u32 *target_size)
{ {
struct bpf_insn *insn = insn_buf; struct bpf_insn *insn = insn_buf;
int off; int off;
@ -7037,7 +7200,7 @@ const struct bpf_prog_ops lwt_seg6local_prog_ops = {
const struct bpf_verifier_ops cg_sock_verifier_ops = { const struct bpf_verifier_ops cg_sock_verifier_ops = {
.get_func_proto = sock_filter_func_proto, .get_func_proto = sock_filter_func_proto,
.is_valid_access = sock_filter_is_valid_access, .is_valid_access = sock_filter_is_valid_access,
.convert_ctx_access = sock_filter_convert_ctx_access, .convert_ctx_access = bpf_sock_convert_ctx_access,
}; };
const struct bpf_prog_ops cg_sock_prog_ops = { const struct bpf_prog_ops cg_sock_prog_ops = {

View File

@ -32,37 +32,49 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
{ {
unsigned long flags; unsigned long flags;
if (xs->dev) { spin_lock_irqsave(&umem->xsk_list_lock, flags);
spin_lock_irqsave(&umem->xsk_list_lock, flags); list_del_rcu(&xs->list);
list_del_rcu(&xs->list); spin_unlock_irqrestore(&umem->xsk_list_lock, flags);
spin_unlock_irqrestore(&umem->xsk_list_lock, flags);
if (umem->zc)
synchronize_net();
}
} }
int xdp_umem_query(struct net_device *dev, u16 queue_id) /* The umem is stored both in the _rx struct and the _tx struct as we do
* not know if the device has more tx queues than rx, or the opposite.
* This might also change during run time.
*/
static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
u16 queue_id)
{ {
struct netdev_bpf bpf; if (queue_id < dev->real_num_rx_queues)
dev->_rx[queue_id].umem = umem;
if (queue_id < dev->real_num_tx_queues)
dev->_tx[queue_id].umem = umem;
}
ASSERT_RTNL(); struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
u16 queue_id)
{
if (queue_id < dev->real_num_rx_queues)
return dev->_rx[queue_id].umem;
if (queue_id < dev->real_num_tx_queues)
return dev->_tx[queue_id].umem;
memset(&bpf, 0, sizeof(bpf)); return NULL;
bpf.command = XDP_QUERY_XSK_UMEM; }
bpf.xsk.queue_id = queue_id;
if (!dev->netdev_ops->ndo_bpf) static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
return 0; {
return dev->netdev_ops->ndo_bpf(dev, &bpf) ?: !!bpf.xsk.umem; if (queue_id < dev->real_num_rx_queues)
dev->_rx[queue_id].umem = NULL;
if (queue_id < dev->real_num_tx_queues)
dev->_tx[queue_id].umem = NULL;
} }
int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
u32 queue_id, u16 flags) u16 queue_id, u16 flags)
{ {
bool force_zc, force_copy; bool force_zc, force_copy;
struct netdev_bpf bpf; struct netdev_bpf bpf;
int err; int err = 0;
force_zc = flags & XDP_ZEROCOPY; force_zc = flags & XDP_ZEROCOPY;
force_copy = flags & XDP_COPY; force_copy = flags & XDP_COPY;
@ -70,17 +82,23 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
if (force_zc && force_copy) if (force_zc && force_copy)
return -EINVAL; return -EINVAL;
if (force_copy)
return 0;
if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
rtnl_lock(); rtnl_lock();
err = xdp_umem_query(dev, queue_id); if (xdp_get_umem_from_qid(dev, queue_id)) {
if (err) { err = -EBUSY;
err = err < 0 ? -EOPNOTSUPP : -EBUSY; goto out_rtnl_unlock;
goto err_rtnl_unlock; }
xdp_reg_umem_at_qid(dev, umem, queue_id);
umem->dev = dev;
umem->queue_id = queue_id;
if (force_copy)
/* For copy-mode, we are done. */
goto out_rtnl_unlock;
if (!dev->netdev_ops->ndo_bpf ||
!dev->netdev_ops->ndo_xsk_async_xmit) {
err = -EOPNOTSUPP;
goto err_unreg_umem;
} }
bpf.command = XDP_SETUP_XSK_UMEM; bpf.command = XDP_SETUP_XSK_UMEM;
@ -89,18 +107,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
err = dev->netdev_ops->ndo_bpf(dev, &bpf); err = dev->netdev_ops->ndo_bpf(dev, &bpf);
if (err) if (err)
goto err_rtnl_unlock; goto err_unreg_umem;
rtnl_unlock(); rtnl_unlock();
dev_hold(dev); dev_hold(dev);
umem->dev = dev;
umem->queue_id = queue_id;
umem->zc = true; umem->zc = true;
return 0; return 0;
err_rtnl_unlock: err_unreg_umem:
xdp_clear_umem_at_qid(dev, queue_id);
if (!force_zc)
err = 0; /* fallback to copy mode */
out_rtnl_unlock:
rtnl_unlock(); rtnl_unlock();
return force_zc ? err : 0; /* fail or fallback */ return err;
} }
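
For orientation, the flags checked above originate in user space: an AF_XDP application passes XDP_COPY or XDP_ZEROCOPY (or neither) in sxdp_flags when it binds the socket, and the fallback-to-copy behaviour applies only when no explicit flag was given. Below is a minimal, hypothetical user-space sketch; the helper name and the abbreviated error handling are assumptions for illustration, not part of this patch.

/* Hypothetical helper: bind an AF_XDP socket to a device queue, optionally
 * requesting zero-copy. With flags == 0 the kernel may silently fall back
 * to copy mode if the driver lacks ndo_xsk_async_xmit support.
 */
#include <linux/if_xdp.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>

#ifndef AF_XDP
#define AF_XDP 44	/* not yet exported by older libc headers */
#endif

static int bind_xsk(int xsk_fd, const char *ifname, __u32 queue_id, __u16 flags)
{
	struct sockaddr_xdp sxdp;

	memset(&sxdp, 0, sizeof(sxdp));
	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_flags = flags;		/* 0, XDP_COPY or XDP_ZEROCOPY */
	sxdp.sxdp_ifindex = if_nametoindex(ifname);
	sxdp.sxdp_queue_id = queue_id;		/* rx/tx queue to bind to */

	return bind(xsk_fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}
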
static void xdp_umem_clear_dev(struct xdp_umem *umem) static void xdp_umem_clear_dev(struct xdp_umem *umem)
@ -108,7 +128,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
struct netdev_bpf bpf; struct netdev_bpf bpf;
int err; int err;
if (umem->dev) { if (umem->zc) {
bpf.command = XDP_SETUP_XSK_UMEM; bpf.command = XDP_SETUP_XSK_UMEM;
bpf.xsk.umem = NULL; bpf.xsk.umem = NULL;
bpf.xsk.queue_id = umem->queue_id; bpf.xsk.queue_id = umem->queue_id;
@ -119,9 +139,17 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
if (err) if (err)
WARN(1, "failed to disable umem!\n"); WARN(1, "failed to disable umem!\n");
}
if (umem->dev) {
rtnl_lock();
xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
rtnl_unlock();
}
if (umem->zc) {
dev_put(umem->dev); dev_put(umem->dev);
umem->dev = NULL; umem->zc = false;
} }
} }

View File

@ -9,7 +9,7 @@
#include <net/xdp_sock.h> #include <net/xdp_sock.h>
int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
u32 queue_id, u16 flags); u16 queue_id, u16 flags);
bool xdp_umem_validate_queues(struct xdp_umem *umem); bool xdp_umem_validate_queues(struct xdp_umem *umem);
void xdp_get_umem(struct xdp_umem *umem); void xdp_get_umem(struct xdp_umem *umem);
void xdp_put_umem(struct xdp_umem *umem); void xdp_put_umem(struct xdp_umem *umem);

View File

@ -355,12 +355,18 @@ static int xsk_release(struct socket *sock)
local_bh_enable(); local_bh_enable();
if (xs->dev) { if (xs->dev) {
struct net_device *dev = xs->dev;
/* Wait for driver to stop using the xdp socket. */ /* Wait for driver to stop using the xdp socket. */
synchronize_net(); xdp_del_sk_umem(xs->umem, xs);
dev_put(xs->dev);
xs->dev = NULL; xs->dev = NULL;
synchronize_net();
dev_put(dev);
} }
xskq_destroy(xs->rx);
xskq_destroy(xs->tx);
sock_orphan(sk); sock_orphan(sk);
sock->sk = NULL; sock->sk = NULL;
@ -419,13 +425,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
} }
qid = sxdp->sxdp_queue_id; qid = sxdp->sxdp_queue_id;
if ((xs->rx && qid >= dev->real_num_rx_queues) ||
(xs->tx && qid >= dev->real_num_tx_queues)) {
err = -EINVAL;
goto out_unlock;
}
flags = sxdp->sxdp_flags; flags = sxdp->sxdp_flags;
if (flags & XDP_SHARED_UMEM) { if (flags & XDP_SHARED_UMEM) {
@ -721,9 +720,6 @@ static void xsk_destruct(struct sock *sk)
if (!sock_flag(sk, SOCK_DEAD)) if (!sock_flag(sk, SOCK_DEAD))
return; return;
xskq_destroy(xs->rx);
xskq_destroy(xs->tx);
xdp_del_sk_umem(xs->umem, xs);
xdp_put_umem(xs->umem); xdp_put_umem(xs->umem);
sk_refcnt_debug_dec(sk); sk_refcnt_debug_dec(sk);

View File

@ -209,7 +209,7 @@ static int map_fd = -1;
static int prog_load_cnt(int verdict, int val) static int prog_load_cnt(int verdict, int val)
{ {
int cgroup_storage_fd; int cgroup_storage_fd, percpu_cgroup_storage_fd;
if (map_fd < 0) if (map_fd < 0)
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0); map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0);
@ -225,6 +225,14 @@ static int prog_load_cnt(int verdict, int val)
return -1; return -1;
} }
percpu_cgroup_storage_fd = bpf_create_map(
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
sizeof(struct bpf_cgroup_storage_key), 8, 0, 0);
if (percpu_cgroup_storage_fd < 0) {
printf("failed to create map '%s'\n", strerror(errno));
return -1;
}
struct bpf_insn prog[] = { struct bpf_insn prog[] = {
BPF_MOV32_IMM(BPF_REG_0, 0), BPF_MOV32_IMM(BPF_REG_0, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
@ -235,11 +243,20 @@ static int prog_load_cnt(int verdict, int val)
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, val), /* r1 = 1 */ BPF_MOV64_IMM(BPF_REG_1, val), /* r1 = 1 */
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */ BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
BPF_LD_MAP_FD(BPF_REG_1, cgroup_storage_fd), BPF_LD_MAP_FD(BPF_REG_1, cgroup_storage_fd),
BPF_MOV64_IMM(BPF_REG_2, 0), BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage),
BPF_MOV64_IMM(BPF_REG_1, val), BPF_MOV64_IMM(BPF_REG_1, val),
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_0, BPF_REG_1, 0, 0), BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_0, BPF_REG_1, 0, 0),
BPF_LD_MAP_FD(BPF_REG_1, percpu_cgroup_storage_fd),
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 0x1),
BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_3, 0),
BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */ BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
BPF_EXIT_INSN(), BPF_EXIT_INSN(),
}; };
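
The raw instruction sequence above is easier to follow as restricted C. A rough, assumed-equivalent sketch in the selftests/bpf style (bpf_helpers.h, struct bpf_map_def) is shown below; the map and function names are illustrative, not taken from this test.

/* Per-cpu cgroup storage counter: each CPU increments its own copy, so no
 * atomic operation is needed, mirroring the plain load/add/store sequence
 * performed on the per-cpu map above.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") percpu_cnt = {
	.type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
	.key_size = sizeof(struct bpf_cgroup_storage_key),
	.value_size = sizeof(__u64),
};

SEC("cgroup/skb")
int count_packets(struct __sk_buff *skb)
{
	__u64 *cnt = bpf_get_local_storage(&percpu_cnt, 0);

	(*cnt)++;		/* plain increment: the value is per-cpu */
	return 1;		/* allow the packet */
}
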

View File

@ -17,8 +17,6 @@
#include "bpf_load.h" #include "bpf_load.h"
#include "bpf_util.h" #include "bpf_util.h"
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
#define SLOTS 100 #define SLOTS 100
static void clear_stats(int fd) static void clear_stats(int fd)

View File

@ -72,13 +72,15 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_SOCKHASH] = "sockhash", [BPF_MAP_TYPE_SOCKHASH] = "sockhash",
[BPF_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage", [BPF_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage",
[BPF_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray", [BPF_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray",
[BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE] = "percpu_cgroup_storage",
}; };
static bool map_is_per_cpu(__u32 type) static bool map_is_per_cpu(__u32 type)
{ {
return type == BPF_MAP_TYPE_PERCPU_HASH || return type == BPF_MAP_TYPE_PERCPU_HASH ||
type == BPF_MAP_TYPE_PERCPU_ARRAY || type == BPF_MAP_TYPE_PERCPU_ARRAY ||
type == BPF_MAP_TYPE_LRU_PERCPU_HASH; type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE;
} }
static bool map_is_map_of_maps(__u32 type) static bool map_is_map_of_maps(__u32 type)

View File

@ -69,7 +69,9 @@ static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
snprintf(netinfo->devices[netinfo->used_len].devname, snprintf(netinfo->devices[netinfo->used_len].devname,
sizeof(netinfo->devices[netinfo->used_len].devname), sizeof(netinfo->devices[netinfo->used_len].devname),
"%s", "%s",
tb[IFLA_IFNAME] ? nla_getattr_str(tb[IFLA_IFNAME]) : ""); tb[IFLA_IFNAME]
? libbpf_nla_getattr_str(tb[IFLA_IFNAME])
: "");
netinfo->used_len++; netinfo->used_len++;
return do_xdp_dump(ifinfo, tb); return do_xdp_dump(ifinfo, tb);
@ -83,7 +85,7 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
if (tcinfo->is_qdisc) { if (tcinfo->is_qdisc) {
/* skip clsact qdisc */ /* skip clsact qdisc */
if (tb[TCA_KIND] && if (tb[TCA_KIND] &&
strcmp(nla_data(tb[TCA_KIND]), "clsact") == 0) strcmp(libbpf_nla_data(tb[TCA_KIND]), "clsact") == 0)
return 0; return 0;
if (info->tcm_handle == 0) if (info->tcm_handle == 0)
return 0; return 0;
@ -101,7 +103,9 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
snprintf(tcinfo->handle_array[tcinfo->used_len].kind, snprintf(tcinfo->handle_array[tcinfo->used_len].kind,
sizeof(tcinfo->handle_array[tcinfo->used_len].kind), sizeof(tcinfo->handle_array[tcinfo->used_len].kind),
"%s", "%s",
tb[TCA_KIND] ? nla_getattr_str(tb[TCA_KIND]) : "unknown"); tb[TCA_KIND]
? libbpf_nla_getattr_str(tb[TCA_KIND])
: "unknown");
tcinfo->used_len++; tcinfo->used_len++;
return 0; return 0;
@ -127,14 +131,14 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
tcinfo.array_len = 0; tcinfo.array_len = 0;
tcinfo.is_qdisc = false; tcinfo.is_qdisc = false;
ret = nl_get_class(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg, ret = libbpf_nl_get_class(sock, nl_pid, dev->ifindex,
&tcinfo); dump_class_qdisc_nlmsg, &tcinfo);
if (ret) if (ret)
goto out; goto out;
tcinfo.is_qdisc = true; tcinfo.is_qdisc = true;
ret = nl_get_qdisc(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg, ret = libbpf_nl_get_qdisc(sock, nl_pid, dev->ifindex,
&tcinfo); dump_class_qdisc_nlmsg, &tcinfo);
if (ret) if (ret)
goto out; goto out;
@ -142,10 +146,9 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
filter_info.ifindex = dev->ifindex; filter_info.ifindex = dev->ifindex;
for (i = 0; i < tcinfo.used_len; i++) { for (i = 0; i < tcinfo.used_len; i++) {
filter_info.kind = tcinfo.handle_array[i].kind; filter_info.kind = tcinfo.handle_array[i].kind;
ret = nl_get_filter(sock, nl_pid, dev->ifindex, ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex,
tcinfo.handle_array[i].handle, tcinfo.handle_array[i].handle,
dump_filter_nlmsg, dump_filter_nlmsg, &filter_info);
&filter_info);
if (ret) if (ret)
goto out; goto out;
} }
@ -153,22 +156,22 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
/* root, ingress and egress handle */ /* root, ingress and egress handle */
handle = TC_H_ROOT; handle = TC_H_ROOT;
filter_info.kind = "root"; filter_info.kind = "root";
ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle, ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info); dump_filter_nlmsg, &filter_info);
if (ret) if (ret)
goto out; goto out;
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS); handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
filter_info.kind = "clsact/ingress"; filter_info.kind = "clsact/ingress";
ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle, ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info); dump_filter_nlmsg, &filter_info);
if (ret) if (ret)
goto out; goto out;
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS); handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
filter_info.kind = "clsact/egress"; filter_info.kind = "clsact/egress";
ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle, ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info); dump_filter_nlmsg, &filter_info);
if (ret) if (ret)
goto out; goto out;
@ -196,7 +199,7 @@ static int do_show(int argc, char **argv)
usage(); usage();
} }
sock = bpf_netlink_open(&nl_pid); sock = libbpf_netlink_open(&nl_pid);
if (sock < 0) { if (sock < 0) {
fprintf(stderr, "failed to open netlink sock\n"); fprintf(stderr, "failed to open netlink sock\n");
return -1; return -1;
@ -211,7 +214,7 @@ static int do_show(int argc, char **argv)
jsonw_start_array(json_wtr); jsonw_start_array(json_wtr);
NET_START_OBJECT; NET_START_OBJECT;
NET_START_ARRAY("xdp", "%s:\n"); NET_START_ARRAY("xdp", "%s:\n");
ret = nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array); ret = libbpf_nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
NET_END_ARRAY("\n"); NET_END_ARRAY("\n");
if (!ret) { if (!ret) {

View File

@ -21,7 +21,7 @@ static void xdp_dump_prog_id(struct nlattr **tb, int attr,
if (new_json_object) if (new_json_object)
NET_START_OBJECT NET_START_OBJECT
NET_DUMP_STR("mode", " %s", mode); NET_DUMP_STR("mode", " %s", mode);
NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[attr])) NET_DUMP_UINT("id", " id %u", libbpf_nla_getattr_u32(tb[attr]))
if (new_json_object) if (new_json_object)
NET_END_OBJECT NET_END_OBJECT
} }
@ -32,13 +32,13 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
struct nlattr *tb[IFLA_XDP_MAX + 1]; struct nlattr *tb[IFLA_XDP_MAX + 1];
unsigned char mode; unsigned char mode;
if (nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0) if (libbpf_nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0)
return -1; return -1;
if (!tb[IFLA_XDP_ATTACHED]) if (!tb[IFLA_XDP_ATTACHED])
return 0; return 0;
mode = nla_getattr_u8(tb[IFLA_XDP_ATTACHED]); mode = libbpf_nla_getattr_u8(tb[IFLA_XDP_ATTACHED]);
if (mode == XDP_ATTACHED_NONE) if (mode == XDP_ATTACHED_NONE)
return 0; return 0;
@ -75,14 +75,14 @@ int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb)
return 0; return 0;
return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index, return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index,
nla_getattr_str(tb[IFLA_IFNAME])); libbpf_nla_getattr_str(tb[IFLA_IFNAME]));
} }
static int do_bpf_dump_one_act(struct nlattr *attr) static int do_bpf_dump_one_act(struct nlattr *attr)
{ {
struct nlattr *tb[TCA_ACT_BPF_MAX + 1]; struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
if (nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0) if (libbpf_nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
if (!tb[TCA_ACT_BPF_PARMS]) if (!tb[TCA_ACT_BPF_PARMS])
@ -91,10 +91,10 @@ static int do_bpf_dump_one_act(struct nlattr *attr)
NET_START_OBJECT_NESTED2; NET_START_OBJECT_NESTED2;
if (tb[TCA_ACT_BPF_NAME]) if (tb[TCA_ACT_BPF_NAME])
NET_DUMP_STR("name", "%s", NET_DUMP_STR("name", "%s",
nla_getattr_str(tb[TCA_ACT_BPF_NAME])); libbpf_nla_getattr_str(tb[TCA_ACT_BPF_NAME]));
if (tb[TCA_ACT_BPF_ID]) if (tb[TCA_ACT_BPF_ID])
NET_DUMP_UINT("id", " id %u", NET_DUMP_UINT("id", " id %u",
nla_getattr_u32(tb[TCA_ACT_BPF_ID])); libbpf_nla_getattr_u32(tb[TCA_ACT_BPF_ID]));
NET_END_OBJECT_NESTED; NET_END_OBJECT_NESTED;
return 0; return 0;
} }
@ -106,10 +106,11 @@ static int do_dump_one_act(struct nlattr *attr)
if (!attr) if (!attr)
return 0; return 0;
if (nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0) if (libbpf_nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
if (tb[TCA_ACT_KIND] && strcmp(nla_data(tb[TCA_ACT_KIND]), "bpf") == 0) if (tb[TCA_ACT_KIND] &&
strcmp(libbpf_nla_data(tb[TCA_ACT_KIND]), "bpf") == 0)
return do_bpf_dump_one_act(tb[TCA_ACT_OPTIONS]); return do_bpf_dump_one_act(tb[TCA_ACT_OPTIONS]);
return 0; return 0;
@ -120,7 +121,7 @@ static int do_bpf_act_dump(struct nlattr *attr)
struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
int act, ret; int act, ret;
if (nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0) if (libbpf_nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
NET_START_ARRAY("act", " %s ["); NET_START_ARRAY("act", " %s [");
@ -139,13 +140,15 @@ static int do_bpf_filter_dump(struct nlattr *attr)
struct nlattr *tb[TCA_BPF_MAX + 1]; struct nlattr *tb[TCA_BPF_MAX + 1];
int ret; int ret;
if (nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0) if (libbpf_nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
if (tb[TCA_BPF_NAME]) if (tb[TCA_BPF_NAME])
NET_DUMP_STR("name", " %s", nla_getattr_str(tb[TCA_BPF_NAME])); NET_DUMP_STR("name", " %s",
libbpf_nla_getattr_str(tb[TCA_BPF_NAME]));
if (tb[TCA_BPF_ID]) if (tb[TCA_BPF_ID])
NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[TCA_BPF_ID])); NET_DUMP_UINT("id", " id %u",
libbpf_nla_getattr_u32(tb[TCA_BPF_ID]));
if (tb[TCA_BPF_ACT]) { if (tb[TCA_BPF_ACT]) {
ret = do_bpf_act_dump(tb[TCA_BPF_ACT]); ret = do_bpf_act_dump(tb[TCA_BPF_ACT]);
if (ret) if (ret)
@ -160,7 +163,8 @@ int do_filter_dump(struct tcmsg *info, struct nlattr **tb, const char *kind,
{ {
int ret = 0; int ret = 0;
if (tb[TCA_OPTIONS] && strcmp(nla_data(tb[TCA_KIND]), "bpf") == 0) { if (tb[TCA_OPTIONS] &&
strcmp(libbpf_nla_data(tb[TCA_KIND]), "bpf") == 0) {
NET_START_OBJECT; NET_START_OBJECT;
if (devname[0] != '\0') if (devname[0] != '\0')
NET_DUMP_STR("devname", "%s", devname); NET_DUMP_STR("devname", "%s", devname);

View File

@ -16,7 +16,7 @@
jsonw_name(json_wtr, name); \ jsonw_name(json_wtr, name); \
jsonw_start_object(json_wtr); \ jsonw_start_object(json_wtr); \
} else { \ } else { \
fprintf(stderr, "%s {", name); \ fprintf(stdout, "%s {", name); \
} \ } \
} }
@ -25,7 +25,7 @@
if (json_output) \ if (json_output) \
jsonw_start_object(json_wtr); \ jsonw_start_object(json_wtr); \
else \ else \
fprintf(stderr, "{"); \ fprintf(stdout, "{"); \
} }
#define NET_END_OBJECT_NESTED \ #define NET_END_OBJECT_NESTED \
@ -33,7 +33,7 @@
if (json_output) \ if (json_output) \
jsonw_end_object(json_wtr); \ jsonw_end_object(json_wtr); \
else \ else \
fprintf(stderr, "}"); \ fprintf(stdout, "}"); \
} }
#define NET_END_OBJECT \ #define NET_END_OBJECT \
@ -47,7 +47,7 @@
if (json_output) \ if (json_output) \
jsonw_end_object(json_wtr); \ jsonw_end_object(json_wtr); \
else \ else \
fprintf(stderr, "\n"); \ fprintf(stdout, "\n"); \
} }
#define NET_START_ARRAY(name, fmt_str) \ #define NET_START_ARRAY(name, fmt_str) \
@ -56,7 +56,7 @@
jsonw_name(json_wtr, name); \ jsonw_name(json_wtr, name); \
jsonw_start_array(json_wtr); \ jsonw_start_array(json_wtr); \
} else { \ } else { \
fprintf(stderr, fmt_str, name); \ fprintf(stdout, fmt_str, name); \
} \ } \
} }
@ -65,7 +65,7 @@
if (json_output) \ if (json_output) \
jsonw_end_array(json_wtr); \ jsonw_end_array(json_wtr); \
else \ else \
fprintf(stderr, "%s", endstr); \ fprintf(stdout, "%s", endstr); \
} }
#define NET_DUMP_UINT(name, fmt_str, val) \ #define NET_DUMP_UINT(name, fmt_str, val) \
@ -73,7 +73,7 @@
if (json_output) \ if (json_output) \
jsonw_uint_field(json_wtr, name, val); \ jsonw_uint_field(json_wtr, name, val); \
else \ else \
fprintf(stderr, fmt_str, val); \ fprintf(stdout, fmt_str, val); \
} }
#define NET_DUMP_STR(name, fmt_str, str) \ #define NET_DUMP_STR(name, fmt_str, str) \
@ -81,7 +81,7 @@
if (json_output) \ if (json_output) \
jsonw_string_field(json_wtr, name, str);\ jsonw_string_field(json_wtr, name, str);\
else \ else \
fprintf(stderr, fmt_str, str); \ fprintf(stdout, fmt_str, str); \
} }
#define NET_DUMP_STR_ONLY(str) \ #define NET_DUMP_STR_ONLY(str) \
@ -89,7 +89,7 @@
if (json_output) \ if (json_output) \
jsonw_string(json_wtr, str); \ jsonw_string(json_wtr, str); \
else \ else \
fprintf(stderr, "%s ", str); \ fprintf(stdout, "%s ", str); \
} }
#endif #endif

View File

@ -127,6 +127,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_SOCKHASH,
BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_CGROUP_STORAGE,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
}; };
enum bpf_prog_type { enum bpf_prog_type {
@ -2143,6 +2144,77 @@ union bpf_attr {
* request in the skb. * request in the skb.
* Return * Return
* 0 on success, or a negative error in case of failure. * 0 on success, or a negative error in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
 *		Look for a TCP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
 *		this is in the netns of the device in the skb. For socket hooks,
 *		this is in the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
 *		Look for a UDP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
 *		this is in the netns of the device in the skb. For socket hooks,
 *		this is in the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* int bpf_sk_release(struct bpf_sock *sk)
* Description
 *		Release the reference held by *sk*. *sk* must be a non-NULL
* pointer that was returned from bpf_sk_lookup_xxx\ ().
* Return
* 0 on success, or a negative error in case of failure.
*/ */
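
To make the lookup/release contract described above concrete, here is a minimal, hypothetical TC-style program using the new helpers; header parsing is elided, and the section name, return codes and helper wrappers (bpf_helpers.h) are assumptions for the example, not mandated by this patch.

/* Sketch: drop TCP packets that no local socket is expecting.
 * Filling the tuple from the packet headers is omitted for brevity.
 */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"

SEC("classifier")
int drop_unexpected(struct __sk_buff *skb)
{
	struct bpf_sock_tuple tuple = {};
	struct bpf_sock *sk;

	/* ... fill tuple.ipv4.{saddr,daddr,sport,dport} from the packet ... */

	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), 0, 0);
	if (!sk)
		return TC_ACT_SHOT;	/* nobody is listening: drop early */

	bpf_sk_release(sk);		/* the reference must always be released */
	return TC_ACT_OK;
}
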
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
@ -2228,7 +2300,10 @@ union bpf_attr {
FN(get_current_cgroup_id), \ FN(get_current_cgroup_id), \
FN(get_local_storage), \ FN(get_local_storage), \
FN(sk_select_reuseport), \ FN(sk_select_reuseport), \
FN(skb_ancestor_cgroup_id), FN(skb_ancestor_cgroup_id), \
FN(sk_lookup_tcp), \
FN(sk_lookup_udp), \
FN(sk_release),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call * function eBPF program intends to call
@ -2398,6 +2473,23 @@ struct bpf_sock {
*/ */
}; };
struct bpf_sock_tuple {
union {
struct {
__be32 saddr;
__be32 daddr;
__be16 sport;
__be16 dport;
} ipv4;
struct {
__be32 saddr[4];
__be32 daddr[4];
__be16 sport;
__be16 dport;
} ipv6;
};
};
#define XDP_PACKET_HEADROOM 256 #define XDP_PACKET_HEADROOM 256
/* User return codes for XDP prog type. /* User return codes for XDP prog type.

View File

@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
# Most of this file is copied from tools/lib/traceevent/Makefile # Most of this file is copied from tools/lib/traceevent/Makefile
BPF_VERSION = 0 BPF_VERSION = 0

View File

@ -1,4 +1,4 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* /*
* common eBPF ELF operations. * common eBPF ELF operations.

View File

@ -1,4 +1,4 @@
/* SPDX-License-Identifier: LGPL-2.1 */ /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* /*
* common eBPF ELF operations. * common eBPF ELF operations.
@ -20,8 +20,8 @@
* You should have received a copy of the GNU Lesser General Public * You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses> * License along with this program; if not, see <http://www.gnu.org/licenses>
*/ */
#ifndef __BPF_BPF_H #ifndef __LIBBPF_BPF_H
#define __BPF_BPF_H #define __LIBBPF_BPF_H
#include <linux/bpf.h> #include <linux/bpf.h>
#include <stdbool.h> #include <stdbool.h>
@ -111,4 +111,4 @@ int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len, int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len,
__u32 *prog_id, __u32 *fd_type, __u64 *probe_offset, __u32 *prog_id, __u32 *fd_type, __u64 *probe_offset,
__u64 *probe_addr); __u64 *probe_addr);
#endif #endif /* __LIBBPF_BPF_H */

View File

@ -1,4 +1,4 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2018 Facebook */ /* Copyright (c) 2018 Facebook */
#include <stdlib.h> #include <stdlib.h>

View File

@ -1,8 +1,8 @@
/* SPDX-License-Identifier: LGPL-2.1 */ /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (c) 2018 Facebook */ /* Copyright (c) 2018 Facebook */
#ifndef __BPF_BTF_H #ifndef __LIBBPF_BTF_H
#define __BPF_BTF_H #define __LIBBPF_BTF_H
#include <linux/types.h> #include <linux/types.h>
@ -23,4 +23,4 @@ int btf__resolve_type(const struct btf *btf, __u32 type_id);
int btf__fd(const struct btf *btf); int btf__fd(const struct btf *btf);
const char *btf__name_by_offset(const struct btf *btf, __u32 offset); const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
#endif #endif /* __LIBBPF_BTF_H */

View File

@ -1,4 +1,4 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* /*
* Common eBPF ELF object loading operations. * Common eBPF ELF object loading operations.
@ -7,19 +7,6 @@
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc. * Copyright (C) 2015 Huawei Inc.
* Copyright (C) 2017 Nicira, Inc. * Copyright (C) 2017 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation;
* version 2.1 of the License (not later!)
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses>
*/ */
#define _GNU_SOURCE #define _GNU_SOURCE
@ -228,7 +215,7 @@ struct bpf_object {
}; };
#define obj_elf_valid(o) ((o)->efile.elf) #define obj_elf_valid(o) ((o)->efile.elf)
static void bpf_program__unload(struct bpf_program *prog) void bpf_program__unload(struct bpf_program *prog)
{ {
int i; int i;
@ -470,7 +457,8 @@ static int bpf_object__elf_init(struct bpf_object *obj)
obj->efile.fd = open(obj->path, O_RDONLY); obj->efile.fd = open(obj->path, O_RDONLY);
if (obj->efile.fd < 0) { if (obj->efile.fd < 0) {
char errmsg[STRERR_BUFSIZE]; char errmsg[STRERR_BUFSIZE];
char *cp = str_error(errno, errmsg, sizeof(errmsg)); char *cp = libbpf_strerror_r(errno, errmsg,
sizeof(errmsg));
pr_warning("failed to open %s: %s\n", obj->path, cp); pr_warning("failed to open %s: %s\n", obj->path, cp);
return -errno; return -errno;
@ -811,7 +799,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
data->d_size, name, idx); data->d_size, name, idx);
if (err) { if (err) {
char errmsg[STRERR_BUFSIZE]; char errmsg[STRERR_BUFSIZE];
char *cp = str_error(-err, errmsg, sizeof(errmsg)); char *cp = libbpf_strerror_r(-err, errmsg,
sizeof(errmsg));
pr_warning("failed to alloc program %s (%s): %s", pr_warning("failed to alloc program %s (%s): %s",
name, obj->path, cp); name, obj->path, cp);
@ -1140,7 +1129,7 @@ bpf_object__create_maps(struct bpf_object *obj)
*pfd = bpf_create_map_xattr(&create_attr); *pfd = bpf_create_map_xattr(&create_attr);
if (*pfd < 0 && create_attr.btf_key_type_id) { if (*pfd < 0 && create_attr.btf_key_type_id) {
cp = str_error(errno, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n", pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
map->name, cp, errno); map->name, cp, errno);
create_attr.btf_fd = 0; create_attr.btf_fd = 0;
@ -1155,7 +1144,7 @@ bpf_object__create_maps(struct bpf_object *obj)
size_t j; size_t j;
err = *pfd; err = *pfd;
cp = str_error(errno, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to create map (name: '%s'): %s\n", pr_warning("failed to create map (name: '%s'): %s\n",
map->name, cp); map->name, cp);
for (j = 0; j < i; j++) for (j = 0; j < i; j++)
@ -1339,7 +1328,7 @@ load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
} }
ret = -LIBBPF_ERRNO__LOAD; ret = -LIBBPF_ERRNO__LOAD;
cp = str_error(errno, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("load bpf program failed: %s\n", cp); pr_warning("load bpf program failed: %s\n", cp);
if (log_buf && log_buf[0] != '\0') { if (log_buf && log_buf[0] != '\0') {
@ -1375,9 +1364,9 @@ out:
return ret; return ret;
} }
static int int
bpf_program__load(struct bpf_program *prog, bpf_program__load(struct bpf_program *prog,
char *license, u32 kern_version) char *license, __u32 kern_version)
{ {
int err = 0, fd, i; int err = 0, fd, i;
@ -1655,7 +1644,7 @@ static int check_path(const char *path)
dir = dirname(dname); dir = dirname(dname);
if (statfs(dir, &st_fs)) { if (statfs(dir, &st_fs)) {
cp = str_error(errno, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to statfs %s: %s\n", dir, cp); pr_warning("failed to statfs %s: %s\n", dir, cp);
err = -errno; err = -errno;
} }
@ -1691,7 +1680,7 @@ int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
} }
if (bpf_obj_pin(prog->instances.fds[instance], path)) { if (bpf_obj_pin(prog->instances.fds[instance], path)) {
cp = str_error(errno, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to pin program: %s\n", cp); pr_warning("failed to pin program: %s\n", cp);
return -errno; return -errno;
} }
@ -1709,7 +1698,7 @@ static int make_dir(const char *path)
err = -errno; err = -errno;
if (err) { if (err) {
cp = str_error(-err, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(-err, errmsg, sizeof(errmsg));
pr_warning("failed to mkdir %s: %s\n", path, cp); pr_warning("failed to mkdir %s: %s\n", path, cp);
} }
return err; return err;
@ -1771,7 +1760,7 @@ int bpf_map__pin(struct bpf_map *map, const char *path)
} }
if (bpf_obj_pin(map->fd, path)) { if (bpf_obj_pin(map->fd, path)) {
cp = str_error(errno, errmsg, sizeof(errmsg)); cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to pin map: %s\n", cp); pr_warning("failed to pin map: %s\n", cp);
return -errno; return -errno;
} }
@ -2085,58 +2074,90 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog,
prog->expected_attach_type = type; prog->expected_attach_type = type;
} }
#define BPF_PROG_SEC_FULL(string, ptype, atype) \ #define BPF_PROG_SEC_IMPL(string, ptype, eatype, atype) \
{ string, sizeof(string) - 1, ptype, atype } { string, sizeof(string) - 1, ptype, eatype, atype }
#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_FULL(string, ptype, 0) /* Programs that can NOT be attached. */
#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, -EINVAL)
#define BPF_S_PROG_SEC(string, ptype) \ /* Programs that can be attached. */
BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK, ptype) #define BPF_APROG_SEC(string, ptype, atype) \
BPF_PROG_SEC_IMPL(string, ptype, 0, atype)
#define BPF_SA_PROG_SEC(string, ptype) \ /* Programs that must specify expected attach type at load time. */
BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, ptype) #define BPF_EAPROG_SEC(string, ptype, eatype) \
BPF_PROG_SEC_IMPL(string, ptype, eatype, eatype)
/* Programs that can be attached but attach type can't be identified by section
* name. Kept for backward compatibility.
*/
#define BPF_APROG_COMPAT(string, ptype) BPF_PROG_SEC(string, ptype)
static const struct { static const struct {
const char *sec; const char *sec;
size_t len; size_t len;
enum bpf_prog_type prog_type; enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type; enum bpf_attach_type expected_attach_type;
enum bpf_attach_type attach_type;
} section_names[] = { } section_names[] = {
BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER),
BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE), BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE),
BPF_PROG_SEC("kretprobe/", BPF_PROG_TYPE_KPROBE), BPF_PROG_SEC("kretprobe/", BPF_PROG_TYPE_KPROBE),
BPF_PROG_SEC("classifier", BPF_PROG_TYPE_SCHED_CLS), BPF_PROG_SEC("classifier", BPF_PROG_TYPE_SCHED_CLS),
BPF_PROG_SEC("action", BPF_PROG_TYPE_SCHED_ACT), BPF_PROG_SEC("action", BPF_PROG_TYPE_SCHED_ACT),
BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT), BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT), BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT),
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
BPF_PROG_SEC("cgroup/skb", BPF_PROG_TYPE_CGROUP_SKB), BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN),
BPF_PROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK), BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT),
BPF_PROG_SEC("cgroup/dev", BPF_PROG_TYPE_CGROUP_DEVICE), BPF_PROG_SEC("lwt_xmit", BPF_PROG_TYPE_LWT_XMIT),
BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), BPF_PROG_SEC("lwt_seg6local", BPF_PROG_TYPE_LWT_SEG6LOCAL),
BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT), BPF_APROG_SEC("cgroup_skb/ingress", BPF_PROG_TYPE_CGROUP_SKB,
BPF_PROG_SEC("lwt_xmit", BPF_PROG_TYPE_LWT_XMIT), BPF_CGROUP_INET_INGRESS),
BPF_PROG_SEC("lwt_seg6local", BPF_PROG_TYPE_LWT_SEG6LOCAL), BPF_APROG_SEC("cgroup_skb/egress", BPF_PROG_TYPE_CGROUP_SKB,
BPF_PROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS), BPF_CGROUP_INET_EGRESS),
BPF_PROG_SEC("sk_skb", BPF_PROG_TYPE_SK_SKB), BPF_APROG_COMPAT("cgroup/skb", BPF_PROG_TYPE_CGROUP_SKB),
BPF_PROG_SEC("sk_msg", BPF_PROG_TYPE_SK_MSG), BPF_APROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK,
BPF_PROG_SEC("lirc_mode2", BPF_PROG_TYPE_LIRC_MODE2), BPF_CGROUP_INET_SOCK_CREATE),
BPF_PROG_SEC("flow_dissector", BPF_PROG_TYPE_FLOW_DISSECTOR), BPF_EAPROG_SEC("cgroup/post_bind4", BPF_PROG_TYPE_CGROUP_SOCK,
BPF_SA_PROG_SEC("cgroup/bind4", BPF_CGROUP_INET4_BIND), BPF_CGROUP_INET4_POST_BIND),
BPF_SA_PROG_SEC("cgroup/bind6", BPF_CGROUP_INET6_BIND), BPF_EAPROG_SEC("cgroup/post_bind6", BPF_PROG_TYPE_CGROUP_SOCK,
BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT), BPF_CGROUP_INET6_POST_BIND),
BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT), BPF_APROG_SEC("cgroup/dev", BPF_PROG_TYPE_CGROUP_DEVICE,
BPF_SA_PROG_SEC("cgroup/sendmsg4", BPF_CGROUP_UDP4_SENDMSG), BPF_CGROUP_DEVICE),
BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG), BPF_APROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS,
BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND), BPF_CGROUP_SOCK_OPS),
BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND), BPF_APROG_SEC("sk_skb/stream_parser", BPF_PROG_TYPE_SK_SKB,
BPF_SK_SKB_STREAM_PARSER),
BPF_APROG_SEC("sk_skb/stream_verdict", BPF_PROG_TYPE_SK_SKB,
BPF_SK_SKB_STREAM_VERDICT),
BPF_APROG_COMPAT("sk_skb", BPF_PROG_TYPE_SK_SKB),
BPF_APROG_SEC("sk_msg", BPF_PROG_TYPE_SK_MSG,
BPF_SK_MSG_VERDICT),
BPF_APROG_SEC("lirc_mode2", BPF_PROG_TYPE_LIRC_MODE2,
BPF_LIRC_MODE2),
BPF_APROG_SEC("flow_dissector", BPF_PROG_TYPE_FLOW_DISSECTOR,
BPF_FLOW_DISSECTOR),
BPF_EAPROG_SEC("cgroup/bind4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_CGROUP_INET4_BIND),
BPF_EAPROG_SEC("cgroup/bind6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_CGROUP_INET6_BIND),
BPF_EAPROG_SEC("cgroup/connect4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_CGROUP_INET4_CONNECT),
BPF_EAPROG_SEC("cgroup/connect6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_CGROUP_INET6_CONNECT),
BPF_EAPROG_SEC("cgroup/sendmsg4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_CGROUP_UDP4_SENDMSG),
BPF_EAPROG_SEC("cgroup/sendmsg6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_CGROUP_UDP6_SENDMSG),
}; };
#undef BPF_PROG_SEC_IMPL
#undef BPF_PROG_SEC #undef BPF_PROG_SEC
#undef BPF_PROG_SEC_FULL #undef BPF_APROG_SEC
#undef BPF_S_PROG_SEC #undef BPF_EAPROG_SEC
#undef BPF_SA_PROG_SEC #undef BPF_APROG_COMPAT
int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type, int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
enum bpf_attach_type *expected_attach_type) enum bpf_attach_type *expected_attach_type)
@ -2156,6 +2177,25 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
return -EINVAL; return -EINVAL;
} }
int libbpf_attach_type_by_name(const char *name,
enum bpf_attach_type *attach_type)
{
int i;
if (!name)
return -EINVAL;
for (i = 0; i < ARRAY_SIZE(section_names); i++) {
if (strncmp(name, section_names[i].sec, section_names[i].len))
continue;
if (section_names[i].attach_type == -EINVAL)
return -EINVAL;
*attach_type = section_names[i].attach_type;
return 0;
}
return -EINVAL;
}
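
A small, assumed usage sketch of the new API from an application's point of view: derive the attach type from the program's section title and attach the program to a cgroup. Include paths, the helper name and the error handling are illustrative assumptions.

/* Attach a program whose section name identifies its attach type,
 * e.g. "cgroup_skb/ingress" resolves to BPF_CGROUP_INET_INGRESS.
 */
#include "libbpf.h"
#include "bpf.h"

static int attach_by_title(struct bpf_program *prog, int cgroup_fd)
{
	enum bpf_attach_type type;

	if (libbpf_attach_type_by_name(bpf_program__title(prog, false), &type))
		return -1;	/* section name does not encode an attach type */

	return bpf_prog_attach(bpf_program__fd(prog), cgroup_fd, type, 0);
}
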
static int static int
bpf_program__identify_section(struct bpf_program *prog, bpf_program__identify_section(struct bpf_program *prog,
enum bpf_prog_type *prog_type, enum bpf_prog_type *prog_type,

View File

@ -1,4 +1,4 @@
/* SPDX-License-Identifier: LGPL-2.1 */ /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* /*
* Common eBPF ELF object loading operations. * Common eBPF ELF object loading operations.
@ -6,22 +6,9 @@
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org> * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc. * Copyright (C) 2015 Huawei Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation;
* version 2.1 of the License (not later!)
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses>
*/ */
#ifndef __BPF_LIBBPF_H #ifndef __LIBBPF_LIBBPF_H
#define __BPF_LIBBPF_H #define __LIBBPF_LIBBPF_H
#include <stdio.h> #include <stdio.h>
#include <stdint.h> #include <stdint.h>
@ -104,6 +91,8 @@ void *bpf_object__priv(struct bpf_object *prog);
int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type, int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
enum bpf_attach_type *expected_attach_type); enum bpf_attach_type *expected_attach_type);
int libbpf_attach_type_by_name(const char *name,
enum bpf_attach_type *attach_type);
/* Accessors of bpf_program */ /* Accessors of bpf_program */
struct bpf_program; struct bpf_program;
@ -126,10 +115,13 @@ void bpf_program__set_ifindex(struct bpf_program *prog, __u32 ifindex);
const char *bpf_program__title(struct bpf_program *prog, bool needs_copy); const char *bpf_program__title(struct bpf_program *prog, bool needs_copy);
int bpf_program__load(struct bpf_program *prog, char *license,
__u32 kern_version);
int bpf_program__fd(struct bpf_program *prog); int bpf_program__fd(struct bpf_program *prog);
int bpf_program__pin_instance(struct bpf_program *prog, const char *path, int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
int instance); int instance);
int bpf_program__pin(struct bpf_program *prog, const char *path); int bpf_program__pin(struct bpf_program *prog, const char *path);
void bpf_program__unload(struct bpf_program *prog);
struct bpf_insn; struct bpf_insn;
@ -299,18 +291,15 @@ int bpf_perf_event_read_simple(void *mem, unsigned long size,
void **buf, size_t *buf_len, void **buf, size_t *buf_len,
bpf_perf_event_print_t fn, void *priv); bpf_perf_event_print_t fn, void *priv);
struct nlmsghdr;
struct nlattr; struct nlattr;
typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb); typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t, int libbpf_netlink_open(unsigned int *nl_pid);
void *cookie); int libbpf_nl_get_link(int sock, unsigned int nl_pid,
int bpf_netlink_open(unsigned int *nl_pid); libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie);
int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg, int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex,
void *cookie); libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie);
int nl_get_class(int sock, unsigned int nl_pid, int ifindex, int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_class_nlmsg, void *cookie); libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
dump_nlmsg_t dump_qdisc_nlmsg, void *cookie); libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie);
int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, #endif /* __LIBBPF_LIBBPF_H */
dump_nlmsg_t dump_filter_nlmsg, void *cookie);
#endif

View File

@ -1,23 +1,10 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* /*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org> * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc. * Copyright (C) 2015 Huawei Inc.
* Copyright (C) 2017 Nicira, Inc. * Copyright (C) 2017 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation;
* version 2.1 of the License (not later!)
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses>
*/ */
#include <stdio.h> #include <stdio.h>

View File

@ -1,4 +1,4 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2018 Facebook */ /* Copyright (c) 2018 Facebook */
#include <stdlib.h> #include <stdlib.h>
@ -18,7 +18,10 @@
#define SOL_NETLINK 270 #define SOL_NETLINK 270
#endif #endif
int bpf_netlink_open(__u32 *nl_pid) typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
void *cookie);
int libbpf_netlink_open(__u32 *nl_pid)
{ {
struct sockaddr_nl sa; struct sockaddr_nl sa;
socklen_t addrlen; socklen_t addrlen;
@ -62,7 +65,7 @@ cleanup:
} }
static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
__dump_nlmsg_t _fn, dump_nlmsg_t fn, __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
void *cookie) void *cookie)
{ {
bool multipart = true; bool multipart = true;
@ -100,7 +103,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
if (!err->error) if (!err->error)
continue; continue;
ret = err->error; ret = err->error;
nla_dump_errormsg(nh); libbpf_nla_dump_errormsg(nh);
goto done; goto done;
case NLMSG_DONE: case NLMSG_DONE:
return 0; return 0;
@ -130,7 +133,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
} req; } req;
__u32 nl_pid; __u32 nl_pid;
sock = bpf_netlink_open(&nl_pid); sock = libbpf_netlink_open(&nl_pid);
if (sock < 0) if (sock < 0)
return sock; return sock;
@ -178,8 +181,8 @@ cleanup:
return ret; return ret;
} }
static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg, static int __dump_link_nlmsg(struct nlmsghdr *nlh,
void *cookie) libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
{ {
struct nlattr *tb[IFLA_MAX + 1], *attr; struct nlattr *tb[IFLA_MAX + 1], *attr;
struct ifinfomsg *ifi = NLMSG_DATA(nlh); struct ifinfomsg *ifi = NLMSG_DATA(nlh);
@ -187,14 +190,14 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)); len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi))); attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi)));
if (nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0) if (libbpf_nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
return dump_link_nlmsg(cookie, ifi, tb); return dump_link_nlmsg(cookie, ifi, tb);
} }
int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg, int libbpf_nl_get_link(int sock, unsigned int nl_pid,
void *cookie) libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
{ {
struct { struct {
struct nlmsghdr nlh; struct nlmsghdr nlh;
@ -216,7 +219,8 @@ int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
} }
static int __dump_class_nlmsg(struct nlmsghdr *nlh, static int __dump_class_nlmsg(struct nlmsghdr *nlh,
dump_nlmsg_t dump_class_nlmsg, void *cookie) libbpf_dump_nlmsg_t dump_class_nlmsg,
void *cookie)
{ {
struct nlattr *tb[TCA_MAX + 1], *attr; struct nlattr *tb[TCA_MAX + 1], *attr;
struct tcmsg *t = NLMSG_DATA(nlh); struct tcmsg *t = NLMSG_DATA(nlh);
@ -224,14 +228,14 @@ static int __dump_class_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
return dump_class_nlmsg(cookie, t, tb); return dump_class_nlmsg(cookie, t, tb);
} }
int nl_get_class(int sock, unsigned int nl_pid, int ifindex, int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_class_nlmsg, void *cookie) libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie)
{ {
struct { struct {
struct nlmsghdr nlh; struct nlmsghdr nlh;
@ -254,7 +258,8 @@ int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
} }
static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh, static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
dump_nlmsg_t dump_qdisc_nlmsg, void *cookie) libbpf_dump_nlmsg_t dump_qdisc_nlmsg,
void *cookie)
{ {
struct nlattr *tb[TCA_MAX + 1], *attr; struct nlattr *tb[TCA_MAX + 1], *attr;
struct tcmsg *t = NLMSG_DATA(nlh); struct tcmsg *t = NLMSG_DATA(nlh);
@ -262,14 +267,14 @@ static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
return dump_qdisc_nlmsg(cookie, t, tb); return dump_qdisc_nlmsg(cookie, t, tb);
} }
int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_qdisc_nlmsg, void *cookie) libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
{ {
struct { struct {
struct nlmsghdr nlh; struct nlmsghdr nlh;
@ -292,7 +297,8 @@ int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
} }
static int __dump_filter_nlmsg(struct nlmsghdr *nlh, static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
dump_nlmsg_t dump_filter_nlmsg, void *cookie) libbpf_dump_nlmsg_t dump_filter_nlmsg,
void *cookie)
{ {
struct nlattr *tb[TCA_MAX + 1], *attr; struct nlattr *tb[TCA_MAX + 1], *attr;
struct tcmsg *t = NLMSG_DATA(nlh); struct tcmsg *t = NLMSG_DATA(nlh);
@ -300,14 +306,14 @@ static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE; return -LIBBPF_ERRNO__NLPARSE;
return dump_filter_nlmsg(cookie, t, tb); return dump_filter_nlmsg(cookie, t, tb);
} }
int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
dump_nlmsg_t dump_filter_nlmsg, void *cookie) libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie)
{ {
struct { struct {
struct nlmsghdr nlh; struct nlmsghdr nlh;

View File

@ -1,13 +1,8 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* /*
* NETLINK Netlink attributes * NETLINK Netlink attributes
* *
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation version 2.1
* of the License.
*
* Copyright (c) 2003-2013 Thomas Graf <tgraf@suug.ch> * Copyright (c) 2003-2013 Thomas Graf <tgraf@suug.ch>
*/ */
@ -17,13 +12,13 @@
#include <string.h> #include <string.h>
#include <stdio.h> #include <stdio.h>
static uint16_t nla_attr_minlen[NLA_TYPE_MAX+1] = { static uint16_t nla_attr_minlen[LIBBPF_NLA_TYPE_MAX+1] = {
[NLA_U8] = sizeof(uint8_t), [LIBBPF_NLA_U8] = sizeof(uint8_t),
[NLA_U16] = sizeof(uint16_t), [LIBBPF_NLA_U16] = sizeof(uint16_t),
[NLA_U32] = sizeof(uint32_t), [LIBBPF_NLA_U32] = sizeof(uint32_t),
[NLA_U64] = sizeof(uint64_t), [LIBBPF_NLA_U64] = sizeof(uint64_t),
[NLA_STRING] = 1, [LIBBPF_NLA_STRING] = 1,
[NLA_FLAG] = 0, [LIBBPF_NLA_FLAG] = 0,
}; };
static struct nlattr *nla_next(const struct nlattr *nla, int *remaining) static struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
@ -47,9 +42,9 @@ static int nla_type(const struct nlattr *nla)
} }
static int validate_nla(struct nlattr *nla, int maxtype, static int validate_nla(struct nlattr *nla, int maxtype,
struct nla_policy *policy) struct libbpf_nla_policy *policy)
{ {
struct nla_policy *pt; struct libbpf_nla_policy *pt;
unsigned int minlen = 0; unsigned int minlen = 0;
int type = nla_type(nla); int type = nla_type(nla);
@ -58,23 +53,24 @@ static int validate_nla(struct nlattr *nla, int maxtype,
pt = &policy[type]; pt = &policy[type];
if (pt->type > NLA_TYPE_MAX) if (pt->type > LIBBPF_NLA_TYPE_MAX)
return 0; return 0;
if (pt->minlen) if (pt->minlen)
minlen = pt->minlen; minlen = pt->minlen;
else if (pt->type != NLA_UNSPEC) else if (pt->type != LIBBPF_NLA_UNSPEC)
minlen = nla_attr_minlen[pt->type]; minlen = nla_attr_minlen[pt->type];
if (nla_len(nla) < minlen) if (libbpf_nla_len(nla) < minlen)
return -1; return -1;
if (pt->maxlen && nla_len(nla) > pt->maxlen) if (pt->maxlen && libbpf_nla_len(nla) > pt->maxlen)
return -1; return -1;
if (pt->type == NLA_STRING) { if (pt->type == LIBBPF_NLA_STRING) {
char *data = nla_data(nla); char *data = libbpf_nla_data(nla);
if (data[nla_len(nla) - 1] != '\0')
if (data[libbpf_nla_len(nla) - 1] != '\0')
return -1; return -1;
} }
@ -104,15 +100,15 @@ static inline int nlmsg_len(const struct nlmsghdr *nlh)
* @see nla_validate * @see nla_validate
* @return 0 on success or a negative error code. * @return 0 on success or a negative error code.
*/ */
int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len, int libbpf_nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head,
struct nla_policy *policy) int len, struct libbpf_nla_policy *policy)
{ {
struct nlattr *nla; struct nlattr *nla;
int rem, err; int rem, err;
memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1)); memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
nla_for_each_attr(nla, head, len, rem) { libbpf_nla_for_each_attr(nla, head, len, rem) {
int type = nla_type(nla); int type = nla_type(nla);
if (type > maxtype) if (type > maxtype)
@ -144,23 +140,25 @@ errout:
* @arg policy Attribute validation policy. * @arg policy Attribute validation policy.
* *
* Feeds the stream of attributes nested into the specified attribute * Feeds the stream of attributes nested into the specified attribute
* to nla_parse(). * to libbpf_nla_parse().
* *
* @see nla_parse * @see libbpf_nla_parse
* @return 0 on success or a negative error code. * @return 0 on success or a negative error code.
*/ */
int nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
struct nla_policy *policy) struct nlattr *nla,
struct libbpf_nla_policy *policy)
{ {
return nla_parse(tb, maxtype, nla_data(nla), nla_len(nla), policy); return libbpf_nla_parse(tb, maxtype, libbpf_nla_data(nla),
libbpf_nla_len(nla), policy);
} }
/* dump netlink extended ack error message */ /* dump netlink extended ack error message */
int nla_dump_errormsg(struct nlmsghdr *nlh) int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh)
{ {
struct nla_policy extack_policy[NLMSGERR_ATTR_MAX + 1] = { struct libbpf_nla_policy extack_policy[NLMSGERR_ATTR_MAX + 1] = {
[NLMSGERR_ATTR_MSG] = { .type = NLA_STRING }, [NLMSGERR_ATTR_MSG] = { .type = LIBBPF_NLA_STRING },
[NLMSGERR_ATTR_OFFS] = { .type = NLA_U32 }, [NLMSGERR_ATTR_OFFS] = { .type = LIBBPF_NLA_U32 },
}; };
struct nlattr *tb[NLMSGERR_ATTR_MAX + 1], *attr; struct nlattr *tb[NLMSGERR_ATTR_MAX + 1], *attr;
struct nlmsgerr *err; struct nlmsgerr *err;
@ -181,14 +179,15 @@ int nla_dump_errormsg(struct nlmsghdr *nlh)
attr = (struct nlattr *) ((void *) err + hlen); attr = (struct nlattr *) ((void *) err + hlen);
alen = nlh->nlmsg_len - hlen; alen = nlh->nlmsg_len - hlen;
if (nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen, extack_policy) != 0) { if (libbpf_nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen,
extack_policy) != 0) {
fprintf(stderr, fprintf(stderr,
"Failed to parse extended error attributes\n"); "Failed to parse extended error attributes\n");
return 0; return 0;
} }
if (tb[NLMSGERR_ATTR_MSG]) if (tb[NLMSGERR_ATTR_MSG])
errmsg = (char *) nla_data(tb[NLMSGERR_ATTR_MSG]); errmsg = (char *) libbpf_nla_data(tb[NLMSGERR_ATTR_MSG]);
fprintf(stderr, "Kernel error message: %s\n", errmsg); fprintf(stderr, "Kernel error message: %s\n", errmsg);

View File

@ -1,18 +1,13 @@
/* SPDX-License-Identifier: LGPL-2.1 */ /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* /*
* NETLINK Netlink attributes * NETLINK Netlink attributes
* *
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation version 2.1
* of the License.
*
* Copyright (c) 2003-2013 Thomas Graf <tgraf@suug.ch> * Copyright (c) 2003-2013 Thomas Graf <tgraf@suug.ch>
*/ */
#ifndef __NLATTR_H #ifndef __LIBBPF_NLATTR_H
#define __NLATTR_H #define __LIBBPF_NLATTR_H
#include <stdint.h> #include <stdint.h>
#include <linux/netlink.h> #include <linux/netlink.h>
@ -23,19 +18,19 @@
* Standard attribute types to specify validation policy * Standard attribute types to specify validation policy
*/ */
enum { enum {
NLA_UNSPEC, /**< Unspecified type, binary data chunk */ LIBBPF_NLA_UNSPEC, /**< Unspecified type, binary data chunk */
NLA_U8, /**< 8 bit integer */ LIBBPF_NLA_U8, /**< 8 bit integer */
NLA_U16, /**< 16 bit integer */ LIBBPF_NLA_U16, /**< 16 bit integer */
NLA_U32, /**< 32 bit integer */ LIBBPF_NLA_U32, /**< 32 bit integer */
NLA_U64, /**< 64 bit integer */ LIBBPF_NLA_U64, /**< 64 bit integer */
NLA_STRING, /**< NUL terminated character string */ LIBBPF_NLA_STRING, /**< NUL terminated character string */
NLA_FLAG, /**< Flag */ LIBBPF_NLA_FLAG, /**< Flag */
NLA_MSECS, /**< Micro seconds (64bit) */ LIBBPF_NLA_MSECS, /**< Micro seconds (64bit) */
NLA_NESTED, /**< Nested attributes */ LIBBPF_NLA_NESTED, /**< Nested attributes */
__NLA_TYPE_MAX, __LIBBPF_NLA_TYPE_MAX,
}; };
#define NLA_TYPE_MAX (__NLA_TYPE_MAX - 1) #define LIBBPF_NLA_TYPE_MAX (__LIBBPF_NLA_TYPE_MAX - 1)
/** /**
* @ingroup attr * @ingroup attr
@ -43,8 +38,8 @@ enum {
* *
* See section @core_doc{core_attr_parse,Attribute Parsing} for more details. * See section @core_doc{core_attr_parse,Attribute Parsing} for more details.
*/ */
struct nla_policy { struct libbpf_nla_policy {
/** Type of attribute or NLA_UNSPEC */ /** Type of attribute or LIBBPF_NLA_UNSPEC */
uint16_t type; uint16_t type;
/** Minimal length of payload required */ /** Minimal length of payload required */
@ -62,49 +57,50 @@ struct nla_policy {
* @arg len length of attribute stream * @arg len length of attribute stream
* @arg rem initialized to len, holds bytes currently remaining in stream * @arg rem initialized to len, holds bytes currently remaining in stream
*/ */
#define nla_for_each_attr(pos, head, len, rem) \ #define libbpf_nla_for_each_attr(pos, head, len, rem) \
for (pos = head, rem = len; \ for (pos = head, rem = len; \
nla_ok(pos, rem); \ nla_ok(pos, rem); \
pos = nla_next(pos, &(rem))) pos = nla_next(pos, &(rem)))
/** /**
* nla_data - head of payload * libbpf_nla_data - head of payload
* @nla: netlink attribute * @nla: netlink attribute
*/ */
static inline void *nla_data(const struct nlattr *nla) static inline void *libbpf_nla_data(const struct nlattr *nla)
{ {
return (char *) nla + NLA_HDRLEN; return (char *) nla + NLA_HDRLEN;
} }
static inline uint8_t nla_getattr_u8(const struct nlattr *nla) static inline uint8_t libbpf_nla_getattr_u8(const struct nlattr *nla)
{ {
return *(uint8_t *)nla_data(nla); return *(uint8_t *)libbpf_nla_data(nla);
} }
static inline uint32_t nla_getattr_u32(const struct nlattr *nla) static inline uint32_t libbpf_nla_getattr_u32(const struct nlattr *nla)
{ {
return *(uint32_t *)nla_data(nla); return *(uint32_t *)libbpf_nla_data(nla);
} }
static inline const char *nla_getattr_str(const struct nlattr *nla) static inline const char *libbpf_nla_getattr_str(const struct nlattr *nla)
{ {
return (const char *)nla_data(nla); return (const char *)libbpf_nla_data(nla);
} }
/** /**
* nla_len - length of payload * libbpf_nla_len - length of payload
* @nla: netlink attribute * @nla: netlink attribute
*/ */
static inline int nla_len(const struct nlattr *nla) static inline int libbpf_nla_len(const struct nlattr *nla)
{ {
return nla->nla_len - NLA_HDRLEN; return nla->nla_len - NLA_HDRLEN;
} }
int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len, int libbpf_nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head,
struct nla_policy *policy); int len, struct libbpf_nla_policy *policy);
int nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
struct nla_policy *policy); struct nlattr *nla,
struct libbpf_nla_policy *policy);
int nla_dump_errormsg(struct nlmsghdr *nlh); int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
#endif /* __NLATTR_H */ #endif /* __LIBBPF_NLATTR_H */
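
Another small sketch, again not part of the commit, showing the renamed iteration macro over a raw attribute stream; dump_attrs() and the stdio output are assumptions for illustration.

#include <stdio.h>

/* Walk a raw attribute stream, e.g. the payload of a netlink message. */
static void dump_attrs(struct nlattr *head, int len)
{
	struct nlattr *attr;
	int rem;

	libbpf_nla_for_each_attr(attr, head, len, rem)
		printf("attr type=%d payload=%d bytes\n",
		       attr->nla_type, libbpf_nla_len(attr));
}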


@ -1,4 +1,4 @@
// SPDX-License-Identifier: LGPL-2.1 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
#undef _GNU_SOURCE #undef _GNU_SOURCE
#include <string.h> #include <string.h>
#include <stdio.h> #include <stdio.h>
@ -9,7 +9,7 @@
* libc, while checking strerror_r() return to avoid having to check this in * libc, while checking strerror_r() return to avoid having to check this in
* all places calling it. * all places calling it.
*/ */
char *str_error(int err, char *dst, int len) char *libbpf_strerror_r(int err, char *dst, int len)
{ {
int ret = strerror_r(err, dst, len); int ret = strerror_r(err, dst, len);
if (ret) if (ret)


@ -1,6 +1,6 @@
// SPDX-License-Identifier: LGPL-2.1 /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef BPF_STR_ERROR #ifndef __LIBBPF_STR_ERROR_H
#define BPF_STR_ERROR #define __LIBBPF_STR_ERROR_H
char *str_error(int err, char *dst, int len); char *libbpf_strerror_r(int err, char *dst, int len);
#endif // BPF_STR_ERROR #endif /* __LIBBPF_STR_ERROR_H */


@ -23,7 +23,8 @@ $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \ test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \ test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
test_socket_cookie test_cgroup_storage test_select_reuseport test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt
TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \ test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
@ -35,7 +36,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \ test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \ test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \ get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_sk_lookup_kern.o
# Order correspond to 'make run_tests' order # Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \ TEST_PROGS := test_kmod.sh \
@ -72,6 +73,7 @@ $(OUTPUT)/test_tcpbpf_user: cgroup_helpers.c
$(OUTPUT)/test_progs: trace_helpers.c $(OUTPUT)/test_progs: trace_helpers.c
$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
$(OUTPUT)/test_cgroup_storage: cgroup_helpers.c $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
$(OUTPUT)/test_netcnt: cgroup_helpers.c
.PHONY: force .PHONY: force


@ -143,6 +143,18 @@ static unsigned long long (*bpf_skb_cgroup_id)(void *ctx) =
(void *) BPF_FUNC_skb_cgroup_id; (void *) BPF_FUNC_skb_cgroup_id;
static unsigned long long (*bpf_skb_ancestor_cgroup_id)(void *ctx, int level) = static unsigned long long (*bpf_skb_ancestor_cgroup_id)(void *ctx, int level) =
(void *) BPF_FUNC_skb_ancestor_cgroup_id; (void *) BPF_FUNC_skb_ancestor_cgroup_id;
static struct bpf_sock *(*bpf_sk_lookup_tcp)(void *ctx,
struct bpf_sock_tuple *tuple,
int size, unsigned int netns_id,
unsigned long long flags) =
(void *) BPF_FUNC_sk_lookup_tcp;
static struct bpf_sock *(*bpf_sk_lookup_udp)(void *ctx,
struct bpf_sock_tuple *tuple,
int size, unsigned int netns_id,
unsigned long long flags) =
(void *) BPF_FUNC_sk_lookup_udp;
static int (*bpf_sk_release)(struct bpf_sock *sk) =
(void *) BPF_FUNC_sk_release;
/* llvm builtin functions that eBPF C program may use to /* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions * emit BPF_LD_ABS and BPF_LD_IND instructions


@ -0,0 +1,24 @@
// SPDX-License-Identifier: GPL-2.0
#ifndef __NETCNT_COMMON_H
#define __NETCNT_COMMON_H
#include <linux/types.h>
#define MAX_PERCPU_PACKETS 32
struct percpu_net_cnt {
__u64 packets;
__u64 bytes;
__u64 prev_ts;
__u64 prev_packets;
__u64 prev_bytes;
};
struct net_cnt {
__u64 packets;
__u64 bytes;
};
#endif


@ -0,0 +1,71 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/version.h>
#include "bpf_helpers.h"
#include "netcnt_common.h"
#define MAX_BPS (3 * 1024 * 1024)
#define REFRESH_TIME_NS 100000000
#define NS_PER_SEC 1000000000
struct bpf_map_def SEC("maps") percpu_netcnt = {
.type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
.key_size = sizeof(struct bpf_cgroup_storage_key),
.value_size = sizeof(struct percpu_net_cnt),
};
struct bpf_map_def SEC("maps") netcnt = {
.type = BPF_MAP_TYPE_CGROUP_STORAGE,
.key_size = sizeof(struct bpf_cgroup_storage_key),
.value_size = sizeof(struct net_cnt),
};
SEC("cgroup/skb")
int bpf_nextcnt(struct __sk_buff *skb)
{
struct percpu_net_cnt *percpu_cnt;
char fmt[] = "%d %llu %llu\n";
struct net_cnt *cnt;
__u64 ts, dt;
int ret;
cnt = bpf_get_local_storage(&netcnt, 0);
percpu_cnt = bpf_get_local_storage(&percpu_netcnt, 0);
percpu_cnt->packets++;
percpu_cnt->bytes += skb->len;
if (percpu_cnt->packets > MAX_PERCPU_PACKETS) {
__sync_fetch_and_add(&cnt->packets,
percpu_cnt->packets);
percpu_cnt->packets = 0;
__sync_fetch_and_add(&cnt->bytes,
percpu_cnt->bytes);
percpu_cnt->bytes = 0;
}
ts = bpf_ktime_get_ns();
dt = ts - percpu_cnt->prev_ts;
dt *= MAX_BPS;
dt /= NS_PER_SEC;
if (cnt->bytes + percpu_cnt->bytes - percpu_cnt->prev_bytes < dt)
ret = 1;
else
ret = 0;
if (dt > REFRESH_TIME_NS) {
percpu_cnt->prev_ts = ts;
percpu_cnt->prev_packets = cnt->packets;
percpu_cnt->prev_bytes = cnt->bytes;
}
return !!ret;
}
char _license[] SEC("license") = "GPL";
__u32 _version SEC("version") = LINUX_VERSION_CODE;
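
Rough worked numbers for the byte-budget check in the program above, not part of the commit; the 100 ms interval simply reuses REFRESH_TIME_NS as an example value of dt.

/* budget = dt * MAX_BPS / NS_PER_SEC
 * e.g. dt = 100000000 ns (100 ms):
 *      budget = 100000000 * (3 * 1024 * 1024) / 1000000000
 *             = 314572800000000 / 1000000000
 *             = 314572 bytes (integer division)
 * The program returns 1 (let the packet pass) while the bytes accumulated
 * since prev_bytes stay under this budget, and 0 (drop) once they exceed it.
 */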


@ -4,6 +4,7 @@
#include <linux/filter.h> #include <linux/filter.h>
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include <sys/sysinfo.h>
#include "bpf_rlimit.h" #include "bpf_rlimit.h"
#include "cgroup_helpers.h" #include "cgroup_helpers.h"
@ -15,6 +16,14 @@ char bpf_log_buf[BPF_LOG_BUF_SIZE];
int main(int argc, char **argv) int main(int argc, char **argv)
{ {
struct bpf_insn prog[] = { struct bpf_insn prog[] = {
BPF_LD_MAP_FD(BPF_REG_1, 0), /* percpu map fd */
BPF_MOV64_IMM(BPF_REG_2, 0), /* flags, not used */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_local_storage),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 0x1),
BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_3, 0),
BPF_LD_MAP_FD(BPF_REG_1, 0), /* map fd */ BPF_LD_MAP_FD(BPF_REG_1, 0), /* map fd */
BPF_MOV64_IMM(BPF_REG_2, 0), /* flags, not used */ BPF_MOV64_IMM(BPF_REG_2, 0), /* flags, not used */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
@ -28,9 +37,18 @@ int main(int argc, char **argv)
}; };
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
int error = EXIT_FAILURE; int error = EXIT_FAILURE;
int map_fd, prog_fd, cgroup_fd; int map_fd, percpu_map_fd, prog_fd, cgroup_fd;
struct bpf_cgroup_storage_key key; struct bpf_cgroup_storage_key key;
unsigned long long value; unsigned long long value;
unsigned long long *percpu_value;
int cpu, nproc;
nproc = get_nprocs_conf();
percpu_value = malloc(sizeof(*percpu_value) * nproc);
if (!percpu_value) {
printf("Not enough memory for per-cpu area (%d cpus)\n", nproc);
goto err;
}
map_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, sizeof(key), map_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, sizeof(key),
sizeof(value), 0, 0); sizeof(value), 0, 0);
@ -39,7 +57,15 @@ int main(int argc, char **argv)
goto out; goto out;
} }
prog[0].imm = map_fd; percpu_map_fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
sizeof(key), sizeof(value), 0, 0);
if (percpu_map_fd < 0) {
printf("Failed to create map: %s\n", strerror(errno));
goto out;
}
prog[0].imm = percpu_map_fd;
prog[7].imm = map_fd;
prog_fd = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB, prog_fd = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
prog, insns_cnt, "GPL", 0, prog, insns_cnt, "GPL", 0,
bpf_log_buf, BPF_LOG_BUF_SIZE); bpf_log_buf, BPF_LOG_BUF_SIZE);
@ -77,7 +103,15 @@ int main(int argc, char **argv)
} }
if (bpf_map_lookup_elem(map_fd, &key, &value)) { if (bpf_map_lookup_elem(map_fd, &key, &value)) {
printf("Failed to lookup cgroup storage\n"); printf("Failed to lookup cgroup storage 0\n");
goto err;
}
for (cpu = 0; cpu < nproc; cpu++)
percpu_value[cpu] = 1000;
if (bpf_map_update_elem(percpu_map_fd, &key, percpu_value, 0)) {
printf("Failed to update the data in the cgroup storage\n");
goto err; goto err;
} }
@ -120,11 +154,31 @@ int main(int argc, char **argv)
goto err; goto err;
} }
/* Check the final value of the counter in the percpu local storage */
for (cpu = 0; cpu < nproc; cpu++)
percpu_value[cpu] = 0;
if (bpf_map_lookup_elem(percpu_map_fd, &key, percpu_value)) {
printf("Failed to lookup the per-cpu cgroup storage\n");
goto err;
}
value = 0;
for (cpu = 0; cpu < nproc; cpu++)
value += percpu_value[cpu];
if (value != nproc * 1000 + 6) {
printf("Unexpected data in the per-cpu cgroup storage\n");
goto err;
}
error = 0; error = 0;
printf("test_cgroup_storage:PASS\n"); printf("test_cgroup_storage:PASS\n");
err: err:
cleanup_cgroup_environment(); cleanup_cgroup_environment();
free(percpu_value);
out: out:
return error; return error;


@ -0,0 +1,158 @@
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>
#include <sys/sysinfo.h>
#include <sys/time.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "cgroup_helpers.h"
#include "bpf_rlimit.h"
#include "netcnt_common.h"
#define BPF_PROG "./netcnt_prog.o"
#define TEST_CGROUP "/test-network-counters/"
static int bpf_find_map(const char *test, struct bpf_object *obj,
const char *name)
{
struct bpf_map *map;
map = bpf_object__find_map_by_name(obj, name);
if (!map) {
printf("%s:FAIL:map '%s' not found\n", test, name);
return -1;
}
return bpf_map__fd(map);
}
int main(int argc, char **argv)
{
struct percpu_net_cnt *percpu_netcnt;
struct bpf_cgroup_storage_key key;
int map_fd, percpu_map_fd;
int error = EXIT_FAILURE;
struct net_cnt netcnt;
struct bpf_object *obj;
int prog_fd, cgroup_fd;
unsigned long packets;
unsigned long bytes;
int cpu, nproc;
__u32 prog_cnt;
nproc = get_nprocs_conf();
percpu_netcnt = malloc(sizeof(*percpu_netcnt) * nproc);
if (!percpu_netcnt) {
printf("Not enough memory for per-cpu area (%d cpus)\n", nproc);
goto err;
}
if (bpf_prog_load(BPF_PROG, BPF_PROG_TYPE_CGROUP_SKB,
&obj, &prog_fd)) {
printf("Failed to load bpf program\n");
goto out;
}
if (setup_cgroup_environment()) {
printf("Failed to load bpf program\n");
goto err;
}
/* Create a cgroup, get fd, and join it */
cgroup_fd = create_and_get_cgroup(TEST_CGROUP);
if (!cgroup_fd) {
printf("Failed to create test cgroup\n");
goto err;
}
if (join_cgroup(TEST_CGROUP)) {
printf("Failed to join cgroup\n");
goto err;
}
/* Attach bpf program */
if (bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0)) {
printf("Failed to attach bpf program");
goto err;
}
assert(system("ping localhost -6 -c 10000 -f -q > /dev/null") == 0);
if (bpf_prog_query(cgroup_fd, BPF_CGROUP_INET_EGRESS, 0, NULL, NULL,
&prog_cnt)) {
printf("Failed to query attached programs");
goto err;
}
map_fd = bpf_find_map(__func__, obj, "netcnt");
if (map_fd < 0) {
printf("Failed to find bpf map with net counters");
goto err;
}
percpu_map_fd = bpf_find_map(__func__, obj, "percpu_netcnt");
if (percpu_map_fd < 0) {
printf("Failed to find bpf map with percpu net counters");
goto err;
}
if (bpf_map_get_next_key(map_fd, NULL, &key)) {
printf("Failed to get key in cgroup storage\n");
goto err;
}
if (bpf_map_lookup_elem(map_fd, &key, &netcnt)) {
printf("Failed to lookup cgroup storage\n");
goto err;
}
if (bpf_map_lookup_elem(percpu_map_fd, &key, &percpu_netcnt[0])) {
printf("Failed to lookup percpu cgroup storage\n");
goto err;
}
/* Some packets can be still in per-cpu cache, but not more than
* MAX_PERCPU_PACKETS.
*/
packets = netcnt.packets;
bytes = netcnt.bytes;
for (cpu = 0; cpu < nproc; cpu++) {
if (percpu_netcnt[cpu].packets > MAX_PERCPU_PACKETS) {
printf("Unexpected percpu value: %llu\n",
percpu_netcnt[cpu].packets);
goto err;
}
packets += percpu_netcnt[cpu].packets;
bytes += percpu_netcnt[cpu].bytes;
}
/* No packets should be lost */
if (packets != 10000) {
printf("Unexpected packet count: %lu\n", packets);
goto err;
}
/* Let's check that bytes counter matches the number of packets
* multiplied by the size of ipv6 ICMP packet.
*/
if (bytes != packets * 104) {
printf("Unexpected bytes count: %lu\n", bytes);
goto err;
}
error = 0;
printf("test_netcnt:PASS\n");
err:
cleanup_cgroup_environment();
free(percpu_netcnt);
out:
return error;
}
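
Rough expected numbers for the two checks above, assuming the flood ping really completed all 10000 echo requests:

/* packets = netcnt.packets + sum over CPUs of percpu_netcnt[cpu].packets
 *         = 10000                          (no packets may be lost)
 * bytes   = 10000 * 104 = 1040000
 * where 104 = 40 (IPv6 header) + 8 (ICMPv6 echo header)
 *           + 56 (default ping payload) bytes, i.e. what the egress
 * cgroup-skb program sees in skb->len for each echo request.
 */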


@ -1698,6 +1698,43 @@ static void test_task_fd_query_tp(void)
"sys_enter_read"); "sys_enter_read");
} }
static void test_reference_tracking()
{
const char *file = "./test_sk_lookup_kern.o";
struct bpf_object *obj;
struct bpf_program *prog;
__u32 duration;
int err = 0;
obj = bpf_object__open(file);
if (IS_ERR(obj)) {
error_cnt++;
return;
}
bpf_object__for_each_program(prog, obj) {
const char *title;
/* Ignore .text sections */
title = bpf_program__title(prog, false);
if (strstr(title, ".text") != NULL)
continue;
bpf_program__set_type(prog, BPF_PROG_TYPE_SCHED_CLS);
/* Expect verifier failure if test name has 'fail' */
if (strstr(title, "fail") != NULL) {
libbpf_set_print(NULL, NULL, NULL);
err = !bpf_program__load(prog, "GPL", 0);
libbpf_set_print(printf, printf, NULL);
} else {
err = bpf_program__load(prog, "GPL", 0);
}
CHECK(err, title, "\n");
}
bpf_object__close(obj);
}
int main(void) int main(void)
{ {
jit_enabled = is_jit_enabled(); jit_enabled = is_jit_enabled();
@ -1719,6 +1756,7 @@ int main(void)
test_get_stack_raw_tp(); test_get_stack_raw_tp();
test_task_fd_query_rawtp(); test_task_fd_query_rawtp();
test_task_fd_query_tp(); test_task_fd_query_tp();
test_reference_tracking();
printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt); printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS; return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;


@ -0,0 +1,208 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2018 Facebook
#include <err.h>
#include <bpf/libbpf.h>
#include "bpf_util.h"
struct sec_name_test {
const char sec_name[32];
struct {
int rc;
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
} expected_load;
struct {
int rc;
enum bpf_attach_type attach_type;
} expected_attach;
};
static struct sec_name_test tests[] = {
{"InvAliD", {-EINVAL, 0, 0}, {-EINVAL, 0} },
{"cgroup", {-EINVAL, 0, 0}, {-EINVAL, 0} },
{"socket", {0, BPF_PROG_TYPE_SOCKET_FILTER, 0}, {-EINVAL, 0} },
{"kprobe/", {0, BPF_PROG_TYPE_KPROBE, 0}, {-EINVAL, 0} },
{"kretprobe/", {0, BPF_PROG_TYPE_KPROBE, 0}, {-EINVAL, 0} },
{"classifier", {0, BPF_PROG_TYPE_SCHED_CLS, 0}, {-EINVAL, 0} },
{"action", {0, BPF_PROG_TYPE_SCHED_ACT, 0}, {-EINVAL, 0} },
{"tracepoint/", {0, BPF_PROG_TYPE_TRACEPOINT, 0}, {-EINVAL, 0} },
{
"raw_tracepoint/",
{0, BPF_PROG_TYPE_RAW_TRACEPOINT, 0},
{-EINVAL, 0},
},
{"xdp", {0, BPF_PROG_TYPE_XDP, 0}, {-EINVAL, 0} },
{"perf_event", {0, BPF_PROG_TYPE_PERF_EVENT, 0}, {-EINVAL, 0} },
{"lwt_in", {0, BPF_PROG_TYPE_LWT_IN, 0}, {-EINVAL, 0} },
{"lwt_out", {0, BPF_PROG_TYPE_LWT_OUT, 0}, {-EINVAL, 0} },
{"lwt_xmit", {0, BPF_PROG_TYPE_LWT_XMIT, 0}, {-EINVAL, 0} },
{"lwt_seg6local", {0, BPF_PROG_TYPE_LWT_SEG6LOCAL, 0}, {-EINVAL, 0} },
{
"cgroup_skb/ingress",
{0, BPF_PROG_TYPE_CGROUP_SKB, 0},
{0, BPF_CGROUP_INET_INGRESS},
},
{
"cgroup_skb/egress",
{0, BPF_PROG_TYPE_CGROUP_SKB, 0},
{0, BPF_CGROUP_INET_EGRESS},
},
{"cgroup/skb", {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, {-EINVAL, 0} },
{
"cgroup/sock",
{0, BPF_PROG_TYPE_CGROUP_SOCK, 0},
{0, BPF_CGROUP_INET_SOCK_CREATE},
},
{
"cgroup/post_bind4",
{0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET4_POST_BIND},
{0, BPF_CGROUP_INET4_POST_BIND},
},
{
"cgroup/post_bind6",
{0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET6_POST_BIND},
{0, BPF_CGROUP_INET6_POST_BIND},
},
{
"cgroup/dev",
{0, BPF_PROG_TYPE_CGROUP_DEVICE, 0},
{0, BPF_CGROUP_DEVICE},
},
{"sockops", {0, BPF_PROG_TYPE_SOCK_OPS, 0}, {0, BPF_CGROUP_SOCK_OPS} },
{
"sk_skb/stream_parser",
{0, BPF_PROG_TYPE_SK_SKB, 0},
{0, BPF_SK_SKB_STREAM_PARSER},
},
{
"sk_skb/stream_verdict",
{0, BPF_PROG_TYPE_SK_SKB, 0},
{0, BPF_SK_SKB_STREAM_VERDICT},
},
{"sk_skb", {0, BPF_PROG_TYPE_SK_SKB, 0}, {-EINVAL, 0} },
{"sk_msg", {0, BPF_PROG_TYPE_SK_MSG, 0}, {0, BPF_SK_MSG_VERDICT} },
{"lirc_mode2", {0, BPF_PROG_TYPE_LIRC_MODE2, 0}, {0, BPF_LIRC_MODE2} },
{
"flow_dissector",
{0, BPF_PROG_TYPE_FLOW_DISSECTOR, 0},
{0, BPF_FLOW_DISSECTOR},
},
{
"cgroup/bind4",
{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_BIND},
{0, BPF_CGROUP_INET4_BIND},
},
{
"cgroup/bind6",
{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_BIND},
{0, BPF_CGROUP_INET6_BIND},
},
{
"cgroup/connect4",
{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_CONNECT},
{0, BPF_CGROUP_INET4_CONNECT},
},
{
"cgroup/connect6",
{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_CONNECT},
{0, BPF_CGROUP_INET6_CONNECT},
},
{
"cgroup/sendmsg4",
{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_SENDMSG},
{0, BPF_CGROUP_UDP4_SENDMSG},
},
{
"cgroup/sendmsg6",
{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG},
{0, BPF_CGROUP_UDP6_SENDMSG},
},
};
static int test_prog_type_by_name(const struct sec_name_test *test)
{
enum bpf_attach_type expected_attach_type;
enum bpf_prog_type prog_type;
int rc;
rc = libbpf_prog_type_by_name(test->sec_name, &prog_type,
&expected_attach_type);
if (rc != test->expected_load.rc) {
warnx("prog: unexpected rc=%d for %s", rc, test->sec_name);
return -1;
}
if (rc)
return 0;
if (prog_type != test->expected_load.prog_type) {
warnx("prog: unexpected prog_type=%d for %s", prog_type,
test->sec_name);
return -1;
}
if (expected_attach_type != test->expected_load.expected_attach_type) {
warnx("prog: unexpected expected_attach_type=%d for %s",
expected_attach_type, test->sec_name);
return -1;
}
return 0;
}
static int test_attach_type_by_name(const struct sec_name_test *test)
{
enum bpf_attach_type attach_type;
int rc;
rc = libbpf_attach_type_by_name(test->sec_name, &attach_type);
if (rc != test->expected_attach.rc) {
warnx("attach: unexpected rc=%d for %s", rc, test->sec_name);
return -1;
}
if (rc)
return 0;
if (attach_type != test->expected_attach.attach_type) {
warnx("attach: unexpected attach_type=%d for %s", attach_type,
test->sec_name);
return -1;
}
return 0;
}
static int run_test_case(const struct sec_name_test *test)
{
if (test_prog_type_by_name(test))
return -1;
if (test_attach_type_by_name(test))
return -1;
return 0;
}
static int run_tests(void)
{
int passes = 0;
int fails = 0;
int i;
for (i = 0; i < ARRAY_SIZE(tests); ++i) {
if (run_test_case(&tests[i]))
++fails;
else
++passes;
}
printf("Summary: %d PASSED, %d FAILED\n", passes, fails);
return fails ? -1 : 0;
}
int main(int argc, char **argv)
{
return run_tests();
}
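
A rough sketch, not part of the commit, of how a loader might consume the two helpers exercised above; bpf_program__fd(), bpf_program__set_expected_attach_type() and the surrounding load/attach flow are assumed from the wider libbpf API rather than taken from this patch.

#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* Pick program and attach types from the ELF section name, then attach. */
static int load_and_attach(struct bpf_program *prog, int cgroup_fd)
{
	enum bpf_attach_type expected_attach, attach_type;
	enum bpf_prog_type prog_type;
	const char *title = bpf_program__title(prog, false);

	if (libbpf_prog_type_by_name(title, &prog_type, &expected_attach))
		return -1;
	bpf_program__set_type(prog, prog_type);
	bpf_program__set_expected_attach_type(prog, expected_attach);

	if (bpf_program__load(prog, "GPL", 0))
		return -1;

	/* Only some section names map to an attach type; skip the rest. */
	if (libbpf_attach_type_by_name(title, &attach_type))
		return 0;

	return bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
			       attach_type, 0);
}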


@ -0,0 +1,180 @@
/* SPDX-License-Identifier: GPL-2.0 */
// Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
#include <stddef.h>
#include <stdbool.h>
#include <string.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/pkt_cls.h>
#include <linux/tcp.h>
#include <sys/socket.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"
int _version SEC("version") = 1;
char _license[] SEC("license") = "GPL";
/* Fill 'tuple' with L3 info, and attempt to find L4. On fail, return NULL. */
static struct bpf_sock_tuple *get_tuple(void *data, __u64 nh_off,
void *data_end, __u16 eth_proto,
bool *ipv4)
{
struct bpf_sock_tuple *result;
__u8 proto = 0;
__u64 ihl_len;
if (eth_proto == bpf_htons(ETH_P_IP)) {
struct iphdr *iph = (struct iphdr *)(data + nh_off);
if (iph + 1 > data_end)
return NULL;
ihl_len = iph->ihl * 4;
proto = iph->protocol;
*ipv4 = true;
result = (struct bpf_sock_tuple *)&iph->saddr;
} else if (eth_proto == bpf_htons(ETH_P_IPV6)) {
struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + nh_off);
if (ip6h + 1 > data_end)
return NULL;
ihl_len = sizeof(*ip6h);
proto = ip6h->nexthdr;
*ipv4 = false;
result = (struct bpf_sock_tuple *)&ip6h->saddr;
}
if (data + nh_off + ihl_len > data_end || proto != IPPROTO_TCP)
return NULL;
return result;
}
SEC("sk_lookup_success")
int bpf_sk_lookup_test0(struct __sk_buff *skb)
{
void *data_end = (void *)(long)skb->data_end;
void *data = (void *)(long)skb->data;
struct ethhdr *eth = (struct ethhdr *)(data);
struct bpf_sock_tuple *tuple;
struct bpf_sock *sk;
size_t tuple_len;
bool ipv4;
if (eth + 1 > data_end)
return TC_ACT_SHOT;
tuple = get_tuple(data, sizeof(*eth), data_end, eth->h_proto, &ipv4);
if (!tuple || tuple + sizeof *tuple > data_end)
return TC_ACT_SHOT;
tuple_len = ipv4 ? sizeof(tuple->ipv4) : sizeof(tuple->ipv6);
sk = bpf_sk_lookup_tcp(skb, tuple, tuple_len, 0, 0);
if (sk)
bpf_sk_release(sk);
return sk ? TC_ACT_OK : TC_ACT_UNSPEC;
}
SEC("sk_lookup_success_simple")
int bpf_sk_lookup_test1(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
struct bpf_sock *sk;
sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
if (sk)
bpf_sk_release(sk);
return 0;
}
SEC("fail_use_after_free")
int bpf_sk_lookup_uaf(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
struct bpf_sock *sk;
__u32 family = 0;
sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
if (sk) {
bpf_sk_release(sk);
family = sk->family;
}
return family;
}
SEC("fail_modify_sk_pointer")
int bpf_sk_lookup_modptr(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
struct bpf_sock *sk;
__u32 family;
sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
if (sk) {
sk += 1;
bpf_sk_release(sk);
}
return 0;
}
SEC("fail_modify_sk_or_null_pointer")
int bpf_sk_lookup_modptr_or_null(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
struct bpf_sock *sk;
__u32 family;
sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
sk += 1;
if (sk)
bpf_sk_release(sk);
return 0;
}
SEC("fail_no_release")
int bpf_sk_lookup_test2(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
return 0;
}
SEC("fail_release_twice")
int bpf_sk_lookup_test3(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
struct bpf_sock *sk;
sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
bpf_sk_release(sk);
bpf_sk_release(sk);
return 0;
}
SEC("fail_release_unchecked")
int bpf_sk_lookup_test4(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
struct bpf_sock *sk;
sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
bpf_sk_release(sk);
return 0;
}
void lookup_no_release(struct __sk_buff *skb)
{
struct bpf_sock_tuple tuple = {};
bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
}
SEC("fail_no_release_subcall")
int bpf_sk_lookup_test5(struct __sk_buff *skb)
{
lookup_no_release(skb);
return 0;
}


@ -158,11 +158,7 @@ static int run_test(int cgfd)
bpf_object__for_each_program(prog, pobj) { bpf_object__for_each_program(prog, pobj) {
prog_name = bpf_program__title(prog, /*needs_copy*/ false); prog_name = bpf_program__title(prog, /*needs_copy*/ false);
if (strcmp(prog_name, "cgroup/connect6") == 0) { if (libbpf_attach_type_by_name(prog_name, &attach_type)) {
attach_type = BPF_CGROUP_INET6_CONNECT;
} else if (strcmp(prog_name, "sockops") == 0) {
attach_type = BPF_CGROUP_SOCK_OPS;
} else {
log_err("Unexpected prog: %s", prog_name); log_err("Unexpected prog: %s", prog_name);
goto err; goto err;
} }

File diff suppressed because it is too large