Testing long jumps requires having >32k instructions. That many
instructions require the verifier log buffer of 2 megabytes.
The regular test_progs run doesn't need an increased buffer, since
gotol test with 40k instructions doesn't request a log,
but test_progs -v will set the verifier log level.
Hence to avoid breaking gotol test with -v increase the buffer size.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20240102193531.3169422-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test validating that libbpf uploads BTF and func_info with
rewritten type information for arguments of global subprogs that are
marked with __arg_ctx tag.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a few extra cases of global funcs with context arguments. This time
rely on "arg:ctx" decl_tag (__arg_ctx macro), but put it next to
"classic" cases where context argument has to be of an exact type that
BPF verifier expects (e.g., bpf_user_pt_regs_t for kprobe/uprobe).
Colocating all these cases separately from other global func args that
rely on arg:xxx decl tags (in verifier_global_subprogs.c) allows for
simpler backwards compatibility testing on old kernels. All the cases in
test_global_func_ctx_args.c are supposed to work on older kernels, which
was manually validated during development.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Out of all special global func arg tag annotations, __arg_ctx is
practically is the most immediately useful and most critical to have
working across multitude kernel version, if possible. This would allow
end users to write much simpler code if __arg_ctx semantics worked for
older kernels that don't natively understand btf_decl_tag("arg:ctx") in
verifier logic.
Luckily, it is possible to ensure __arg_ctx works on old kernels through
a bit of extra work done by libbpf, at least in a lot of common cases.
To explain the overall idea, we need to go back at how context argument
was supported in global funcs before __arg_ctx support was added. This
was done based on special struct name checks in kernel. E.g., for
BPF_PROG_TYPE_PERF_EVENT the expectation is that argument type `struct
bpf_perf_event_data *` mark that argument as PTR_TO_CTX. This is all
good as long as global function is used from the same BPF program types
only, which is often not the case. If the same subprog has to be called
from, say, kprobe and perf_event program types, there is no single
definition that would satisfy BPF verifier. Subprog will have context
argument either for kprobe (if using bpf_user_pt_regs_t struct name) or
perf_event (with bpf_perf_event_data struct name), but not both.
This limitation was the reason to add btf_decl_tag("arg:ctx"), making
the actual argument type not important, so that user can just define
"generic" signature:
__noinline int global_subprog(void *ctx __arg_ctx) { ... }
I won't belabor how libbpf is implementing subprograms, see a huge
comment next to bpf_object_relocate_calls() function. The idea is that
each main/entry BPF program gets its own copy of global_subprog's code
appended.
This per-program copy of global subprog code *and* associated func_info
.BTF.ext information, pointing to FUNC -> FUNC_PROTO BTF type chain
allows libbpf to simulate __arg_ctx behavior transparently, even if the
kernel doesn't yet support __arg_ctx annotation natively.
The idea is straightforward: each time we append global subprog's code
and func_info information, we adjust its FUNC -> FUNC_PROTO type
information, if necessary (that is, libbpf can detect the presence of
btf_decl_tag("arg:ctx") just like BPF verifier would do it).
The rest is just mechanical and somewhat painful BTF manipulation code.
It's painful because we need to clone FUNC -> FUNC_PROTO, instead of
reusing it, as same FUNC -> FUNC_PROTO chain might be used by another
main BPF program within the same BPF object, so we can't just modify it
in-place (and cloning BTF types within the same struct btf object is
painful due to constant memory invalidation, see comments in code).
Uploaded BPF object's BTF information has to work for all BPF
programs at the same time.
Once we have FUNC -> FUNC_PROTO clones, we make sure that instead of
using some `void *ctx` parameter definition, we have an expected `struct
bpf_perf_event_data *ctx` definition (as far as BPF verifier and kernel
is concerned), which will mark it as context for BPF verifier. Same
global subprog relocated and copied into another main BPF program will
get different type information according to main program's type. It all
works out in the end in a completely transparent way for end user.
Libbpf maintains internal program type -> expected context struct name
mapping internally. Note, not all BPF program types have named context
struct, so this approach won't work for such programs (just like it
didn't before __arg_ctx). So native __arg_ctx is still important to have
in kernel to have generic context support across all BPF program types.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With all the preparations in previous patches done we are ready to
postpone BTF loading and sanitization step until after all the
relocations are performed.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move the logic of finding and assigning exception callback indices from
BTF sanitization step to program relocations step, which seems more
logical and will unblock moving BTF loading to after relocation step.
Exception callbacks discovery and assignment has no dependency on BTF
being loaded into the kernel, it only uses BTF information. It does need
to happen before subprogram relocations happen, though. Which is why the
split.
No functional changes.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move map creation to later during BPF object loading by pre-creating
stable placeholder FDs (utilizing memfd_create()). Use dup2()
syscall to then atomically make those placeholder FDs point to real
kernel BPF map objects.
This change allows to delay BPF map creation to after all the BPF
program relocations. That, in turn, allows to delay BTF finalization and
loading into kernel to after all the relocations as well. We'll take
advantage of the latter in subsequent patches to allow libbpf to adjust
BTF in a way that helps with BPF global function usage.
Clean up a few places where we close map->fd, which now shouldn't
happen, because map->fd should be a valid FD regardless of whether map
was created or not. Surprisingly and nicely it simplifies a bunch of
error handling code. If this change doesn't backfire, I'm tempted to
pre-create such stable FDs for other entities (progs, maybe even BTF).
We previously did some manipulations to make gen_loader work with fake
map FDs, with stable map FDs this hack is not necessary for maps (we
still have it for BTF, but I left it as is for now).
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With the upcoming switch to preallocated placeholder FDs for maps,
switch various getters/setter away from checking map->fd. Use
map_is_created() helper that detect whether BPF map can be modified based
on map->obj->loaded state, with special provision for maps set up with
bpf_map__reuse_fd().
For backwards compatibility, we take map_is_created() into account in
bpf_map__fd() getter as well. This way before bpf_object__load() phase
bpf_map__fd() will always return -1, just as before the changes in
subsequent patches adding stable map->fd placeholders.
We also get rid of all internal uses of bpf_map__fd() getter, as it's
more oriented for uses external to libbpf. The above map_is_created()
check actually interferes with some of the internal uses, if map FD is
fetched through bpf_map__fd().
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instead of inferring whether map already point to previously
created/pinned BPF map (which user can specify with bpf_map__reuse_fd()) API),
use explicit map->reused flag that is set in such case.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It makes future grepping and code analysis a bit easier.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest to capture the verification failure when the allocation
size is greater than 512.
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231222031812.1293190-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the previous patch, the maximum data size for bpf_global_percpu_ma
is 512 bytes. This breaks selftest test_bpf_ma. The test is adjusted
in two aspects:
- Since the maximum allowed data size for bpf_global_percpu_ma is
512, remove all tests beyond that, names sizes 1024, 2048 and 4096.
- Previously the percpu data size is bucket_size - 8 in order to
avoid percpu allocation into the next bucket. This patch removed
such data size adjustment thanks to Patch 1.
Also, a better way to generate BTF type is used than adding
a member to the value struct.
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231222031807.1292853-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test that replaces the same socket with itself. This exercises a
corner case where old element and new element have the same posck.
Test protocols: TCP, UDP, stream af_unix and dgram af_unix.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20231221232327.43678-6-john.fastabend@gmail.com
Add test with multiple maps where each socket is inserted in multiple
maps. Test protocols: TCP, UDP, stream af_unix and dgram af_unix.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20231221232327.43678-5-john.fastabend@gmail.com
Add test with a single map where each socket is inserted multiple
times. Test protocols: TCP, UDP, stream af_unix and dgram af_unix.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20231221232327.43678-4-john.fastabend@gmail.com
bpf_nop_mov(var) asm macro emits nop register move: rX = rX.
If 'var' is a scalar and not a fixed constant the verifier will assign ID to it.
If it's later spilled the stack slot will carry that ID as well.
Hence the range refining comparison "if rX < const" will update all copies
including spilled slot.
This macro is a temporary workaround until the verifier gets smarter.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231226191148.48536-6-alexei.starovoitov@gmail.com
Since the last user was converted to bpf_cmp, remove bpf_assert_eq/ne/... macros.
__bpf_assert_op() macro is kept for experiments, since it's slightly more efficient
than bpf_assert(bpf_cmp_unlikely()) until LLVM is fixed.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-5-alexei.starovoitov@gmail.com
Convert exceptions_assert.c to bpf_cmp_unlikely() macro.
Since
bpf_assert(bpf_cmp_unlikely(var, ==, 100));
other code;
will generate assembly code:
if r1 == 100 goto L2;
r0 = 0
call bpf_throw
L1:
other code;
...
L2: goto L1;
LLVM generates redundant basic block with extra goto. LLVM will be fixed eventually.
Right now it's less efficient than __bpf_assert(var, ==, 100) macro that produces:
if r1 == 100 goto L1;
r0 = 0
call bpf_throw
L1:
other code;
But extra goto doesn't hurt the verification process.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-4-alexei.starovoitov@gmail.com
Compilers optimize conditional operators at will, but often bpf programmers
want to force compilers to keep the same operator in asm as it's written in C.
Introduce bpf_cmp_likely/unlikely(var1, conditional_op, var2) macros that can be used as:
- if (seen >= 1000)
+ if (bpf_cmp_unlikely(seen, >=, 1000))
The macros take advantage of BPF assembly that is C like.
The macros check the sign of variable 'seen' and emits either
signed or unsigned compare.
For example:
int a;
bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX s> 0 goto' in BPF assembly.
unsigned int a;
bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX > 0 goto' in BPF assembly.
C type conversions coupled with comparison operator are tricky.
int i = -1;
unsigned int j = 1;
if (i < j) // this is false.
long i = -1;
unsigned int j = 1;
if (i < j) // this is true.
Make sure BPF program is compiled with -Wsign-compare then the macros will catch
the mistake.
The macros check LHS (left hand side) only to figure out the sign of compare.
'if 0 < rX goto' is not allowed in the assembly, so the users
have to use a variable on LHS anyway.
The patch updates few tests to demonstrate the use of the macros.
The macro allows to use BPF_JSET in C code, since LLVM doesn't generate it at
present. For example:
if (i & j) compiles into r0 &= r1; if r0 == 0 goto
while
if (bpf_cmp_unlikely(i, &, j)) compiles into if r0 & r1 goto
Note that the macros has to be careful with RHS assembly predicate.
Since:
u64 __rhs = 1ull << 42;
asm goto("if r0 < %[rhs] goto +1" :: [rhs] "ri" (__rhs));
LLVM will silently truncate 64-bit constant into s32 imm.
Note that [lhs] "r"((short)LHS) the type cast is a workaround for LLVM issue.
When LHS is exactly 32-bit LLVM emits redundant <<=32, >>=32 to zero upper 32-bits.
When LHS is 64 or 16 or 8-bit variable there are no shifts.
When LHS is 32-bit the (u64) cast doesn't help. Hence use (short) cast.
It does _not_ truncate the variable before it's assigned to a register.
Traditional likely()/unlikely() macros that use __builtin_expect(!!(x), 1 or 0)
have no effect on these macros, hence macros implement the logic manually.
bpf_cmp_unlikely() macro preserves compare operator as-is while
bpf_cmp_likely() macro flips the compare.
Consider two cases:
A.
for() {
if (foo >= 10) {
bar += foo;
}
other code;
}
B.
for() {
if (foo >= 10)
break;
other code;
}
It's ok to use either bpf_cmp_likely or bpf_cmp_unlikely macros in both cases,
but consider that 'break' is effectively 'goto out_of_the_loop'.
Hence it's better to use bpf_cmp_unlikely in the B case.
While 'bar += foo' is better to keep as 'fallthrough' == likely code path in the A case.
When it's written as:
A.
for() {
if (bpf_cmp_likely(foo, >=, 10)) {
bar += foo;
}
other code;
}
B.
for() {
if (bpf_cmp_unlikely(foo, >=, 10))
break;
other code;
}
The assembly will look like:
A.
for() {
if r1 < 10 goto L1;
bar += foo;
L1:
other code;
}
B.
for() {
if r1 >= 10 goto L2;
other code;
}
L2:
The bpf_cmp_likely vs bpf_cmp_unlikely changes basic block layout, hence it will
greatly influence the verification process. The number of processed instructions
will be different, since the verifier walks the fallthrough first.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-3-alexei.starovoitov@gmail.com
GCC's -Wall includes -Wsign-compare while clang does not.
Since BPF programs are built with clang we need to add this flag explicitly
to catch problematic comparisons like:
int i = -1;
unsigned int j = 1;
if (i < j) // this is false.
long i = -1;
unsigned int j = 1;
if (i < j) // this is true.
C standard for reference:
- If either operand is unsigned long the other shall be converted to unsigned long.
- Otherwise, if one operand is a long int and the other unsigned int, then if a
long int can represent all the values of an unsigned int, the unsigned int
shall be converted to a long int; otherwise both operands shall be converted to
unsigned long int.
- Otherwise, if either operand is long, the other shall be converted to long.
- Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
Unfortunately clang's -Wsign-compare is very noisy.
It complains about (s32)a == (u32)b which is safe and doen't have surprising behavior.
This patch fixes some of the issues. It needs a follow up to fix the rest.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-2-alexei.starovoitov@gmail.com
This patch adds a test for the condition that the previous patch mucked
with - illegal zero-sized helper memory access. As opposed to existing
tests, this new one uses a size whose lower bound is zero, as opposed to
a known-zero one.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231221232225.568730-3-andreimatei1@gmail.com
This patch simplifies the verification of size arguments associated to
pointer arguments to helpers and kfuncs. Many helpers take a pointer
argument followed by the size of the memory access performed to be
performed through that pointer. Before this patch, the handling of the
size argument in check_mem_size_reg() was confusing and wasteful: if the
size register's lower bound was 0, then the verification was done twice:
once considering the size of the access to be the lower-bound of the
respective argument, and once considering the upper bound (even if the
two are the same). The upper bound checking is a super-set of the
lower-bound checking(*), except: the only point of the lower-bound check
is to handle the case where zero-sized-accesses are explicitly not
allowed and the lower-bound is zero. This static condition is now
checked explicitly, replacing a much more complex, expensive and
confusing verification call to check_helper_mem_access().
Error messages change in this patch. Before, messages about illegal
zero-size accesses depended on the type of the pointer and on other
conditions, and sometimes the message was plain wrong: in some tests
that changed you'll see that the old message was something like "R1 min
value is outside of the allowed memory range", where R1 is the pointer
register; the error was wrongly claiming that the pointer was bad
instead of the size being bad. Other times the information that the size
came for a register with a possible range of values was wrong, and the
error presented the size as a fixed zero. Now the errors refer to the
right register. However, the old error messages did contain useful
information about the pointer register which is now lost; recovering
this information was deemed not important enough.
(*) Besides standing to reason that the checks for a bigger size access
are a super-set of the checks for a smaller size access, I have also
mechanically verified this by reading the code for all types of
pointers. I could convince myself that it's true for all but
PTR_TO_BTF_ID (check_ptr_to_btf_access). There, simply looking
line-by-line does not immediately prove what we want. If anyone has any
qualms, let me know.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231221232225.568730-2-andreimatei1@gmail.com
Commit 051d442098 ("net/sched: Retire CBQ qdisc") retired the CBQ qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit fb38306ceb ("net/sched: Retire ATM qdisc") retired the ATM qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bbe77c14ee ("net/sched: Retire dsmark qdisc") retired the dsmark
classifier. Remove UAPI support for it.
Iproute2 will sync by equally removing it from user space.
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8c710f7525 ("net/sched: Retire tcindex classifier") retired the TC
tcindex classifier.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 265b4da82d ("net/sched: Retire rsvp classifier") retired the TC RSVP
classifier.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new helper chk_msk_cestab() to check the current
established connections counter MIB_CURRESTAB in diag.sh. Invoke it
to check the counter during the connection after every chk_msk_inuse().
Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new helper chk_cestab_nr() to check the current
established connections counter MIB_CURRESTAB. Set the newly added
variables cestab_ns1 and cestab_ns2 to indicate how many connections
are expected in ns1 or ns2.
Invoke check_cestab() to check the counter during the connection in
do_transfer() and invoke chk_cestab_nr() to re-check it when the
connection closed. These checks are embedded in add_tests().
Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit f5769faeec ("net: Namespace-ify sysctl_optmem_max")
optmem_max is per-netns, so need of switching to root namespace.
It seems trivial to keep the old logic working, so going to keep it for
a while (at least, until kernel with netns-optmem_max will be release).
Currently, there is a test that checks that optmem_max limit applies to
TCP-AO keys and a little benchmark that measures linked-list TCP-AO keys
scaling, those are fixed by this.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In unsigned-md5 selftests ip_route_add() is not needed in
client_add_ip(): the route was pre-setup in __test_init() => link_init()
for subnet, rather than a specific ip-address.
Currently, __ip_route_add() mistakenly always sets VRF table
to RT_TABLE_MAIN - this seems to have sneaked in during unsigned-md5
tests debugging. That also explains, why ip_route_add_vrf() ignored
EEXIST, returned by fib6.
Yet, keep EEXIST ignoring in bench-lookups selftests as it's expected
that those selftests may add the same (duplicate) routes.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tc ipt action was intended to run all netfilter/iptables target.
Unfortunately it has not benefitted over the years from proper updates when
netfilter changes, and for that reason it has remained rudimentary.
Pinging a bunch of people that i was aware were using this indicates that
removing it wont affect them.
Retire it to reduce maintenance efforts. Buh-bye.
Reviewed-by: Victor Noguiera <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWFd9oACgkQ1V2XiooU
IOTBog//auEj3LR5v2b5et6CLsuMESsCFkVko1AFdzzpu94sJ8TTkqlrW7SPSfPW
CMNYz1EYFaRGK1GjfH0a49EUofeU/kPD/TmffmiYwuJlGyEIXp17ZIUQnGDrM6Mz
UBHX8VrW73McJYuZJbxM4HX6Q3aFyshKy8iOPpffdFAQcCD5XtkYbW24Mzw6V4sI
8fBz6C/iT+5Jq1wGtvM2xmGkNYuMLJxbf9qOpinRZCXQV/z5IWda/bFnv3kpJ9mR
x2ZYXHnzFEr2gVFIhZ7G6FvMgQAkkKGVQE+n9zVZ9V78czqBk9depdeLaq/RYGyx
8VYqJyaXHKQm7FqsBwSSxU964aQhi/zNrjF9SeJSfcRNhTmXgzdERndVWVjUPe9O
rFLyQVoQ6JyKy5++1lL3+63cd6ZeGw8gUw8MCw9DX8rwamnNFunUsixKv4SLzQHA
jfFgmAUq+HVWHXI4xmj+SDO8g+s7O3BpbZM09E3si6lwy4/LiQqHhV4sNgEVN/It
hf76t/U6GRYZ0Hwy2dwQl8SZg1BxMSGFmzB7vyEZyGXaqxRZb1Jym4h9slOMKXWl
gS5y4y0ZVTmzRCqJR+jUrMAKPCw/R964LrslDYwQpsQeP7nuUPeU5jH1VWD6kAbS
Vc36uDRnfojoNiRcmAfuhvZIseM1+hI6UXCIsqYR6cekCFlBad8=
=+Uvv
-----END PGP SIGNATURE-----
Merge tag 'nf-next-23-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
netfilter pull request 23-12-22
The following patchset contains Netfilter updates for net-next:
1) Add locking for NFT_MSG_GETSETELEM_RESET requests, to address a
race scenario with two concurrent processes running a dump-and-reset
which exposes negative counters to userspace, from Phil Sutter.
2) Use GFP_KERNEL in pipapo GC, from Florian Westphal.
3) Reorder nf_flowtable struct members, place the read-mostly parts
accessed by the datapath first. From Florian Westphal.
4) Set on dead flag for NFT_MSG_NEWSET in abort path,
from Florian Westphal.
5) Support filtering zone in ctnetlink, from Felix Huettner.
6) Bail out if user tries to redefine an existing chain with different
type in nf_tables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZYVEqQAKCRDbK58LschI
gzH6AP9hVXLpHFTWMT0+2GK2lx69VX8zW1C0SmN7WHaxUbPN9QEAwzGnELfKk00P
0IKRHSl5abhVMX7JOM3sSOhCILeKjQg=
=wRLJ
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next-for-netdev
The following pull-request contains BPF updates for your *net-next* tree.
We've added 22 non-merge commits during the last 3 day(s) which contain
a total of 23 files changed, 652 insertions(+), 431 deletions(-).
The main changes are:
1) Add verifier support for annotating user's global BPF subprogram arguments
with few commonly requested annotations for a better developer experience,
from Andrii Nakryiko.
These tags are:
- Ability to annotate a special PTR_TO_CTX argument
- Ability to annotate a generic PTR_TO_MEM as non-NULL
2) Support BPF verifier tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like, from
Menglong Dong.
3) Fix a warning in bpf_mem_cache's check_obj_size() as reported by LKP, from Hou Tao.
4) Re-support uid/gid options when mounting bpffs which had to be reverted with
the prior token series revert to avoid conflicts, from Daniel Borkmann.
5) Fix a libbpf NULL pointer dereference in bpf_object__collect_prog_relos() found
from fuzzing the library with malformed ELF files, from Mingyi Zhang.
6) Skip DWARF sections in libbpf's linker sanity check given compiler options to
generate compressed debug sections can trigger a rejection due to misalignment,
from Alyssa Ross.
7) Fix an unnecessary use of the comma operator in BPF verifier, from Simon Horman.
8) Fix format specifier for unsigned long values in cpustat sample, from Colin Ian King.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since previous commit, MPTCP has support for IP_BIND_ADDRESS_NO_PORT and
IP_LOCAL_PORT_RANGE sockopts.
Add ip4_mptcp and ip6_mptcp fixture variants to ip_local_port_range
selftest to provide selftest coverage for these sockopts.
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices do not support individual 'pmac' and 'emac' stats.
For such devices, resort to 'aggregate' stats.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices have errata due to which they cannot report ETH_ZLEN (60)
in the rx-min-frag-size. This was foreseen of course, and lldpad has
logic that when we request it to advertise addFragSize 0, it will round
it up to the lowest value that is _actually_ supported by the hardware.
The problem is that the selftest expects lldpad to report back to us the
same value as we requested.
Make the selftest smarter by figuring out on its own what is a
reasonable value to expect.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a variable RUN_IN_NETNS if the user wants to run all the selected tests
in namespace in parallel. With this, we can save a lot of testing time.
Note that some tests may not fit to run in namespace, e.g.
net/drop_monitor_tests.sh, as the dwdump needs to be run in init ns.
I also added another parameter -p to make all the logs reported separately
instead of mixing them in the stdout or output.log.
Nit: the NUM in run_one is not used, rename it to test_num.
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pmtu test use /bin/sh, so we need to source ./lib.sh instead of lib.sh
Here is the test result after conversion.
# ./pmtu.sh
TEST: ipv4: PMTU exceptions [ OK ]
TEST: ipv4: PMTU exceptions - nexthop objects [ OK ]
TEST: ipv6: PMTU exceptions [ OK ]
TEST: ipv6: PMTU exceptions - nexthop objects [ OK ]
...
TEST: ipv4: list and flush cached exceptions - nexthop objects [ OK ]
TEST: ipv6: list and flush cached exceptions [ OK ]
TEST: ipv6: list and flush cached exceptions - nexthop objects [ OK ]
TEST: ipv4: PMTU exception w/route replace [ OK ]
TEST: ipv4: PMTU exception w/route replace - nexthop objects [ OK ]
TEST: ipv6: PMTU exception w/route replace [ OK ]
TEST: ipv6: PMTU exception w/route replace - nexthop objects [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The setup_loopback and setup_veth use their own way to create namespace.
So let's just re-define server_ns/client_ns to unique name.
At the same time update the namespace name in gro.sh and toeplitz.sh.
As I don't have env to run toeplitz.sh. Here is only the gro test result.
# ./gro.sh
running test ipv4 data
Expected {200 }, Total 1 packets
Received {200 }, Total 1 packets.
...
Gro::large test passed.
All Tests Succeeded!
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the test result after conversion.
# ./xfrm_policy.sh
PASS: policy before exception matches
PASS: ping to .254 bypassed ipsec tunnel (exceptions)
PASS: direct policy matches (exceptions)
PASS: policy matches (exceptions)
PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies)
PASS: direct policy matches (exceptions and block policies)
PASS: policy matches (exceptions and block policies)
PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hresh changes)
PASS: direct policy matches (exceptions and block policies after hresh changes)
PASS: policy matches (exceptions and block policies after hresh changes)
PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hthresh change in ns3)
PASS: direct policy matches (exceptions and block policies after hthresh change in ns3)
PASS: policy matches (exceptions and block policies after hthresh change in ns3)
PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after htresh change to normal)
PASS: direct policy matches (exceptions and block policies after htresh change to normal)
PASS: policy matches (exceptions and block policies after htresh change to normal)
PASS: policies with repeated htresh change
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the test result after conversion.
# ./stress_reuseport_listen.sh
listen 24000 socks took 0.47714
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running the test in namespace, the debugfs may not load automatically.
So add a checking to make sure debugfs loaded. Here is the test result
after conversion.
# ./rtnetlink.sh
PASS: policy routing
PASS: route get
...
PASS: address proto IPv4
PASS: address proto IPv6
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test will move the device to netns 1. Add a new test_ns to do this.
Here is the test result after conversion.
# ./netns-name.sh
netns-name.sh [ OK ]
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the test result after conversion.
# ./gre_gso.sh
TEST: GREv6/v4 - copy file w/ TSO [ OK ]
TEST: GREv6/v4 - copy file w/ GSO [ OK ]
TEST: GREv6/v6 - copy file w/ TSO [ OK ]
TEST: GREv6/v6 - copy file w/ GSO [ OK ]
Tests passed: 4
Tests failed: 0
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix for out-of-tree build that I wasn't testing previously:
1. Create a directory for library object files, fixes:
> gcc lib/kconfig.c -Wall -O2 -g -D_GNU_SOURCE -fno-strict-aliasing -I ../../../../../usr/include/ -iquote /tmp/kselftest/kselftest/net/tcp_ao/lib -I ../../../../include/ -o /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o -c
> Assembler messages:
> Fatal error: can't create /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o: No such file or directory
> make[1]: *** [Makefile:46: /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o] Error 1
2. Include $(KHDR_INCLUDES) that's exported by selftests/Makefile, fixes:
> In file included from lib/kconfig.c:6:
> lib/aolib.h:320:45: warning: ‘struct tcp_ao_add’ declared inside parameter list will not be visible outside of this definition or declaration
> 320 | extern int test_prepare_key_sockaddr(struct tcp_ao_add *ao, const char *alg,
> | ^~~~~~~~~~
...
3. While at here, clean-up $(KSFT_KHDR_INSTALL): it's not needed anymore
since commit f2745dc0ba ("selftests: stop using KSFT_KHDR_INSTALL")
4. Also, while at here, drop .DEFAULT_GOAL definition: that has a
self-explaining comment, that was valid when I made these selftests
compile on local v4.19 kernel, but not needed since
commit 8ce72dc325 ("selftests: fix headers_install circular dependency")
Fixes: cfbab37b3d ("selftests/net: Add TCP-AO library")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312190645.q76MmHyq-lkp@intel.com/
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a handful of spelling mistakes in test messages in the
TCP-AIO selftests. Fix these.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>