linux/tools/testing/selftests/bpf/progs/test_overhead.c

49 lines
951 B
C
Raw Normal View History

selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
#include <stdbool.h>
#include <stddef.h>
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
#include <linux/bpf.h>
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
struct task_struct;
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
SEC("kprobe/__set_task_comm")
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
int BPF_KPROBE(prog1, struct task_struct *tsk, const char *buf, bool exec)
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
{
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
return !tsk;
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
}
SEC("kretprobe/__set_task_comm")
int BPF_KRETPROBE(prog2, int ret)
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
{
return ret;
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
}
SEC("raw_tp/task_rename")
int prog3(struct bpf_raw_tracepoint_args *ctx)
{
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
return !ctx->args[0];
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
}
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
SEC("fentry/__set_task_comm")
int BPF_PROG(prog4, struct task_struct *tsk, const char *buf, bool exec)
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
{
return 0;
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
}
selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock *sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10 21:16:34 +00:00
SEC("fexit/__set_task_comm")
int BPF_PROG(prog5, struct task_struct *tsk, const char *buf, bool exec)
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
{
return 0;
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
}
selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement user-space benchmarking part into benchmark runner to compare results. Results with ./bench are consistently somewhat lower than test_overhead's, but relative performance of various types of BPF programs stay consisten (e.g., kretprobe is noticeably slower). This slowdown seems to be coming from the fact that test_overhead is single-threaded, while benchmark always spins off at least one thread for producer. This has been confirmed by hacking multi-threaded test_overhead variant and also single-threaded bench variant. Resutls are below. run_bench_rename.sh script from benchs/ subdirectory was used to produce results for ./bench. Single-threaded implementations =============================== /* bench: single-threaded, atomics */ base : 4.622 ± 0.049M/s kprobe : 3.673 ± 0.052M/s kretprobe : 2.625 ± 0.052M/s rawtp : 4.369 ± 0.089M/s fentry : 4.201 ± 0.558M/s fexit : 4.309 ± 0.148M/s fmodret : 4.314 ± 0.203M/s /* selftest: single-threaded, no atomics */ task_rename base 4555K events per sec task_rename kprobe 3643K events per sec task_rename kretprobe 2506K events per sec task_rename raw_tp 4303K events per sec task_rename fentry 4307K events per sec task_rename fexit 4010K events per sec task_rename fmod_ret 3984K events per sec Multi-threaded implementations ============================== /* bench: multi-threaded w/ atomics */ base : 3.910 ± 0.023M/s kprobe : 3.048 ± 0.037M/s kretprobe : 2.300 ± 0.015M/s rawtp : 3.687 ± 0.034M/s fentry : 3.740 ± 0.087M/s fexit : 3.510 ± 0.009M/s fmodret : 3.485 ± 0.050M/s /* selftest: multi-threaded w/ atomics */ task_rename base 3872K events per sec task_rename kprobe 3068K events per sec task_rename kretprobe 2350K events per sec task_rename raw_tp 3731K events per sec task_rename fentry 3639K events per sec task_rename fexit 3558K events per sec task_rename fmod_ret 3511K events per sec /* selftest: multi-threaded, no atomics */ task_rename base 3945K events per sec task_rename kprobe 3298K events per sec task_rename kretprobe 2451K events per sec task_rename raw_tp 3718K events per sec task_rename fentry 3782K events per sec task_rename fexit 3543K events per sec task_rename fmod_ret 3526K events per sec Note that the fact that ./bench benchmark always uses atomic increments for counting, while test_overhead doesn't, doesn't influence test results all that much. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com
2020-05-12 19:24:44 +00:00
SEC("fmod_ret/__set_task_comm")
int BPF_PROG(prog6, struct task_struct *tsk, const char *buf, bool exec)
{
return !tsk;
}
selftests/bpf: Add BPF trampoline performance test Add a test that benchmarks different ways of attaching BPF program to a kernel function. Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations: $ ./test_progs -n 49 -v|grep events task_rename base 2743K events per sec task_rename kprobe 2419K events per sec task_rename kretprobe 1876K events per sec task_rename raw_tp 2578K events per sec task_rename fentry 2710K events per sec task_rename fexit 2685K events per sec On a kernel with retpoline: $ ./test_progs -n 49 -v|grep events task_rename base 2401K events per sec task_rename kprobe 1930K events per sec task_rename kretprobe 1485K events per sec task_rename raw_tp 2053K events per sec task_rename fentry 2351K events per sec task_rename fexit 2185K events per sec All 5 approaches: - kprobe/kretprobe in __set_task_comm() - raw tracepoint in trace_task_rename() - fentry/fexit in __set_task_comm() are roughly equivalent. __set_task_comm() by itself is quite fast, so any extra instructions add up. Until BPF trampoline was introduced the fastest mechanism was raw tracepoint. kprobe via ftrace was second best. kretprobe is slow due to trap. New fentry/fexit methods via BPF trampoline are clearly the fastest and the difference is more pronounced with retpoline on, since BPF trampoline doesn't use indirect jumps. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
2019-11-22 01:15:15 +00:00
char _license[] SEC("license") = "GPL";