mirror of
https://github.com/torvalds/linux.git
synced 2024-11-15 00:21:59 +00:00
376bd59e2a
Salvatore Benedetto reported an issue that when doing syscall tracepoint tracing the kernel stack is empty. For example, using the following command line bpftrace -e 'tracepoint:syscalls:sys_enter_read { print("Kernel Stack\n"); print(kstack()); }' bpftrace -e 'tracepoint:syscalls:sys_exit_read { print("Kernel Stack\n"); print(kstack()); }' the output for both commands is === Kernel Stack === Further analysis shows that pt_regs used for bpf syscall tracepoint tracing is from the one constructed during user->kernel transition. The call stack looks like perf_syscall_enter+0x88/0x7c0 trace_sys_enter+0x41/0x80 syscall_trace_enter+0x100/0x160 do_syscall_64+0x38/0xf0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The ip address stored in pt_regs is from user space hence no kernel stack is printed. To fix the issue, kernel address from pt_regs is required. In kernel repo, there are already a few cases like this. For example, in kernel/trace/bpf_trace.c, several perf_fetch_caller_regs(fake_regs_ptr) instances are used to supply ip address or use ip address to construct call stack. Instead of allocate fake_regs in the stack which may consume a lot of bytes, the function perf_trace_buf_alloc() in perf_syscall_{enter, exit}() is leveraged to create fake_regs, which will be passed to perf_call_bpf_{enter,exit}(). For the above bpftrace script, I got the following output with this patch: for tracepoint:syscalls:sys_enter_read === Kernel Stack syscall_trace_enter+407 syscall_trace_enter+407 do_syscall_64+74 entry_SYSCALL_64_after_hwframe+75 === and for tracepoint:syscalls:sys_exit_read === Kernel Stack syscall_exit_work+185 syscall_exit_work+185 syscall_exit_to_user_mode+305 do_syscall_64+118 entry_SYSCALL_64_after_hwframe+75 === Reported-by: Salvatore Benedetto <salvabenedetto@meta.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240910214037.3663272-1-yonghong.song@linux.dev |
||
---|---|---|
.. | ||
rv | ||
blktrace.c | ||
bpf_trace.c | ||
bpf_trace.h | ||
error_report-traces.c | ||
fgraph.c | ||
fprobe.c | ||
ftrace_internal.h | ||
ftrace.c | ||
Kconfig | ||
kprobe_event_gen_test.c | ||
Makefile | ||
pid_list.c | ||
pid_list.h | ||
power-traces.c | ||
preemptirq_delay_test.c | ||
rethook.c | ||
ring_buffer_benchmark.c | ||
ring_buffer.c | ||
rpm-traces.c | ||
synth_event_gen_test.c | ||
trace_benchmark.c | ||
trace_benchmark.h | ||
trace_boot.c | ||
trace_branch.c | ||
trace_btf.c | ||
trace_btf.h | ||
trace_clock.c | ||
trace_dynevent.c | ||
trace_dynevent.h | ||
trace_entries.h | ||
trace_eprobe.c | ||
trace_event_perf.c | ||
trace_events_filter_test.h | ||
trace_events_filter.c | ||
trace_events_hist.c | ||
trace_events_inject.c | ||
trace_events_synth.c | ||
trace_events_trigger.c | ||
trace_events_user.c | ||
trace_events.c | ||
trace_export.c | ||
trace_fprobe.c | ||
trace_functions_graph.c | ||
trace_functions.c | ||
trace_hwlat.c | ||
trace_irqsoff.c | ||
trace_kdb.c | ||
trace_kprobe_selftest.c | ||
trace_kprobe_selftest.h | ||
trace_kprobe.c | ||
trace_mmiotrace.c | ||
trace_nop.c | ||
trace_osnoise.c | ||
trace_output.c | ||
trace_output.h | ||
trace_preemptirq.c | ||
trace_printk.c | ||
trace_probe_kernel.h | ||
trace_probe_tmpl.h | ||
trace_probe.c | ||
trace_probe.h | ||
trace_recursion_record.c | ||
trace_sched_switch.c | ||
trace_sched_wakeup.c | ||
trace_selftest_dynamic.c | ||
trace_selftest.c | ||
trace_seq.c | ||
trace_stack.c | ||
trace_stat.c | ||
trace_stat.h | ||
trace_synth.h | ||
trace_syscalls.c | ||
trace_uprobe.c | ||
trace.c | ||
trace.h | ||
tracing_map.c | ||
tracing_map.h |