linux/kernel/trace
Yonghong Song 376bd59e2a bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
Salvatore Benedetto reported an issue that when doing syscall tracepoint
tracing the kernel stack is empty. For example, using the following
command line
  bpftrace -e 'tracepoint:syscalls:sys_enter_read { print("Kernel Stack\n"); print(kstack()); }'
  bpftrace -e 'tracepoint:syscalls:sys_exit_read { print("Kernel Stack\n"); print(kstack()); }'
the output for both commands is
===
  Kernel Stack
===

Further analysis shows that pt_regs used for bpf syscall tracepoint
tracing is from the one constructed during user->kernel transition.
The call stack looks like
  perf_syscall_enter+0x88/0x7c0
  trace_sys_enter+0x41/0x80
  syscall_trace_enter+0x100/0x160
  do_syscall_64+0x38/0xf0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The ip address stored in pt_regs is from user space hence no kernel
stack is printed.

To fix the issue, kernel address from pt_regs is required.
In kernel repo, there are already a few cases like this. For example,
in kernel/trace/bpf_trace.c, several perf_fetch_caller_regs(fake_regs_ptr)
instances are used to supply ip address or use ip address to construct
call stack.

Instead of allocate fake_regs in the stack which may consume
a lot of bytes, the function perf_trace_buf_alloc() in
perf_syscall_{enter, exit}() is leveraged to create fake_regs,
which will be passed to perf_call_bpf_{enter,exit}().

For the above bpftrace script, I got the following output with this patch:
for tracepoint:syscalls:sys_enter_read
===
  Kernel Stack

        syscall_trace_enter+407
        syscall_trace_enter+407
        do_syscall_64+74
        entry_SYSCALL_64_after_hwframe+75
===
and for tracepoint:syscalls:sys_exit_read
===
Kernel Stack

        syscall_exit_work+185
        syscall_exit_work+185
        syscall_exit_to_user_mode+305
        do_syscall_64+118
        entry_SYSCALL_64_after_hwframe+75
===

Reported-by: Salvatore Benedetto <salvabenedetto@meta.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910214037.3663272-1-yonghong.song@linux.dev
2024-09-11 13:27:27 -07:00
..
rv rv: Update rv_en(dis)able_monitor doc to match kernel-doc 2024-05-21 19:03:35 -04:00
blktrace.c blktrace: convert strncpy() to strscpy_pad() 2024-04-25 21:07:08 -07:00
bpf_trace.c bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers 2024-09-11 09:58:31 -07:00
bpf_trace.h tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
error_report-traces.c
fgraph.c function_graph: Fix the ret_stack used by ftrace_graph_ret_addr() 2024-08-07 20:20:30 -04:00
fprobe.c fprobe: Fix to allocate entry_data_size buffer with rethook instances 2024-03-01 09:18:24 +09:00
ftrace_internal.h function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE 2024-06-10 18:08:23 -04:00
ftrace.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
Kconfig tracing: Build event generation tests only as modules 2024-06-12 08:43:12 +09:00
kprobe_event_gen_test.c
Makefile
pid_list.c trace/pid_list: Change gfp flags in pid_list_fill_irq() 2024-07-15 15:07:14 -04:00
pid_list.h
power-traces.c
preemptirq_delay_test.c minmax: make generic MIN() and MAX() macros available everywhere 2024-07-28 15:49:18 -07:00
rethook.c rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get() 2024-05-01 23:18:48 +09:00
ring_buffer_benchmark.c ring-buffer: Read and write to ring buffers with custom sub buffer size 2023-12-20 07:54:56 -05:00
ring_buffer.c ring-buffer: Remove unused function ring_buffer_nr_pages() 2024-08-07 20:26:44 -04:00
rpm-traces.c
synth_event_gen_test.c tracing / synthetic: Disable events after testing in synth_event_gen_test_init() 2023-12-21 10:04:45 -05:00
trace_benchmark.c tracing: Improve benchmark test performance by using do_div() 2024-05-13 20:00:57 -04:00
trace_benchmark.h
trace_boot.c tracing: Allow creating instances with specified system events 2023-12-18 23:14:16 -05:00
trace_branch.c
trace_btf.c tracing/probes: Fix to search structure fields correctly 2024-02-17 21:25:42 +09:00
trace_btf.h
trace_clock.c
trace_dynevent.c
trace_dynevent.h
trace_entries.h
trace_eprobe.c tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_event_perf.c
trace_events_filter_test.h
trace_events_filter.c
trace_events_hist.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_inject.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_synth.c tracing/synthetic: Fix trace_string() return value 2024-02-15 11:40:01 -05:00
trace_events_trigger.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_user.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
trace_events.c tracing: Use refcount for trace_event_file reference counter 2024-08-07 18:12:46 -04:00
trace_export.c
trace_fprobe.c tracing/probes: support '%pd' type for print struct dentry's name 2024-05-01 23:18:47 +09:00
trace_functions_graph.c function_graph: Move graph notrace bit to shadow stack global var 2024-06-04 10:37:44 -04:00
trace_functions.c ftrace: Hide one more entry in stack trace when ftrace_pid is enabled 2024-06-06 15:22:18 -04:00
trace_hwlat.c
trace_irqsoff.c function_graph: Move set_graph_function tests to shadow stack global var 2024-06-04 10:37:35 -04:00
trace_kdb.c
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_kprobe.c tracing/kprobes: Fix build error when find_module() is not available 2024-07-10 09:47:00 +09:00
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c rtla/osnoise: set the default threshold to 1us 2024-07-01 18:54:31 -04:00
trace_output.c tracing: Remove precision vsnprintf() check from print event 2024-03-06 13:26:26 -05:00
trace_output.h
trace_preemptirq.c
trace_printk.c
trace_probe_kernel.h
trace_probe_tmpl.h tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_probe.c tracing/probes: fix error check in parse_btf_field() 2024-05-27 21:32:35 +09:00
trace_probe.h tracing/probes: support '%pd' type for print struct dentry's name 2024-05-01 23:18:47 +09:00
trace_recursion_record.c
trace_sched_switch.c tracing: Move saved_cmdline code into trace_sched_switch.c 2024-03-17 07:58:53 -04:00
trace_sched_wakeup.c function_graph: Move set_graph_function tests to shadow stack global var 2024-06-04 10:37:35 -04:00
trace_selftest_dynamic.c
trace_selftest.c fgraph: Use str_plural() in test_graph_storage_single() 2024-07-01 19:57:51 -04:00
trace_seq.c trace_seq: Increase the buffer size to almost two pages 2023-12-18 23:14:16 -05:00
trace_stack.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
trace_stat.c
trace_stat.h
trace_synth.h
trace_syscalls.c bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing 2024-09-11 13:27:27 -07:00
trace_uprobe.c uprobes: prevent mutex_lock() under rcu_read_lock() 2024-05-24 07:44:44 +09:00
trace.c tracing: Return from tracing_buffers_read() if the file has been closed 2024-08-09 12:59:35 -04:00
trace.h tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
tracing_map.c tracing: Fix overflow in get_free_elt() 2024-08-07 20:23:12 -04:00
tracing_map.h