linux/kernel/trace
Linus Torvalds 9f0c253ddd Performance events changes for v6.12:
- Implement per-PMU context rescheduling to significantly improve single-PMU
    performance, and related cleanups/fixes. (by Peter Zijlstra and Namhyung Kim)
 
  - Fix ancient bug resulting in a lot of events being dropped erroneously
    at higher sampling frequencies. (by Luo Gengkun)
 
  - uprobes enhancements:
 
      - Implement RCU-protected hot path optimizations for better performance:
 
          "For baseline vs SRCU, peak througput increased from 3.7 M/s (million uprobe
           triggerings per second) up to about 8 M/s. For uretprobes it's a bit more
           modest with bump from 2.4 M/s to 5 M/s.
 
           For SRCU vs RCU Tasks Trace, peak throughput for uprobes increases further from
           8 M/s to 10.3 M/s (+28%!), and for uretprobes from 5.3 M/s to 5.8 M/s (+11%),
           as we have more work to do on uretprobes side.
 
           Even single-thread (no contention) performance is slightly better: 3.276 M/s to
           3.396 M/s (+3.5%) for uprobes, and 2.055 M/s to 2.174 M/s (+5.8%)
           for uretprobes."
 
           (by Andrii Nakryiko et al)
 
      - Document mmap_lock, don't abuse get_user_pages_remote(). (by Oleg Nesterov)
 
      - Cleanups & fixes to prepare for future work:
 
         - Remove uprobe_register_refctr()
 	- Simplify error handling for alloc_uprobe()
         - Make uprobe_register() return struct uprobe *
         - Fold __uprobe_unregister() into uprobe_unregister()
         - Shift put_uprobe() from delete_uprobe() to uprobe_unregister()
         - BPF: Fix use-after-free in bpf_uprobe_multi_link_attach()
 
           (by Oleg Nesterov)
 
  - New feature & ABI extension: allow events to use PERF_SAMPLE READ with
    inheritance, enabling sample based profiling of a group of counters over
    a hierarchy of processes or threads.  (by Ben Gainey)
 
  - Intel uncore & power events updates:
 
       - Add Arrow Lake and Lunar Lake support
       - Add PERF_EV_CAP_READ_SCOPE
       - Clean up and enhance cpumask and hotplug support
 
         (by Kan Liang)
 
       - Add LNL uncore iMC freerunning support
       - Use D0:F0 as a default device
 
         (by Zhenyu Wang)
 
  - Intel PT: fix AUX snapshot handling race. (by Adrian Hunter)
 
  - Misc fixes and cleanups. (by James Clark, Jiri Olsa, Oleg Nesterov and Peter Zijlstra)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmbqxEwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iusw/43UAcAZVof6Qs+j6bVAxSabF66fFfE9Wh
 jc+F4yZ2MGl9x6a1f392+CPcTdVsYp6G2QtRGMipD+trmi/lhDhmRrhxxD1KWIwP
 zVGSBx9CSFl0UpCXdGiVrGzT5xpIpJ4qqW2XUVr32n8SxTT5X/vM5ySm6KUXsIrD
 2/KXwucT9a7grkl3pvy/A/FUHxaF7oAMJjcIPSvLBveQjQSHUrZoCZdHsRGT9rjS
 HjzxG6gDy97172z5XV1ej3HJOfFlFTQ1RcoxNqdLfiZ6n3hD4hfmtsXWB5zTzRjT
 xHaCOmWLhEp5v+fK2+RCFiWUbDBsmW/mecZdrjGb3C1RIDWQhLCXXc95XtrobTvk
 BkW9QEC/XRB+vU6Ssdv3ugN7yRWxih0BsLU5sy4nlzmwoYt9qOy8fgjRvSBKHr5K
 Mu1RIFu+KXq++sa7+ZJjUMY70PHQCp2m4AHprG/Y98t93CQMhDXzGVpPzWyQuW/V
 lqYFjd/CAoCIVGF4Jxq7sqOdZ1emDN+P0WSnnFWssJ0ZJFvxN9ZDPH2AaMk4lwo7
 NFW6u3+0Vx9P0m/H6xRQj00Iye2JLMqJNCIA8QtjnB7L6upgVvcIPjgcG58fpV1o
 xfJekOR1A7T2aQUDlX5t9Cu36ZUImDRmwHj2m1p84s5AANlbD7/fOmffR1Hn9uFj
 wCTqSpi8Hg==
 =E3s3
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:

 - Implement per-PMU context rescheduling to significantly improve
   single-PMU performance, and related cleanups/fixes (Peter Zijlstra
   and Namhyung Kim)

 - Fix ancient bug resulting in a lot of events being dropped
   erroneously at higher sampling frequencies (Luo Gengkun)

 - uprobes enhancements:

     - Implement RCU-protected hot path optimizations for better
       performance:

         "For baseline vs SRCU, peak througput increased from 3.7 M/s
          (million uprobe triggerings per second) up to about 8 M/s. For
          uretprobes it's a bit more modest with bump from 2.4 M/s to
          5 M/s.

          For SRCU vs RCU Tasks Trace, peak throughput for uprobes
          increases further from 8 M/s to 10.3 M/s (+28%!), and for
          uretprobes from 5.3 M/s to 5.8 M/s (+11%), as we have more
          work to do on uretprobes side.

          Even single-thread (no contention) performance is slightly
          better: 3.276 M/s to 3.396 M/s (+3.5%) for uprobes, and 2.055
          M/s to 2.174 M/s (+5.8%) for uretprobes."

          (Andrii Nakryiko et al)

     - Document mmap_lock, don't abuse get_user_pages_remote() (Oleg
       Nesterov)

     - Cleanups & fixes to prepare for future work:
        - Remove uprobe_register_refctr()
	- Simplify error handling for alloc_uprobe()
        - Make uprobe_register() return struct uprobe *
        - Fold __uprobe_unregister() into uprobe_unregister()
        - Shift put_uprobe() from delete_uprobe() to uprobe_unregister()
        - BPF: Fix use-after-free in bpf_uprobe_multi_link_attach()
          (Oleg Nesterov)

 - New feature & ABI extension: allow events to use PERF_SAMPLE READ
   with inheritance, enabling sample based profiling of a group of
   counters over a hierarchy of processes or threads (Ben Gainey)

 - Intel uncore & power events updates:

      - Add Arrow Lake and Lunar Lake support
      - Add PERF_EV_CAP_READ_SCOPE
      - Clean up and enhance cpumask and hotplug support
        (Kan Liang)

      - Add LNL uncore iMC freerunning support
      - Use D0:F0 as a default device
        (Zhenyu Wang)

 - Intel PT: fix AUX snapshot handling race (Adrian Hunter)

 - Misc fixes and cleanups (James Clark, Jiri Olsa, Oleg Nesterov and
   Peter Zijlstra)

* tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  iommu/vt-d: Clean up cpumask and hotplug for perfmon
  perf/x86/intel/cstate: Clean up cpumask and hotplug
  perf: Add PERF_EV_CAP_READ_SCOPE
  perf: Generic hotplug support for a PMU with a scope
  uprobes: perform lockless SRCU-protected uprobes_tree lookup
  rbtree: provide rb_find_rcu() / rb_find_add_rcu()
  perf/uprobe: split uprobe_unregister()
  uprobes: travers uprobe's consumer list locklessly under SRCU protection
  uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
  uprobes: protected uprobe lifetime with SRCU
  uprobes: revamp uprobe refcounting and lifetime management
  bpf: Fix use-after-free in bpf_uprobe_multi_link_attach()
  perf/core: Fix small negative period being ignored
  perf: Really fix event_function_call() locking
  perf: Optimize __pmu_ctx_sched_out()
  perf: Add context time freeze
  perf: Fix event_function_call() locking
  perf: Extract a few helpers
  perf: Optimize context reschedule for single PMU cases
  ...
2024-09-18 15:03:58 +02:00
..
rv rv: Update rv_en(dis)able_monitor doc to match kernel-doc 2024-05-21 19:03:35 -04:00
blktrace.c blktrace: convert strncpy() to strscpy_pad() 2024-04-25 21:07:08 -07:00
bpf_trace.c perf/uprobe: split uprobe_unregister() 2024-09-05 16:56:14 +02:00
bpf_trace.h tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
error_report-traces.c
fgraph.c tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops() 2024-08-21 13:49:59 -04:00
fprobe.c fprobe: Fix to allocate entry_data_size buffer with rethook instances 2024-03-01 09:18:24 +09:00
ftrace_internal.h function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE 2024-06-10 18:08:23 -04:00
ftrace.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
Kconfig tracing: Build event generation tests only as modules 2024-06-12 08:43:12 +09:00
kprobe_event_gen_test.c
Makefile
pid_list.c trace/pid_list: Change gfp flags in pid_list_fill_irq() 2024-07-15 15:07:14 -04:00
pid_list.h
power-traces.c
preemptirq_delay_test.c minmax: make generic MIN() and MAX() macros available everywhere 2024-07-28 15:49:18 -07:00
rethook.c rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get() 2024-05-01 23:18:48 +09:00
ring_buffer_benchmark.c ring-buffer: Read and write to ring buffers with custom sub buffer size 2023-12-20 07:54:56 -05:00
ring_buffer.c ring-buffer: Remove unused function ring_buffer_nr_pages() 2024-08-07 20:26:44 -04:00
rpm-traces.c
synth_event_gen_test.c tracing / synthetic: Disable events after testing in synth_event_gen_test_init() 2023-12-21 10:04:45 -05:00
trace_benchmark.c tracing: Improve benchmark test performance by using do_div() 2024-05-13 20:00:57 -04:00
trace_benchmark.h
trace_boot.c tracing: Allow creating instances with specified system events 2023-12-18 23:14:16 -05:00
trace_branch.c
trace_btf.c tracing/probes: Fix to search structure fields correctly 2024-02-17 21:25:42 +09:00
trace_btf.h
trace_clock.c
trace_dynevent.c
trace_dynevent.h
trace_entries.h
trace_eprobe.c tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_event_perf.c
trace_events_filter_test.h
trace_events_filter.c tracing: Have trace_event_file have ref counters 2023-11-01 23:44:44 -04:00
trace_events_hist.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_inject.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_synth.c tracing/synthetic: Fix trace_string() return value 2024-02-15 11:40:01 -05:00
trace_events_trigger.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_user.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
trace_events.c tracing: Use refcount for trace_event_file reference counter 2024-08-07 18:12:46 -04:00
trace_export.c
trace_fprobe.c tracing/probes: support '%pd' type for print struct dentry's name 2024-05-01 23:18:47 +09:00
trace_functions_graph.c function_graph: Move graph notrace bit to shadow stack global var 2024-06-04 10:37:44 -04:00
trace_functions.c ftrace: Hide one more entry in stack trace when ftrace_pid is enabled 2024-06-06 15:22:18 -04:00
trace_hwlat.c
trace_irqsoff.c function_graph: Move set_graph_function tests to shadow stack global var 2024-06-04 10:37:35 -04:00
trace_kdb.c
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_kprobe.c tracing/kprobes: Fix build error when find_module() is not available 2024-07-10 09:47:00 +09:00
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c RCU pull request for v6.12 2024-09-18 07:52:24 +02:00
trace_output.c tracing: Remove precision vsnprintf() check from print event 2024-03-06 13:26:26 -05:00
trace_output.h
trace_preemptirq.c
trace_printk.c
trace_probe_kernel.h
trace_probe_tmpl.h tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_probe.c tracing/probes: fix error check in parse_btf_field() 2024-05-27 21:32:35 +09:00
trace_probe.h tracing/probes: support '%pd' type for print struct dentry's name 2024-05-01 23:18:47 +09:00
trace_recursion_record.c
trace_sched_switch.c tracing: Move saved_cmdline code into trace_sched_switch.c 2024-03-17 07:58:53 -04:00
trace_sched_wakeup.c function_graph: Move set_graph_function tests to shadow stack global var 2024-06-04 10:37:35 -04:00
trace_selftest_dynamic.c
trace_selftest.c tracing: Fix memory leak in fgraph storage selftest 2024-08-21 15:10:33 -04:00
trace_seq.c trace_seq: Increase the buffer size to almost two pages 2023-12-18 23:14:16 -05:00
trace_stack.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
trace_stat.c
trace_stat.h
trace_synth.h
trace_syscalls.c bpf: Change syscall_nr type to int in struct syscall_tp_t 2023-10-13 12:39:36 -07:00
trace_uprobe.c perf/uprobe: split uprobe_unregister() 2024-09-05 16:56:14 +02:00
trace.c tracing: Drop unused helper function to fix the build 2024-09-09 16:04:25 -04:00
trace.h tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
tracing_map.c tracing: Fix overflow in get_free_elt() 2024-08-07 20:23:12 -04:00
tracing_map.h