The "char []" string form declares a single variable. It is better
than "char *" which creates two variables in the final assembly.
Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com
Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Cc: stable@vger.kernel.org
Fixes: 4909010788 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The tracing_set_trace_write() function just removes the trailing whitespace
from the user supplied tracer name, but the leading whitespace should also
be removed.
In addition, if the user supplied tracer name contains only a few
whitespace characters, the first one will not be removed using the current
method, which results it a single whitespace character left in the buf.
To fix all of these issues, we use strim() to correctly remove both the
leading and trailing whitespace.
Link: https://lkml.kernel.org/r/20220121095623.1826679-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
A fast/NMI safe accessor for CLOCK_TAI has been introduced.
Use it for adding the additional trace clock "tai".
Link: https://lkml.kernel.org/r/20220414091805.89667-3-kurt@linutronix.de
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Move trace_eval_init() to subsys_initcall to make it start
earlier.
And to avoid tracer_init_tracefs being blocked by
trace_event_sem which trace_eval_init() hold [1],
queue tracer_init_tracefs() to eval_map_wq to let
the two works being executed sequentially.
It can speed up the initialization of kernel as result
of making tracer_init_tracefs asynchronous.
On my arm64 platform, it reduce ~20ms of 125ms which total
time do_initcalls spend.
Link: https://lkml.kernel.org/r/20220426122407.17042-3-mark-pk.tsai@mediatek.com
[1]: https://lore.kernel.org/r/68d7b3327052757d0cd6359a6c9015a85b437232.camel@pengutronix.de
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
To prepare for support asynchronous tracer_init_tracefs initcall,
avoid calling create_trace_option_files before __update_tracer_options.
Otherwise, create_trace_option_files will show warning because
some tracers in trace_types list are already in tr->topts.
For example, hwlat_tracer call register_tracer in late_initcall,
and global_trace.dir is already created in tracing_init_dentry,
hwlat_tracer will be put into tr->topts.
Then if the __update_tracer_options is executed after hwlat_tracer
registered, create_trace_option_files find that hwlat_tracer is
already in tr->topts.
Link: https://lkml.kernel.org/r/20220426122407.17042-2-mark-pk.tsai@mediatek.com
Link: https://lore.kernel.org/lkml/20220322133339.GA32582@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When setting bootparams="trace_event=initcall:initcall_start tp_printk=1" in the
cmdline, the output_printk() was called, and the spin_lock_irqsave() was called in the
atomic and irq disable interrupt context suitation. On the PREEMPT_RT kernel,
these locks are replaced with sleepable rt-spinlock, so the stack calltrace will
be triggered.
Fix it by raw_spin_lock_irqsave when PREEMPT_RT and "trace_event=initcall:initcall_start
tp_printk=1" enabled.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
[<ffffffff8992303e>] try_to_wake_up+0x7e/0xba0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt17+ #19 34c5812404187a875f32bee7977f7367f9679ea7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
__might_resched.cold+0x11d/0x155
rt_spin_lock+0x40/0x70
trace_event_buffer_commit+0x2fa/0x4c0
? map_vsyscall+0x93/0x93
trace_event_raw_event_initcall_start+0xbe/0x110
? perf_trace_initcall_finish+0x210/0x210
? probe_sched_wakeup+0x34/0x40
? ttwu_do_wakeup+0xda/0x310
? trace_hardirqs_on+0x35/0x170
? map_vsyscall+0x93/0x93
do_one_initcall+0x217/0x3c0
? trace_event_raw_event_initcall_level+0x170/0x170
? push_cpu_stop+0x400/0x400
? cblist_init_generic+0x241/0x290
kernel_init_freeable+0x1ac/0x347
? _raw_spin_unlock_irq+0x65/0x80
? rest_init+0xf0/0xf0
kernel_init+0x1e/0x150
ret_from_fork+0x22/0x30
</TASK>
Link: https://lkml.kernel.org/r/20220419013910.894370-1-jun.miao@intel.com
Signed-off-by: Jun Miao <jun.miao@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- The run time string verifier that checks all trace event formats
as they are read from the tracing file to make sure that the %s
pointers are not reading something that no longer exists, failed
to account for %*.s where the length given is zero, and the string
is NULL. It incorrectly flagged it as a null pointer dereference and
gave a WARN_ON().
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjzyxBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qk+WAP4+ICutAiIGyPmHEqtjLGyDPC25nbB3
vg+qWWkWEOIi5gD+PpGsGSE7HYFdWJi1BCfshNOm8I92TyoE2nJkXh3LeA8=
=mZr5
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull trace event string verifier fix from Steven Rostedt:
"The run-time string verifier checks all trace event formats as
they are read from the tracing file to make sure that the %s pointers
are not reading something that no longer exists.
However, it failed to account for the valid case of '%*.s' where the
length given is zero, and the string is NULL. It incorrectly flagged
it as a null pointer dereference and gave a WARN_ON()"
* tag 'trace-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have trace event string test handle zero length strings
- New user_events interface. User space can register an event with the kernel
describing the format of the event. Then it will receive a byte in a page
mapping that it can check against. A privileged task can then enable that
event like any other event, which will change the mapped byte to true,
telling the user space application to start writing the event to the
tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When set,
the tracing buffer will be saved in the snapshot buffer at boot up when
the kernel hands things over to user space. This will keep the traces that
happened at boot up available even if user space boot up has tracing as
well.
- Have TRACE_EVENT_ENUM() also update trace event field type descriptions.
Thus if a static array defines its size with an enum, the user space trace
event parsers can still know how to parse that array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This will
make one tracepoint be able to trace different content and not be stuck at
only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events for
mapping).
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjiO3RQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhQzAQDtek5p80p/zkMGymm14wSH6qq0NdgN
Kv7fTBwEewUa0gD/UCOVLw4Oj+JtHQhCa3sCGZopmRv0BT1+4UQANqosKQY=
=Au08
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- New user_events interface. User space can register an event with the
kernel describing the format of the event. Then it will receive a
byte in a page mapping that it can check against. A privileged task
can then enable that event like any other event, which will change
the mapped byte to true, telling the user space application to start
writing the event to the tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When
set, the tracing buffer will be saved in the snapshot buffer at boot
up when the kernel hands things over to user space. This will keep
the traces that happened at boot up available even if user space boot
up has tracing as well.
- Have TRACE_EVENT_ENUM() also update trace event field type
descriptions. Thus if a static array defines its size with an enum,
the user space trace event parsers can still know how to parse that
array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This
will make one tracepoint be able to trace different content and not
be stuck at only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events
for mapping).
* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
tracing: Have type enum modifications copy the strings
user_events: Add trace event call as root for low permission cases
tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
tracing: Add snapshot at end of kernel boot up
tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
tracing: Fix strncpy warning in trace_events_synth.c
user_events: Prevent dyn_event delete racing with ioctl add/delete
tracing: Add TRACE_CUSTOM_EVENT() macro
tracing: Move the defines to create TRACE_EVENTS into their own files
tracing: Add sample code for custom trace events
tracing: Allow custom events to be added to the tracefs directory
tracing: Fix last_cmd_set() string management in histogram code
user_events: Fix potential uninitialized pointer while parsing field
tracing: Fix allocation of last_cmd in last_cmd_set()
user_events: Add documentation file
user_events: Add sample code for typical usage
user_events: Add self-test for validator boundaries
user_events: Add self-test for perf_event integration
user_events: Add self-test for dynamic_events integration
user_events: Add self-test for ftrace integration
...
If a trace event has in its TP_printk():
"%*.s", len, len ? __get_str(string) : NULL
It is perfectly valid if len is zero and passing in the NULL.
Unfortunately, the runtime string check at time of reading the trace sees
the NULL and flags it as a bad string and produces a WARN_ON().
Handle this case by passing into the test function if the format has an
asterisk (star) and if so, if the length is zero, then mark it as safe.
Link: https://lore.kernel.org/all/YjsWzuw5FbWPrdqq@bfoster/
Cc: stable@vger.kernel.org
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Fixes: 9a6944fee6 ("tracing: Add a verifier to check string pointers for trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add ftrace_boot_snapshot kernel parameter that will take a snapshot at the
end of boot up just before switching over to user space (it happens during
the kernel freeing of init memory).
This is useful when there's interesting data that can be collected from
kernel start up, but gets overridden by user space start up code. With
this option, the ring buffer content from the boot up traces gets saved in
the snapshot at the end of boot up. This trace can be read from:
/sys/kernel/tracing/snapshot
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
__setup() handlers should generally return 1 to indicate that the
boot options have been handled.
Using invalid option values causes the entire kernel boot option
string to be reported as Unknown and added to init's environment
strings, polluting it.
Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
kprobe_event=p,syscall_any,$arg1 trace_options=quiet
trace_clock=jiffies", will be passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6
kprobe_event=p,syscall_any,$arg1
trace_options=quiet
trace_clock=jiffies
Return 1 from the __setup() handlers so that init's environment is not
polluted with kernel boot options.
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org
Cc: stable@vger.kernel.org
Fixes: 7bcfaf54f5 ("tracing: Add trace_options kernel command line parameter")
Fixes: e1e232ca6b ("tracing: Add trace_clock=<clock> kernel parameter")
Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently, tracing_log_err.cmd strings are restricted to a length of
MAX_FILTER_STR_VAL (256), which is too short for some commands already
seen in the wild (with cmd strings longer than that showing up
truncated).
Remove the restriction so that no command string is ever truncated.
Link: https://lkml.kernel.org/r/ca965f23256b350ebd94b3dc1a319f28e8267f5f.1643319703.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is
the same as another kernel parameter "tp_printk". If "tp_printk" setup is
called before the "tp_printk_stop_on_boot", it will override the latter
and keep it from being set.
This is similar to other kernel parameter issues, such as:
Commit 745a600cf1 ("um: console: Ignore console= option")
or init/do_mounts.c:45 (setup function of "ro" kernel param)
Fix it by checking for a "_" right after the "tp_printk" and if that
exists do not process the parameter.
Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com
Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com>
[ Fixed up change log and added space after if condition ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
New:
- The Real Time Linux Analysis (RTLA) tool is added to the tools directory.
- Can safely filter on user space pointers with: field.ustring ~ "match-string"
- eprobes can now be filtered like any other event.
- trace_marker(_raw) now uses stream_open() to allow multiple threads to safely
write to it. Note, this could possibly break existing user space, but we will
not know until we hear about it, and then can revert the change if need be.
- New field in events to display when bottom halfs are disabled.
- Sorting of the ftrace functions are now done at compile time instead of
at bootup.
Infrastructure changes to support future efforts:
- Added __rel_loc type for trace events. Similar to __data_loc but the offset
to the dynamic data is based off of the location of the descriptor and not
the beginning of the event. Needed for user defined events.
- Some simplification of event trigger code.
- Make synthetic events process its callback better to not hinder other
event callbacks that are registered. Needed for user defined events.
And other small fixes and clean ups.
-
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYeGvcxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrZtAP9ICjJxX54MTErElhhUL/NFLV7wqhJi
OIAgmp6jGVRqPAD+JxQtBnGH+3XMd71ioQkTfQ1rp+jBz2ERBj2DmELUAg0=
=zmda
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New:
- The Real Time Linux Analysis (RTLA) tool is added to the tools
directory.
- Can safely filter on user space pointers with: field.ustring ~
"match-string"
- eprobes can now be filtered like any other event.
- trace_marker(_raw) now uses stream_open() to allow multiple threads
to safely write to it. Note, this could possibly break existing
user space, but we will not know until we hear about it, and then
can revert the change if need be.
- New field in events to display when bottom halfs are disabled.
- Sorting of the ftrace functions are now done at compile time
instead of at bootup.
Infrastructure changes to support future efforts:
- Added __rel_loc type for trace events. Similar to __data_loc but
the offset to the dynamic data is based off of the location of the
descriptor and not the beginning of the event. Needed for user
defined events.
- Some simplification of event trigger code.
- Make synthetic events process its callback better to not hinder
other event callbacks that are registered. Needed for user defined
events.
And other small fixes and cleanups"
* tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
tracing: Add ustring operation to filtering string pointers
rtla: Add rtla timerlat hist documentation
rtla: Add rtla timerlat top documentation
rtla: Add rtla timerlat documentation
rtla: Add rtla osnoise hist documentation
rtla: Add rtla osnoise top documentation
rtla: Add rtla osnoise man page
rtla: Add Documentation
rtla/timerlat: Add timerlat hist mode
rtla: Add timerlat tool and timelart top mode
rtla/osnoise: Add the hist mode
rtla/osnoise: Add osnoise top mode
rtla: Add osnoise tool
rtla: Helper functions for rtla
rtla: Real-Time Linux Analysis tool
tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
tracing: Remove duplicate warnings when calling trace_create_file()
tracing/kprobes: 'nmissed' not showed correctly for kretprobe
tracing: Add test for user space strings when filtering on string pointers
tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
...
Disabling only bottom halves via local_bh_disable() disables also
preemption but this remains invisible to tracing. On a CONFIG_PREEMPT
kernel one might wonder why there is no scheduling happening despite the
N flag in the trace. The reason might be the a rcu_read_lock_bh()
section.
Add a 'b' to the tracing output if in task context with disabled bottom
halves.
Link: https://lkml.kernel.org/r/YbcbtdtC/bjCKo57@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With the new osnoise tracer, we are seeing the below splat:
Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel data access on read at 0xc7d880000
Faulting instruction address: 0xc0000000002ffa10
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
...
NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0
LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0
Call Trace:
[c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable)
[c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90
[c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290
[c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710
[c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130
[c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270
[c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180
[c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278
osnoise tracer on ppc64le is triggering osnoise_taint() for negative
duration in get_int_safe_duration() called from
trace_sched_switch_callback()->thread_exit().
The problem though is that the check for a valid trace_percpu_buffer is
incorrect in get_trace_buf(). The check is being done after calculating
the pointer for the current cpu, rather than on the main percpu pointer.
Fix the check to be against trace_percpu_buffer.
Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The tracing marker files are write-only streams with no meaningful
concept of file position. Using stream_open() to mark them as
stream-link indicates this and has the added advantage that a single
file descriptor can now be used from multiple threads without contention
thanks to clearing FMODE_ATOMIC_POS.
Note that this has the potential to break existing userspace by since
both lseek(2) and pwrite(2) will now return ESPIPE when previously lseek
would have updated the stored offset and pwrite would have appended to
the trace. A survey of libtracefs and several other projects found to
use trace_marker(_raw) [1][2][3] suggests that everyone limits
themselves to calling write(2) and close(2) on these file descriptors so
there is a good chance this will go unnoticed and the benefits of
reduced overhead and lock contention seem worth the risk.
[1] https://github.com/google/perfetto
[2] https://github.com/intel/media-driver/
[3] https://w1.fi/cgit/hostap/
Link: https://lkml.kernel.org/r/20211207142558.347029-1-john@metanate.com
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In case trace_event_buffer_lock_reserve() is called with preemption
enabled, the algorithm that defines the usage of the per cpu filter buffer
may fail if the task schedules to another CPU after determining which
buffer it will use.
Disable preemption when using the filter buffer. And because that same
buffer must be used throughout the call, keep preemption disabled until
the filter buffer is released.
This will also keep the semantics between the use case of when the filter
buffer is used, and when the ring buffer itself is used, as that case also
disables preemption until the ring buffer is released.
Link: https://lkml.kernel.org/r/20211130024318.880190623@goodmis.org
[ Fixed warning of assignment in if statement
Reported-by: kernel test robot <lkp@intel.com> ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The value read by this_cpu_read() is used later and its use is expected to
stay on the same CPU as being read. But this_cpu_read() does not warn if
it is called without preemption disabled, where as __this_cpu_read() will
check if preemption is disabled on CONFIG_DEBUG_PREEMPT
Currently all callers have preemption disabled, but there may be new
callers in the future that may not.
Link: https://lkml.kernel.org/r/20211130024318.698165354@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If trace_seq becomes full, trace_seq_vprintf() no longer consumes
arguments from va_list, making va_list out of sync with format
processing by trace_check_vprintf().
This causes va_arg() in trace_check_vprintf() to return wrong
positional argument, which results into a WARN_ON_ONCE() hit.
ftrace_stress_test from LTP triggers this situation.
Fix it by explicitly avoiding further use if va_list at the point
when it's consistency can no longer be guaranteed.
Link: https://lkml.kernel.org/r/20211118145516.13219-1-nikita.yushchenko@virtuozzo.com
Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.
Use memset_startat() to avoid confusing memset() about writing beyond
the target struct member.
Link: https://lkml.kernel.org/r/20211118202217.1285588-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- osnoise and timerlat updates that will work with the RTLA tool (Real-Time
Linux Analysis). Specifically it disconnects the work load (threads
that look for latency) from the tracing instances attached to them,
allowing for more than one instance to retrieve data from the work load.
- Optimization on division in the trace histogram trigger code to use shift
and multiply when possible. Also added documentation.
- Fix prototype to my_direct_func in direct ftrace trampoline sample code.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYKWXxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqJEAP9czpSZ/nFvDjxdGHZAcKKXCFWbGcK5
IF2cHDDwxXjZ/gD+NnpRhR1JPfA55fO52DUJPn2cOU5xOsP6DmJxu6mwDg0=
=AKVv
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
- osnoise and timerlat updates that will work with the RTLA tool
(Real-Time Linux Analysis).
Specifically it disconnects the work load (threads that look for
latency) from the tracing instances attached to them, allowing for
more than one instance to retrieve data from the work load.
- Optimization on division in the trace histogram trigger code to use
shift and multiply when possible. Also added documentation.
- Fix prototype to my_direct_func in direct ftrace trampoline sample
code.
* tag 'trace-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/samples: Add missing prototype for my_direct_func
tracing/selftests: Add tests for hist trigger expression parsing
tracing/histogram: Document hist trigger variables
tracing/histogram: Update division by 0 documentation
tracing/histogram: Optimize division by constants
tracing/osnoise: Remove PREEMPT_RT ifdefs from inside functions
tracing/osnoise: Remove STACKTRACE ifdefs from inside functions
tracing/osnoise: Allow multiple instances of the same tracer
tracing/osnoise: Remove TIMERLAT ifdefs from inside functions
tracing/osnoise: Support a list of trace_array *tr
tracing/osnoise: Use start/stop_per_cpu_kthreads() on osnoise_cpus_write()
tracing/osnoise: Split workload start from the tracer start
tracing/osnoise: Improve comments about barrier need for NMI callbacks
tracing/osnoise: Do not follow tracing_cpumask
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only denying
others). There's been pressure to allow non root to tracefs in a
controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function tracer
instead of having its own trampoline (this change will happen on an arch
by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform calculations
against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent warnings
from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over if
branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
=Loz8
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- kprobes: Restructured stack unwinder to show properly on x86 when a
stack dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only
denying others). There's been pressure to allow non root to tracefs
in a controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function
tracer instead of having its own trampoline (this change will happen
on an arch by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform
calculations against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent
warnings from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over
if branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
tracing/histogram: Fix semicolon.cocci warnings
tracing/histogram: Fix documentation inline emphasis warning
tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
tracing: Show size of requested perf buffer
bootconfig: Initialize ret in xbc_parse_tree()
ftrace: do CPU checking after preemption disabled
ftrace: disable preemption when recursion locked
tracing/histogram: Document expression arithmetic and constants
tracing/histogram: Optimize division by a power of 2
tracing/histogram: Covert expr to const if both operands are constants
tracing/histogram: Simplify handling of .sym-offset in expressions
tracing: Fix operator precedence for hist triggers expression
tracing: Add division and multiplication support for hist triggers
tracing: Add support for creating hist trigger variables from literal
selftests/ftrace: Stop tracing while reading the trace file by default
MAINTAINERS: Update KPROBES and TRACING entries
test_kprobes: Move it from kernel/ to lib/
docs, kprobes: Remove invalid URL and add new reference
samples/kretprobes: Fix return value if register_kretprobe() failed
lib/bootconfig: Fix the xbc_get_info kerneldoc
...
When building the files in the tracefs file system, do not by default set
any permissions for OTH (other). This will make it easier for admins who
want to define a group for accessing tracefs and not having to first
disable all the permission bits for "other" in the file system.
As tracing can leak sensitive information, it should never by default
allowing all users access. An admin can still set the permission bits for
others to have access, which may be useful for creating a honeypot and
seeing who takes advantage of it and roots the machine.
Link: https://lkml.kernel.org/r/20210818153038.864149276@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The compiler warns when the data are actually unused:
kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function]
1712 | static void trace_create_maxlat_file(struct trace_array *tr,
| ^~~~~~~~~~~~~~~~~~~~~~~~
[Why]
CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n, CONFIG_OSNOISE_TRACER=y
gcc report warns.
[How]
Now trace_create_maxlat_file will only take effect when
CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. In fact, after
adding osnoise trace, it also needs to take effect.
Link: https://lore.kernel.org/all/c1d9e328-ad7c-920b-6c24-9e1598a6421c@infradead.org/
Link: https://lkml.kernel.org/r/20210922025122.3268022-1-liu.yun@linux.dev
Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of having the logic that does trace_pid_list open coded, wrap it in
abstract functions. This will allow a rewrite of the logic that implements
the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into
the set_*_pid file, it creates a new list and uses RCU to update it. If
pid_max is lowered, but there was a pid currently in the list that was
higher than pid_max, those pids will now be removed on updating the list.
The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max,
and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Add migrate-disable counter to tracing header
- Fix error handling in event probes
- Fix missed unlock in osnoise in error path
- Fix merge issue with tools/bootconfig
- Clean up bootconfig data when init memory is removed
- Fix bootconfig to loop only on subkeys
- Have kernel command lines override bootconfig options
- Increase field counts for synthetic events
- Have histograms dynamic allocate event elements to save space
- Fixes in testing and documentation
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYToFZBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtg5AP44U3Dn1m1lQo3y1DJ9kUP3HsAsDofS
Cv7ZM9tLV2p4MQEA9KJc3/B/5BZEK1kso3uLeLT+WxJOC4YStXY19WwmjAI=
=Wuo+
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
- Add migrate-disable counter to tracing header
- Fix error handling in event probes
- Fix missed unlock in osnoise in error path
- Fix merge issue with tools/bootconfig
- Clean up bootconfig data when init memory is removed
- Fix bootconfig to loop only on subkeys
- Have kernel command lines override bootconfig options
- Increase field counts for synthetic events
- Have histograms dynamic allocate event elements to save space
- Fixes in testing and documentation
* tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/boot: Fix to loop on only subkeys
selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events
tracing: Dynamically allocate the per-elt hist_elt_data array
tracing: synth events: increase max fields count
tools/bootconfig: Show whole test command for each test case
bootconfig: Fix missing return check of xbc_node_compose_key function
tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh
docs: bootconfig: Add how to use bootconfig for kernel parameters
init/bootconfig: Reorder init parameter from bootconfig and cmdline
init: bootconfig: Remove all bootconfig data when the init memory is removed
tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()
tracing: Fix some alloc_event_probe() error handling bugs
tracing: Add migrate-disabled counter to tracing output.
- Simplifying the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
- bootconfig now can start histograms
- bootconfig supports group/all enabling
- histograms now can put values in linear size buckets
- execnames can be passed to synthetic events
- Introduction of "event probes" that attach to other events and
can retrieve data from pointers of fields, or record fields
as different types (a pointer to a string as a string instead
of just a hex number)
- Various fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYTJDixQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnPLAP9XviWrZD27uFj6LU/Vp2umbq8la1aC
oW8o9itUGpLoHQD+OtsMpQXsWrxoNw/JD1OWCH4J0YN+TnZAUUG2E9e0twA=
=OZXG
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
- bootconfig can now start histograms
- bootconfig supports group/all enabling
- histograms now can put values in linear size buckets
- execnames can be passed to synthetic events
- introduce "event probes" that attach to other events and can retrieve
data from pointers of fields, or record fields as different types (a
pointer to a string as a string instead of just a hex number)
- various fixes and clean ups
* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
tracing/doc: Fix table format in histogram code
selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
selftests/ftrace: Add selftest for testing eprobe events on synthetic events
selftests/ftrace: Add test case to test adding and removing of event probe
selftests/ftrace: Fix requirement check of README file
selftests/ftrace: Add clear_dynamic_events() to test cases
tracing: Add a probe that attaches to trace events
tracing/probes: Reject events which have the same name of existing one
tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
tracing/probe: Change traceprobe_set_print_fmt() to take a type
tracing/probes: Use struct_size() instead of defining custom macros
tracing/probes: Allow for dot delimiter as well as slash for system names
tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
tracing: Have dynamic events have a ref counter
tracing: Add DYNAMIC flag for dynamic events
tracing: Replace deprecated CPU-hotplug functions.
MAINTAINERS: Add an entry for os noise/latency
tracepoint: Fix kerneldoc comments
bootconfig/tracing/ktest: Update ktest example for boot-time tracing
tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
...
migrate_disable() forbids task migration to another CPU. It is available
since v5.11 and has already users such as highmem or BPF. It is useful
to observe this task state in tracing which already has other states
like the preemption counter.
Instead of adding the migrate disable counter as a new entry to struct
trace_entry, which would extend the whole struct by four bytes, it is
squashed into the preempt-disable counter. The lower four bits represent
the preemption counter, the upper four bits represent the migrate
disable counter. Both counter shouldn't exceed 15 but if they do, there
is a safety net which caps the value at 15.
Add the migrate-disable counter to the trace entry so it shows up in the
trace. Due to the users mentioned above, it is already possible to
observe it:
| bash-1108 [000] ...21 73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B
| bash-1108 [000] d..31 73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50
| bash-1108 [000] d..31 73.951222: tlb_flush: pages:1 reason:local mm shootdown (3)
The last value is the migrate-disable counter.
Things that popped up:
- trace_print_lat_context() does not print the migrate counter. Not sure
if it should. It is used in "verbose" mode and uses 8 digits and I'm
not sure ther is something processing the value.
- trace_define_common_fields() now defines a different variable. This
probably breaks things. No ide what to do in order to preserve the old
behaviour. Since this is used as a filter it should be split somehow
to be able to match both nibbles here.
Link: https://lkml.kernel.org/r/20210810132625.ylssabmsrkygokuv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: patch description.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ SDR: Removed change to common_preempt_count field name ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmEt+hwACgkQUqAMR0iA
lPLppBAAiyrUNVmqqtdww+IJajEs1uD/4FqPsysHRwroHBFymJeQG1XCwUpDZ7jj
6gXT0chxyjQE18gT/W9nf+PSmA9XvIVA1WSR+WCECTNW3YoZXqtgwiHfgnitXYku
HlmoZLthYeuoXWw2wn+hVLfTRh6VcPHYEaC21jXrs6B1pOXHbvjJ5eTLHlX9oCfL
UKSK+jFTHAJcn/GskRzviBe0Hpe8fqnkRol2XX13ltxqtQ73MjaGNu7imEH6/Pa7
/MHXWtuWJtOvuYz17aztQP4Qwh1xy+kakMy3aHucdlxRBTP4PTzzTuQI3L/RYi6l
+ttD7OHdRwqFAauBLY3bq3uJjYb5v/64ofd8DNnT2CJvtznY8wrPbTdFoSdPcL2Q
69/opRWHcUwbU/Gt4WLtyQf3Mk0vepgMbbVg1B5SSy55atRZaXMrA2QJ/JeawZTB
KK6D/mE7ccze/YFzsySunCUVKCm0veoNxEAcakCCZKXSbsvd1MYcIRC0e+2cv6e5
2NEH7gL4dD+5tqu5nzvIuKDn3NrDQpbi28iUBoFbkxRgcVyvHJ9AGSa62wtb5h3D
OgkqQMdVKBbjYNeUodPlQPzmXZDasytavyd0/BC/KENOcBvU/8gW++2UZTfsh/1A
dLjgwFBdyJncQcCS9Abn20/EKntbIMEX8NLa97XWkA3fuzMKtak=
=yEVq
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Optionally, provide an index of possible printk messages via
<debugfs>/printk/index/. It can be used when monitoring important
kernel messages on a farm of various hosts. The monitor has to be
updated when some messages has changed or are not longer available by
a newly deployed kernel.
- Add printk.console_no_auto_verbose boot parameter. It allows to
generate crash dump even with slow consoles in a reasonable time
frame.
- Remove printk_safe buffers. The messages are always stored directly
to the main logbuffer, even in NMI or recursive context. Also it
allows to serialize syslog operations by a mutex instead of a spin
lock.
- Misc clean up and build fixes.
* tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk/index: Fix -Wunused-function warning
lib/nmi_backtrace: Serialize even messages about idle CPUs
printk: Add printk.console_no_auto_verbose boot parameter
printk: Remove console_silent()
lib/test_scanf: Handle n_bits == 0 in random tests
printk: syslog: close window between wait and read
printk: convert @syslog_lock to mutex
printk: remove NMI tracking
printk: remove safe buffers
printk: track/limit recursion
lib/nmi_backtrace: explicitly serialize banner and regs
printk: Move the printk() kerneldoc comment to its new home
printk/index: Fix warning about missing prototypes
MIPS/asm/printk: Fix build failure caused by printk
printk: index: Add indexing support to dev_printk
printk: Userspace format indexing support
printk: Rework parse_prefix into printk_parse_prefix
printk: Straighten out log_flags into printk_info_flags
string_helpers: Escape double quotes in escape_special
printk/console: Check consistent sequence number when handling race in console_unlock()
A new dynamic event is introduced: event probe. The event is attached
to an existing tracepoint and uses its fields as arguments. The user
can specify custom format string of the new event, select what tracepoint
arguments will be printed and how to print them.
An event probe is created by writing configuration string in
'dynamic_events' ftrace file:
e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe
-:SNAME/ENAME - Delete an event probe
Where:
SNAME - System name, if omitted 'eprobes' is used.
ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used.
SYSTEM - Name of the system, where the tracepoint is defined, mandatory.
EVENT - Name of the tracepoint event in SYSTEM, mandatory.
FETCHARGS - Arguments:
<name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print
it as given TYPE with given name. Supported
types are:
(u8/u16/u32/u64/s8/s16/s32/s64), basic type
(x8/x16/x32/x64), hexadecimal types
"string", "ustring" and bitfield.
Example, attach an event probe on openat system call and print name of the
file that will be opened:
echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events
A new dynamic event is created in events/esys/eopen/ directory. It
can be deleted with:
echo "-:esys/eopen" >> dynamic_events
Filters, triggers and histograms can be attached to the new event, it can
be matched in synthetic events. There is one limitation - an event probe
can not be attached to kprobe, uprobe or another event probe.
Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com
Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As dynamic events are not created by modules, if something is attached to
one, calling "try_module_get()" on its "mod" field, is not going to keep
the dynamic event from going away.
Since dynamic events do not need the "mod" pointer of the event structure,
make a union out of it in order to save memory (there's one structure for
each of the thousand+ events in the kernel), and have any event with the
DYNAMIC flag set to use a ref counter instead.
Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Update both the tracefs README file as well as the histogram.rst to
include an explanation of what the buckets modifier is and how to use it.
Include an example with the wakeup_latency example for both log2 and the
buckets modifiers as there was no existing log2 example.
Link: https://lkml.kernel.org/r/20210707213922.167218794@goodmis.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The event filters are not applied on all of the output, which results in
the flood of printk when using tp_printk. Unfolding
event_trigger_unlock_commit_regs() into trace_event_buffer_commit(), so
the filters can be applied on every output.
Link: https://lkml.kernel.org/r/20210814034538.8428-1-kernelfans@gmail.com
Cc: stable@vger.kernel.org
Fixes: 0daa230296 ("tracing: Add tp_printk cmdline to have tracepoints go to printk()")
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The event_trace_add_tracer() can fail. In this case, it leads to a crash
in start_creating with below call stack. Handle the error scenario
properly in trace_array_create_dir.
Call trace:
down_write+0x7c/0x204
start_creating.25017+0x6c/0x194
tracefs_create_file+0xc4/0x2b4
init_tracer_tracefs+0x5c/0x940
trace_array_create_dir+0x58/0xb4
trace_array_create+0x1bc/0x2b8
trace_array_get_by_name+0xdc/0x18c
Link: https://lkml.kernel.org/r/1627651386-21315-1-git-send-email-kamaagra@codeaurora.org
Cc: stable@vger.kernel.org
Fixes: 4114fbfd02 ("tracing: Enable creating new instance early boot")
Signed-off-by: Kamal Agrawal <kamaagra@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
All NMI contexts are handled the same as the safe context: store the
message and defer printing. There is no need to have special NMI
context tracking for this. Using in_nmi() is enough.
There are several parts of the kernel that are manually calling into
the printk NMI context tracking in order to cause general printk
deferred printing:
arch/arm/kernel/smp.c
arch/powerpc/kexec/crash.c
kernel/trace/trace.c
For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new
function pair printk_deferred_enter/exit that explicitly achieves the
same objective.
For ftrace, remove the printk context manipulation completely. It was
added in commit 03fc7f9c99 ("printk/nmi: Prevent deadlock when
accessing the main log buffer in NMI"). The purpose was to enforce
storing messages directly into the ring buffer even in NMI context.
It really should have only modified the behavior in NMI context.
There is no need for a special behavior any longer. All messages are
always stored directly now. The console deferring is handled
transparently in vprintk().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
[pmladek@suse.com: Remove special handling in ftrace.c completely.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
Currently the histogram logic allows the user to write "cpu" in as an
event field, and it will record the CPU that the event happened on.
The problem with this is that there's a lot of events that have "cpu"
as a real field, and using "cpu" as the CPU it ran on, makes it
impossible to run histograms on the "cpu" field of events.
For example, if I want to have a histogram on the count of the
workqueue_queue_work event on its cpu field, running:
># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger
Gives a misleading and wrong result.
Change the command to "common_cpu" as no event should have "common_*"
fields as that's a reserved name for fields used by all events. And
this makes sense here as common_cpu would be a field used by all events.
Now we can even do:
># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
># cat events/workqueue/workqueue_queue_work/hist
# event histogram
#
# trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
#
{ common_cpu: 0, cpu: 2 } hitcount: 1
{ common_cpu: 0, cpu: 4 } hitcount: 1
{ common_cpu: 7, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 7 } hitcount: 1
{ common_cpu: 0, cpu: 1 } hitcount: 1
{ common_cpu: 0, cpu: 6 } hitcount: 2
{ common_cpu: 0, cpu: 5 } hitcount: 2
{ common_cpu: 1, cpu: 1 } hitcount: 4
{ common_cpu: 6, cpu: 6 } hitcount: 4
{ common_cpu: 5, cpu: 5 } hitcount: 14
{ common_cpu: 4, cpu: 4 } hitcount: 26
{ common_cpu: 0, cpu: 0 } hitcount: 39
{ common_cpu: 2, cpu: 2 } hitcount: 184
Now for backward compatibility, I added a trick. If "cpu" is used, and
the field is not found, it will fall back to "common_cpu" and work as
it did before. This way, it will still work for old programs that use
"cpu" to get the actual CPU, but if the event has a "cpu" as a field, it
will get that event's "cpu" field, which is probably what it wants
anyway.
I updated the tracefs/README to include documentation about both the
common_timestamp and the common_cpu. This way, if that text is present in
the README, then an application can know that common_cpu is supported over
just plain "cpu".
Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: 8b7622bf94 ("tracing: Add cpu field for hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Added option for per CPU threads to the hwlat tracer
- Have hwlat tracer handle hotplug CPUs
- New tracer: osnoise, that detects latency caused by interrupts, softirqs
and scheduling of other tasks.
- Added timerlat tracer that creates a thread and measures in detail what
sources of latency it has for wake ups.
- Removed the "success" field of the sched_wakeup trace event.
This has been hardcoded as "1" since 2015, no tooling should be looking
at it now. If one exists, we can revert this commit, fix that tool and
try to remove it again in the future.
- tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
- New boot command line option "tp_printk_stop", as tp_printk causes trace
events to write to console. When user space starts, this can easily live
lock the system. Having a boot option to stop just after boot up is
useful to prevent that from happening.
- Have ftrace_dump_on_oops boot command line option take numbers that match
the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
- Bootconfig clean ups, fixes and enhancements.
- New ktest script that tests bootconfig options.
- Add tracepoint_probe_register_may_exist() to register a tracepoint
without triggering a WARN*() if it already exists. BPF has a path from
user space that can do this. All other paths are considered a bug.
- Small clean ups and fixes
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYN8YPhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhxLAP9Mo5hHv7Hg6W7Ddv77rThm+qclsMR/
yW0P+eJpMm4+xAD8Cq03oE1DimPK+9WZBKU5rSqAkqG6CjgDRw6NlIszzQQ=
=WEPR
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Added option for per CPU threads to the hwlat tracer
- Have hwlat tracer handle hotplug CPUs
- New tracer: osnoise, that detects latency caused by interrupts,
softirqs and scheduling of other tasks.
- Added timerlat tracer that creates a thread and measures in detail
what sources of latency it has for wake ups.
- Removed the "success" field of the sched_wakeup trace event. This has
been hardcoded as "1" since 2015, no tooling should be looking at it
now. If one exists, we can revert this commit, fix that tool and try
to remove it again in the future.
- tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
- New boot command line option "tp_printk_stop", as tp_printk causes
trace events to write to console. When user space starts, this can
easily live lock the system. Having a boot option to stop just after
boot up is useful to prevent that from happening.
- Have ftrace_dump_on_oops boot command line option take numbers that
match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
- Bootconfig clean ups, fixes and enhancements.
- New ktest script that tests bootconfig options.
- Add tracepoint_probe_register_may_exist() to register a tracepoint
without triggering a WARN*() if it already exists. BPF has a path
from user space that can do this. All other paths are considered a
bug.
- Small clean ups and fixes
* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
tracing: Simplify & fix saved_tgids logic
treewide: Add missing semicolons to __assign_str uses
tracing: Change variable type as bool for clean-up
trace/timerlat: Fix indentation on timerlat_main()
trace/osnoise: Make 'noise' variable s64 in run_osnoise()
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
Documentation: Fix a typo on trace/osnoise-tracer
trace/osnoise: Fix return value on osnoise_init_hotplug_support
trace/osnoise: Make interval u64 on osnoise_main
trace/osnoise: Fix 'no previous prototype' warnings
tracing: Have osnoise_main() add a quiescent state for task rcu
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
seq_buf: Fix overflow in seq_buf_putmem_hex()
trace/osnoise: Support hotplug operations
trace/hwlat: Support hotplug operations
trace/hwlat: Protect kdata->kthread with get/put_online_cpus
trace: Add timerlat tracer
trace: Add osnoise tracer
...
Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
on systems where pid_max is configured higher than PID_MAX_DEFAULT the
ftrace record-tgid option doesn't work so well. Any tasks with PIDs
higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
don't show up in the saved_tgids file.
In particular since systemd v243 & above configure pid_max to its
highest possible 1<<22 value by default on 64 bit systems this renders
the record-tgids option of little use.
Increase the size of tgid_map to the configured pid_max instead,
allowing it to cover the full range of PIDs up to the maximum value of
PID_MAX_LIMIT if the system is configured that way.
On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the
size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in
memory overhead sounds significant 64 bit systems are presumably best
placed to accommodate it, and since tgid_map is only allocated when the
record-tgid option is actually used presumably the user would rather it
spends sufficient memory to actually record the tgids they expect.
The size of tgid_map could also increase for CONFIG_BASE_SMALL=y
configurations, but these seem unlikely to be systems upon which people
are both configuring a large pid_max and running ftrace with record-tgid
anyway.
Of note is that we only allocate tgid_map once, the first time that the
record-tgid option is enabled. Therefore its size is only set once, to
the value of pid_max at the time the record-tgid option is first
enabled. If a user increases pid_max after that point, the saved_tgids
file will not contain entries for any tasks with pids beyond the earlier
value of pid_max.
Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Burton <paulburton@google.com>
[ Fixed comment coding style ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>