linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-11 14:42:24 +00:00

Author	SHA1	Message	Date
Namhyung Kim	a737e6dd7b	ftrace: Get rid of obsolete global_start_up variable It seems like it's a leftover from commit `4104d326b6` ("ftrace: Remove global function list and call function directly"). As it isn't updated at all, checking its value is meaningless. Let's get rid of it. Link: http://lkml.kernel.org/p/1402584972-17824-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:40 -04:00
Steven Rostedt (Red Hat)	7b039cb4c5	tracing: Add trace_seq_buffer_ptr() helper function There's several locations in the kernel that open code the calculation of the next location in the trace_seq buffer. This is usually done with p->buffer + p->len Instead of having this open coded, supply a helper function in the header to do it for them. This function is called trace_seq_buffer_ptr(). Link: http://lkml.kernel.org/p/20140626220129.452783019@goodmis.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:39 -04:00
Fabian Frederick	3f4d8f78a0	tracing: Remove unnecessary null test before debugfs_remove() This fixes checkpatch warning: "WARNING: debugfs_remove(NULL) is safe this check is probably not required" Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:38 -04:00
Steven Rostedt (Red Hat)	9096032fbc	tracing: Remove trace_seq_reserve() trace_seq_reserve() has no users in the kernel, it just wastes space. Remove it. Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:37 -04:00
Steven Rostedt (Red Hat)	6d2289f3fa	tracing: Make trace_seq_putmem_hex() more robust Currently trace_seq_putmem_hex() can only take as a parameter a pointer to something that is 8 bytes or less, otherwise it will overflow the buffer. This is protected by a macro that encompasses the call to trace_seq_putmem_hex() that has a BUILD_BUG_ON() for the variable before it is passed in. This is not very robust and if trace_seq_putmem_hex() ever gets used outside that macro it will cause issues. Instead of only being able to produce a hex output of memory that is for a single word, change it to be more robust and allow any size input. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:37 -04:00
Steven Rostedt (Red Hat)	36aabfff50	tracing: Clean up trace_seq.c For using trace_seq_() functions in NMI context, I posted a patch to move it to the lib/ directory. This caused Andrew Morton to take a look at the code. He went through and gave a lot of comments about missing kernel doc, inconsistent types for the save variable, mix match of EXPORT_SYMBOL_GPL() and EXPORT_SYMBOL() as well as missing EXPORT_SYMBOL()s. There were a few comments about the way variables were being compared (int vs uint). All these were good review comments and should be implemented regardless of if trace_seq.c should be moved to lib/ or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:36 -04:00
Steven Rostedt (Red Hat)	12306276fa	tracing: Move the trace_seq_* functions into its own trace_seq.c file The trace_seq_*() functions are a nice utility that allows users to manipulate buffers with printf() like formats. It has its own trace_seq.h header in include/linux and should be in its own file. Being tied with trace_output.c is rather awkward. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:35 -04:00
Masami Hiramatsu	5c27c775d5	ftrace: Simplify ftrace_hash_disable/enable path in ftrace_hash_move Simplify ftrace_hash_disable/enable path in ftrace_hash_move for hardening the process if the memory allocation failed. Link: http://lkml.kernel.org/p/20140617110442.15167.81076.stgit@kbuild-fedora.novalocal Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:34 -04:00
Steven Rostedt (Red Hat)	9674b2fada	ftrace: Add trampolines to enabled_functions debug file The enabled_functions is used to help debug the dynamic function tracing. Adding what trampolines are attached to files is useful for debugging. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:32 -04:00
Steven Rostedt (Red Hat)	79922b8009	ftrace: Optimize function graph to be called directly Function graph tracing is a bit different than the function tracers, as it is processed after either the ftrace_caller or ftrace_regs_caller and we only have one place to modify the jump to ftrace_graph_caller, the jump needs to happen after the restore of registeres. The function graph tracer is dependent on the function tracer, where even if the function graph tracing is going on by itself, the save and restore of registers is still done for function tracing regardless of if function tracing is happening, before it calls the function graph code. If there's no function tracing happening, it is possible to just call the function graph tracer directly, and avoid the wasted effort to save and restore regs for function tracing. This requires adding new flags to the dyn_ftrace records: FTRACE_FL_TRAMP FTRACE_FL_TRAMP_EN The first is set if the count for the record is one, and the ftrace_ops associated to that record has its own trampoline. That way the mcount code can call that trampoline directly. In the future, trampolines can be added to arbitrary ftrace_ops, where you can have two or more ftrace_ops registered to ftrace (like kprobes and perf) and if they are not tracing the same functions, then instead of doing a loop to check all registered ftrace_ops against their hashes, just call the ftrace_ops trampoline directly, which would call the registered ftrace_ops function directly. Without this patch perf showed: 0.05% hackbench [kernel.kallsyms] [k] ftrace_caller 0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save 0.05% hackbench [kernel.kallsyms] [k] native_sched_clock 0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit 0.04% hackbench [kernel.kallsyms] [k] preempt_trace 0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return 0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check 0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller See that the ftrace_caller took up more time than the ftrace_graph_caller did. With this patch: 0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit 0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard 0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller 0.04% hackbench [kernel.kallsyms] [k] sched_clock The ftrace_caller is no where to be found and ftrace_graph_caller still takes up the same percentage. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-07-01 07:13:31 -04:00
Steven Rostedt (Red Hat)	0376bde11b	ftrace: Add ftrace_rec_counter() macro to simplify the code The ftrace dynamic record has a flags element that also has a counter. Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the place. Use a macro instead. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-30 10:09:56 -04:00
Steven Rostedt (Red Hat)	4fbb48cb11	ftrace: Allow no regs if no more callbacks require it When registering a function callback for the function tracer, the ops can specify if it wants to save full regs (like an interrupt would) for each function that it traces, or if it does not care about regs and just wants to have the fastest return possible. Once a ops has registered a function, if other ops register that function they all will receive the regs too. That's because it does the work once, it does it for everyone. Now if the ops wanting regs unregisters the function so that there's only ops left that do not care about regs, those ops will still continue getting regs and going through the work for it on that function. This is because the disabling of the rec counter only sees the ops registered, and does not see the ops that are still attached, and does not know if the current ops that are still attached want regs or not. To play it safe, it just keeps regs being processed until no function is registered anymore. Instead of doing that, check the ops that are still registered for that function and if none want regs for it anymore, then disable the processing of regs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-30 10:09:53 -04:00
Linus Torvalds	8841c8b3c4	One bug fix that goes back to 3.10. Accessing a non existent buffer if "possible cpus" is greater than actual CPUs (including offline CPUs). Namhyung Kim did some reviews of the patches I sent this merge window and found a memory leak and had a few clean ups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTmcYfAAoJEKQekfcNnQGusVIH/2wvfh1D4Mu+qCUsG7HqMLWM wHhTHcJsiE5rBpfcvc+XoLLBMGn9IKCeClGG59KYJGzznbJHHLmk1dy4qdSqNAel POTEtGh+AUX0wFBZtVLl2AesFZRsbtaxoNqD0ZBLhkHCV9FBEKbsm8uDCtJIf8wR vDDz27nPXkiFWX+wM9Z+tFSL7rohxvwREKlQjfk5Z9plyUcURE8GDbrWl870ZSJX uYzSYzioCyYdy/cpasDuTX22/I+nNUxA4dWTeyciigYC+6IqLJRIwqEXJ4RjYmfm v9+hf6KkqF8CY6mDjObLkKmy0gubPBQ8Op1G/3ChluUrjkwdffgb6oiLb352yHY= =XD1A -----END PGP SIGNATURE----- Merge tag 'trace-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing cleanups and bugfixes from Steven Rostedt: "One bug fix that goes back to 3.10. Accessing a non existent buffer if "possible cpus" is greater than actual CPUs (including offline CPUs). Namhyung Kim did some reviews of the patches I sent this merge window and found a memory leak and had a few clean ups" * tag 'trace-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix check of ftrace_trace_arrays list_empty() check tracing: Fix leak of per cpu max data in instances tracing: Cleanup saved_cmdlines_size changes ring-buffer: Check if buffer exists before polling	2014-06-12 21:07:25 -07:00
Linus Torvalds	3737a12761	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second round of perf updates: - wide reaching kprobes sanitization and robustization, with the hope of fixing all 'probe this function crashes the kernel' bugs, by Masami Hiramatsu. - uprobes updates from Oleg Nesterov: tmpfs support, corner case fixes and robustization work. - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo et al: * Add support to accumulate hist periods (Namhyung Kim) * various fixes, refactorings and enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) perf: Differentiate exec() and non-exec() comm events perf: Fix perf_event_comm() vs. exec() assumption uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates perf/documentation: Add description for conditional branch filter perf/x86: Add conditional branch filtering support perf/tool: Add conditional branch filter 'cond' to perf record perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND' uprobes: Teach copy_insn() to support tmpfs uprobes: Shift ->readpage check from __copy_insn() to uprobe_register() perf/x86: Use common PMU interrupt disabled code perf/ARM: Use common PMU interrupt disabled code perf: Disable sampled events if no PMU interrupt perf: Fix use after free in perf_remove_from_context() perf tools: Fix 'make help' message error perf record: Fix poll return value propagation perf tools: Move elide bool into perf_hpp_fmt struct perf tools: Remove elide setup for SORT_MODE__MEMORY mode perf tools: Fix "==" into "=" in ui_browser__warning assignment perf tools: Allow overriding sysfs and proc finding with env var perf tools: Consider header files outside perf directory in tags target ...	2014-06-12 19:18:49 -07:00
Steven Rostedt (Red Hat)	da9c3413a2	tracing: Fix check of ftrace_trace_arrays list_empty() check The check that tests if ftrace_trace_arrays is empty in top_trace_array(), uses the .prev pointer: if (list_empty(ftrace_trace_arrays.prev)) instead of testing the variable itself: if (list_empty(&ftrace_trace_arrays)) Although it is technically correct, it is awkward and confusing. Use the proper method. Link: http://lkml.kernel.org/r/87oay1bas8.fsf@sejong.aot.lge.com Reported-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 13:53:50 -04:00
Steven Rostedt (Red Hat)	f0b70cc48c	tracing: Fix leak of per cpu max data in instances The freeing of an instance, if max data is configured, there will be per cpu data structures created. But these are not freed when the instance is deleted, which causes a memory leak. A new helper function is added that frees the individual buffers within a trace array, instead of duplicating the code. This way changes made for one are applied to the other (normal buffer vs max buffer). Link: http://lkml.kernel.org/r/87k38pbake.fsf@sejong.aot.lge.com Reported-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 12:06:30 -04:00
Namhyung Kim	a6af8fbf17	tracing: Cleanup saved_cmdlines_size changes The recent addition of saved_cmdlines_size file had some remaining (minor - mostly coding style) issues. Fix them by passing pointer name to sizeof() and using scnprintf(). Link: http://lkml.kernel.org/p/1402384295-23680-1-git-send-email-namhyung@kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 09:51:10 -04:00
Steven Rostedt (Red Hat)	8b8b36834d	ring-buffer: Check if buffer exists before polling The per_cpu buffers are created one per possible CPU. But these do not mean that those CPUs are online, nor do they even exist. With the addition of the ring buffer polling, it assumes that the caller polls on an existing buffer. But this is not the case if the user reads trace_pipe from a CPU that does not exist, and this causes the kernel to crash. Simple fix is to check the cpu against buffer bitmask against to see if the buffer was allocated or not and return -ENODEV if it is not. More updates were done to pass the -ENODEV back up to userspace. Link: http://lkml.kernel.org/r/5393DB61.6060707@oracle.com Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-10 09:46:00 -04:00
Linus Torvalds	214b931320	Lots of tweaks, small fixes, optimizations, and some helper functions to help out the rest of the kernel to ease their use of trace events. The big change for this release is the allowing of other tracers, such as the latency tracers, to be used in the trace instances and allow for function or function graph tracing to be in the top level simultaneously. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTlbUqAAoJEKQekfcNnQGuP+8H+wTBG06beHsqe6XcaeXcKNkt Mimm0O04oQdw89CBWeJvXyOwRTtiN4M/4hxHXBTDtChxM9oUyWw1o0IpSMMuQ16O w9r3DfC8e1air+ufEuYWM0QNtyzHi8EfDSNia55ON5jvtkCZTXOEKZD+n8M9w28p I7PVgr0PDztsCpethCpg0M8beK9zuQPWMzsHAQCsKI06Xl5z33kPIJR15Exh+Kr1 uVVTZW7JFVAPuSnteLSIx9pN6OjsVGzOZCljg+O+9/v/02u5nkMiS2nURxae86kg RTSiRYT6Hvl/MCBhdss/w5kgSk6BYiZ0hXbLtwetvre+vQrOR5CnDw2DxZ7e+gU= =oudH -----END PGP SIGNATURE----- Merge tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Lots of tweaks, small fixes, optimizations, and some helper functions to help out the rest of the kernel to ease their use of trace events. The big change for this release is the allowing of other tracers, such as the latency tracers, to be used in the trace instances and allow for function or function graph tracing to be in the top level simultaneously" * tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) tracing: Fix memory leak on instance deletion tracing: Fix leak of ring buffer data when new instances creation fails tracing/kprobes: Avoid self tests if tracing is disabled on boot up tracing: Return error if ftrace_trace_arrays list is empty tracing: Only calculate stats of tracepoint benchmarks for 2^32 times tracing: Convert stddev into u64 in tracepoint benchmark tracing: Introduce saved_cmdlines_size file tracing: Add __get_dynamic_array_len() macro for trace events tracing: Remove unused variable in trace_benchmark tracing: Eliminate double free on failure of allocation on boot up ftrace/x86: Call text_ip_addr() instead of the duplicated code tracing: Print max callstack on stacktrace bug tracing: Move locking of trace_cmdline_lock into start/stop seq calls tracing: Try again for saved cmdline if failed due to locking tracing: Have saved_cmdlines use the seq_read infrastructure tracing: Add tracepoint benchmark tracepoint tracing: Print nasty banner when trace_printk() is in use tracing: Add funcgraph_tail option to print function name after closing braces tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks ...	2014-06-09 16:39:15 -07:00
Steven Rostedt (Red Hat)	a9fcaaac37	tracing: Fix memory leak on instance deletion When an instance is created, it also gets a snapshot ring buffer allocated (with minimum of pages). But when it is deleted the snapshot buffer is not. There was a helper function added to match the allocation of these ring buffers to a way to free them, but it wasn't used by the deletion of an instance. Using that helper function solves this memory leak. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 23:17:28 -04:00
Steven Rostedt (Red Hat)	23aaa3c18e	tracing: Fix leak of ring buffer data when new instances creation fails Yoshihiro Yunomae reported that the ring buffer data for a trace instance does not get properly cleaned up when it fails. He proposed a patch that manually cleaned the data up and addad a bunch of labels. The labels are not needed because all trace array is allocated with a kzalloc which initializes it to 0 and all kfree()s can take a NULL pointer and will ignore it. Adding a new helper function free_trace_buffers() that can also take null buffers to free the buffers that were allocated by allocate_trace_buffers(). Link: http://lkml.kernel.org/r/20140605223522.32311.31664.stgit@yunodevel Reported-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 04:53:40 -04:00
Yoshihiro YUNOMAE	748ec3a20e	tracing/kprobes: Avoid self tests if tracing is disabled on boot up If tracing is disabled on boot up, the kernel should not execute tracing self tests. The kernel should check whether tracing is disabled or not before executing any of the tracing self tests. Link: http://lkml.kernel.org/p/20140605223520.32311.56097.stgit@yunodevel Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 04:53:39 -04:00
Yoshihiro YUNOMAE	dc81e5e3ab	tracing: Return error if ftrace_trace_arrays list is empty ftrace_trace_arrays links global_trace.list. However, global_trace is not added to ftrace_trace_arrays if trace_alloc_buffers() failed. As the result, ftrace_trace_arrays becomes an empty list. If ftrace_trace_arrays is an empty list, current top_trace_array() returns an invalid pointer. As the result, the kernel can induce memory corruption or panic. Current implementation does not check whether ftrace_trace_arrays is empty list or not. So, in this patch, if ftrace_trace_arrays is empty list, top_trace_array() returns NULL. Moreover, this patch makes all functions calling top_trace_array() handle it appropriately. Link: http://lkml.kernel.org/p/20140605223517.32311.99233.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 04:47:46 -04:00
Steven Rostedt (Red Hat)	34839f5a69	tracing: Only calculate stats of tracepoint benchmarks for 2^32 times When calculating the average and standard deviation, it is required that the count be less than UINT_MAX, otherwise the do_div() will get undefined results. After 2^32 counts of data, the average and standard deviation should pretty much be set anyway. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-06 00:41:38 -04:00
Steven Rostedt (Red Hat)	72e2fe38ea	tracing: Convert stddev into u64 in tracepoint benchmark I've been told that do_div() expects an unsigned 64 bit number, and is undefined if a signed is used. This gave a warning on the MIPS build. I'm not sure if a signed 64 bit dividend is really an issue or not, but the calculation this is used for is standard deviation, and that isn't going to be negative. We can just convert it to unsigned and be safe. Reported-by: David Daney <ddaney.cavm@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-05 20:35:30 -04:00
Yoshihiro YUNOMAE	939c7a4f04	tracing: Introduce saved_cmdlines_size file Introduce saved_cmdlines_size file for changing the number of saved pid-comms. saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but 'no-existing processes' names are often lost in saved_cmdlines when we read the trace data. So, by introducing saved_cmdlines_size file, we can now change the 128 command names saved to something much larger if needed. When we write a value to saved_cmdlines_size, the number of the value will be stored in pid-comm list: # echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size Here, 1024 command names can be stored. The default number is 128 and the maximum number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we want to avoid losing any command names, we need to set 32768 to saved_cmdlines_size. We can read the maximum number of the list: # cat /sys/kernel/debug/tracing/saved_cmdlines_size 128 Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-05 12:35:49 -04:00
Ingo Molnar	10b0256496	Merge branch 'perf/kprobes' into perf/core Conflicts: arch/x86/kernel/traps.c The kprobes enhancements are fully cooked, ship them upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-05 12:26:50 +02:00
Ingo Molnar	c56d34064b	Merge branch 'perf/uprobes' into perf/core These bits from Oleg are fully cooked, ship them to Linus. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-05 12:26:27 +02:00
Steven Rostedt (Red Hat)	195d280201	tracing: Remove unused variable in trace_benchmark Somehow this unused variable warning sneaked past my warnings check (probably due to it depending on a new config). kernel/trace/trace_benchmark.c: In function 'trace_do_benchmark': kernel/trace/trace_benchmark.c:38:6: warning: unused variable 'seedsq' [-Wunused-variable] u64 seedsq; ^ Link: http://lkml.kernel.org/r/20140604160921.4f4e69c4@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-04 13:36:29 -04:00
Yoshihiro YUNOMAE	198376cd88	tracing: Eliminate double free on failure of allocation on boot up If allocation of the max_buffer fails on boot up, the error path will free both per_cpu data structures from the buffers. With the new redesign of the code, those structures are freed if allocations failed. That is, the helper function that allocates the buffers will free the per cpu data on failure. No need to do it again. In fact, the second free will cause a bug as the code can not handle a double free. Link: http://lkml.kernel.org/p/20140603042803.27308.30956.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-03 19:58:31 -04:00
Minchan Kim	e317218194	tracing: Print max callstack on stacktrace bug While I played with my own feature(ex, something on the way to reclaim), the kernel would easily oops. I guessed that the reason had to do with stack overflow and wanted to prove it. I discovered the stack tracer which proved to be very useful for me but the kernel would oops before my user program gather the information via "watch cat /sys/kernel/debug/tracing/stack_trace" so I couldn't get any message from that. What I needed was to have the stack tracer emit the kernel stack usage before it does the oops so I could find what was hogging the stack. This patch shows the callstack of max stack usage right before an oops so we can find a culprit. So, the result is as follows. [ 1116.522206] init: lightdm main process (1246) terminated with status 1 [ 1119.922916] init: failsafe-x main process (1272) terminated with status 1 [ 3887.728131] kworker/u24:1 (6637) used greatest stack depth: 256 bytes left [ 6397.629227] cc1 (9554) used greatest stack depth: 128 bytes left [ 7174.467392] Depth Size Location (47 entries) [ 7174.467392] ----- ---- -------- [ 7174.467785] 0) 7248 256 get_page_from_freelist+0xa7/0x920 [ 7174.468506] 1) 6992 352 __alloc_pages_nodemask+0x1cd/0xb20 [ 7174.469224] 2) 6640 8 alloc_pages_current+0x10f/0x1f0 [ 7174.469413] 3) 6632 168 new_slab+0x2c5/0x370 [ 7174.469413] 4) 6464 8 __slab_alloc+0x3a9/0x501 [ 7174.469413] 5) 6456 80 __kmalloc+0x1cb/0x200 [ 7174.469413] 6) 6376 376 vring_add_indirect+0x36/0x200 [ 7174.469413] 7) 6000 144 virtqueue_add_sgs+0x2e2/0x320 [ 7174.469413] 8) 5856 288 __virtblk_add_req+0xda/0x1b0 [ 7174.469413] 9) 5568 96 virtio_queue_rq+0xd3/0x1d0 [ 7174.469413] 10) 5472 128 __blk_mq_run_hw_queue+0x1ef/0x440 [ 7174.469413] 11) 5344 16 blk_mq_run_hw_queue+0x35/0x40 [ 7174.469413] 12) 5328 96 blk_mq_insert_requests+0xdb/0x160 [ 7174.469413] 13) 5232 112 blk_mq_flush_plug_list+0x12b/0x140 [ 7174.469413] 14) 5120 112 blk_flush_plug_list+0xc7/0x220 [ 7174.469413] 15) 5008 64 io_schedule_timeout+0x88/0x100 [ 7174.469413] 16) 4944 128 mempool_alloc+0x145/0x170 [ 7174.469413] 17) 4816 96 bio_alloc_bioset+0x10b/0x1d0 [ 7174.469413] 18) 4720 48 get_swap_bio+0x30/0x90 [ 7174.469413] 19) 4672 160 __swap_writepage+0x150/0x230 [ 7174.469413] 20) 4512 32 swap_writepage+0x42/0x90 [ 7174.469413] 21) 4480 320 shrink_page_list+0x676/0xa80 [ 7174.469413] 22) 4160 208 shrink_inactive_list+0x262/0x4e0 [ 7174.469413] 23) 3952 304 shrink_lruvec+0x3e1/0x6a0 [ 7174.469413] 24) 3648 80 shrink_zone+0x3f/0x110 [ 7174.469413] 25) 3568 128 do_try_to_free_pages+0x156/0x4c0 [ 7174.469413] 26) 3440 208 try_to_free_pages+0xf7/0x1e0 [ 7174.469413] 27) 3232 352 __alloc_pages_nodemask+0x783/0xb20 [ 7174.469413] 28) 2880 8 alloc_pages_current+0x10f/0x1f0 [ 7174.469413] 29) 2872 200 __page_cache_alloc+0x13f/0x160 [ 7174.469413] 30) 2672 80 find_or_create_page+0x4c/0xb0 [ 7174.469413] 31) 2592 80 ext4_mb_load_buddy+0x1e9/0x370 [ 7174.469413] 32) 2512 176 ext4_mb_regular_allocator+0x1b7/0x460 [ 7174.469413] 33) 2336 128 ext4_mb_new_blocks+0x458/0x5f0 [ 7174.469413] 34) 2208 256 ext4_ext_map_blocks+0x70b/0x1010 [ 7174.469413] 35) 1952 160 ext4_map_blocks+0x325/0x530 [ 7174.469413] 36) 1792 384 ext4_writepages+0x6d1/0xce0 [ 7174.469413] 37) 1408 16 do_writepages+0x23/0x40 [ 7174.469413] 38) 1392 96 __writeback_single_inode+0x45/0x2e0 [ 7174.469413] 39) 1296 176 writeback_sb_inodes+0x2ad/0x500 [ 7174.469413] 40) 1120 80 __writeback_inodes_wb+0x9e/0xd0 [ 7174.469413] 41) 1040 160 wb_writeback+0x29b/0x350 [ 7174.469413] 42) 880 208 bdi_writeback_workfn+0x11c/0x480 [ 7174.469413] 43) 672 144 process_one_work+0x1d2/0x570 [ 7174.469413] 44) 528 112 worker_thread+0x116/0x370 [ 7174.469413] 45) 416 240 kthread+0xf3/0x110 [ 7174.469413] 46) 176 176 ret_from_fork+0x7c/0xb0 [ 7174.469413] ------------[ cut here ]------------ [ 7174.469413] kernel BUG at kernel/trace/trace_stack.c:174! [ 7174.469413] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 7174.469413] Dumping ftrace buffer: [ 7174.469413] (ftrace buffer empty) [ 7174.469413] Modules linked in: [ 7174.469413] CPU: 0 PID: 440 Comm: kworker/u24:0 Not tainted 3.14.0+ #212 [ 7174.469413] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 7174.469413] Workqueue: writeback bdi_writeback_workfn (flush-253:0) [ 7174.469413] task: ffff880034170000 ti: ffff880029518000 task.ti: ffff880029518000 [ 7174.469413] RIP: 0010:[<ffffffff8112336e>] [<ffffffff8112336e>] stack_trace_call+0x2de/0x340 [ 7174.469413] RSP: 0000:ffff880029518290 EFLAGS: 00010046 [ 7174.469413] RAX: 0000000000000030 RBX: 000000000000002f RCX: 0000000000000000 [ 7174.469413] RDX: 0000000000000000 RSI: 000000000000002f RDI: ffffffff810b7159 [ 7174.469413] RBP: ffff8800295182f0 R08: ffffffffffffffff R09: 0000000000000000 [ 7174.469413] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff82768dfc [ 7174.469413] R13: 000000000000f2e8 R14: ffff8800295182b8 R15: 00000000000000f8 [ 7174.469413] FS: 0000000000000000(0000) GS:ffff880037c00000(0000) knlGS:0000000000000000 [ 7174.469413] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 7174.469413] CR2: 00002acd0b994000 CR3: 0000000001c0b000 CR4: 00000000000006f0 [ 7174.469413] Stack: [ 7174.469413] 0000000000000000 ffffffff8114fdb7 0000000000000087 0000000000001c50 [ 7174.469413] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 7174.469413] 0000000000000002 ffff880034170000 ffff880034171028 0000000000000000 [ 7174.469413] Call Trace: [ 7174.469413] [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920 [ 7174.469413] [<ffffffff816eee3f>] ftrace_call+0x5/0x2f [ 7174.469413] [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70 [ 7174.469413] [<ffffffff810a23fa>] ? __bfs+0x11a/0x270 [ 7174.469413] [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70 [ 7174.469413] [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920 [ 7174.469413] [<ffffffff8119092f>] ? alloc_pages_current+0x10f/0x1f0 [ 7174.469413] [<ffffffff811507fd>] __alloc_pages_nodemask+0x1cd/0xb20 [ 7174.469413] [<ffffffff810a4de6>] ? check_irq_usage+0x96/0xe0 [ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f [ 7174.469413] [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0 [ 7174.469413] [<ffffffff81199cd5>] ? new_slab+0x2c5/0x370 [ 7174.469413] [<ffffffff81199cd5>] new_slab+0x2c5/0x370 [ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f [ 7174.469413] [<ffffffff816db002>] __slab_alloc+0x3a9/0x501 [ 7174.469413] [<ffffffff8119af8b>] ? __kmalloc+0x1cb/0x200 [ 7174.469413] [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200 [ 7174.469413] [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200 [ 7174.469413] [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200 [ 7174.469413] [<ffffffff8119af8b>] __kmalloc+0x1cb/0x200 [ 7174.469413] [<ffffffff8141de10>] ? vring_add_indirect+0x200/0x200 [ 7174.469413] [<ffffffff8141dc46>] vring_add_indirect+0x36/0x200 [ 7174.469413] [<ffffffff8141e402>] virtqueue_add_sgs+0x2e2/0x320 [ 7174.469413] [<ffffffff8148e35a>] __virtblk_add_req+0xda/0x1b0 [ 7174.469413] [<ffffffff8148e503>] virtio_queue_rq+0xd3/0x1d0 [ 7174.469413] [<ffffffff8134aa0f>] __blk_mq_run_hw_queue+0x1ef/0x440 [ 7174.469413] [<ffffffff8134b0d5>] blk_mq_run_hw_queue+0x35/0x40 [ 7174.469413] [<ffffffff8134b7bb>] blk_mq_insert_requests+0xdb/0x160 [ 7174.469413] [<ffffffff8134be5b>] blk_mq_flush_plug_list+0x12b/0x140 [ 7174.469413] [<ffffffff81342237>] blk_flush_plug_list+0xc7/0x220 [ 7174.469413] [<ffffffff816e60ef>] ? _raw_spin_unlock_irqrestore+0x3f/0x70 [ 7174.469413] [<ffffffff816e16e8>] io_schedule_timeout+0x88/0x100 [ 7174.469413] [<ffffffff816e1665>] ? io_schedule_timeout+0x5/0x100 [ 7174.469413] [<ffffffff81149415>] mempool_alloc+0x145/0x170 [ 7174.469413] [<ffffffff8109baf0>] ? __init_waitqueue_head+0x60/0x60 [ 7174.469413] [<ffffffff811e246b>] bio_alloc_bioset+0x10b/0x1d0 [ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0 [ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0 [ 7174.469413] [<ffffffff81184110>] get_swap_bio+0x30/0x90 [ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0 [ 7174.469413] [<ffffffff81184660>] __swap_writepage+0x150/0x230 [ 7174.469413] [<ffffffff810ab405>] ? do_raw_spin_unlock+0x5/0xa0 [ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0 [ 7174.469413] [<ffffffff81184515>] ? __swap_writepage+0x5/0x230 [ 7174.469413] [<ffffffff81184782>] swap_writepage+0x42/0x90 [ 7174.469413] [<ffffffff8115ae96>] shrink_page_list+0x676/0xa80 [ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f [ 7174.469413] [<ffffffff8115b872>] shrink_inactive_list+0x262/0x4e0 [ 7174.469413] [<ffffffff8115c1c1>] shrink_lruvec+0x3e1/0x6a0 [ 7174.469413] [<ffffffff8115c4bf>] shrink_zone+0x3f/0x110 [ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f [ 7174.469413] [<ffffffff8115c9e6>] do_try_to_free_pages+0x156/0x4c0 [ 7174.469413] [<ffffffff8115cf47>] try_to_free_pages+0xf7/0x1e0 [ 7174.469413] [<ffffffff81150db3>] __alloc_pages_nodemask+0x783/0xb20 [ 7174.469413] [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0 [ 7174.469413] [<ffffffff81145c0f>] ? __page_cache_alloc+0x13f/0x160 [ 7174.469413] [<ffffffff81145c0f>] __page_cache_alloc+0x13f/0x160 [ 7174.469413] [<ffffffff81146c6c>] find_or_create_page+0x4c/0xb0 [ 7174.469413] [<ffffffff811463e5>] ? find_get_page+0x5/0x130 [ 7174.469413] [<ffffffff812837b9>] ext4_mb_load_buddy+0x1e9/0x370 [ 7174.469413] [<ffffffff81284c07>] ext4_mb_regular_allocator+0x1b7/0x460 [ 7174.469413] [<ffffffff81281070>] ? ext4_mb_use_preallocated+0x40/0x360 [ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f [ 7174.469413] [<ffffffff81287eb8>] ext4_mb_new_blocks+0x458/0x5f0 [ 7174.469413] [<ffffffff8127d83b>] ext4_ext_map_blocks+0x70b/0x1010 [ 7174.469413] [<ffffffff8124e6d5>] ext4_map_blocks+0x325/0x530 [ 7174.469413] [<ffffffff81253871>] ext4_writepages+0x6d1/0xce0 [ 7174.469413] [<ffffffff812531a0>] ? ext4_journalled_write_end+0x330/0x330 [ 7174.469413] [<ffffffff811539b3>] do_writepages+0x23/0x40 [ 7174.469413] [<ffffffff811d2365>] __writeback_single_inode+0x45/0x2e0 [ 7174.469413] [<ffffffff811d36ed>] writeback_sb_inodes+0x2ad/0x500 [ 7174.469413] [<ffffffff811d39de>] __writeback_inodes_wb+0x9e/0xd0 [ 7174.469413] [<ffffffff811d40bb>] wb_writeback+0x29b/0x350 [ 7174.469413] [<ffffffff81057c3d>] ? __local_bh_enable_ip+0x6d/0xd0 [ 7174.469413] [<ffffffff811d6e9c>] bdi_writeback_workfn+0x11c/0x480 [ 7174.469413] [<ffffffff81070610>] ? process_one_work+0x170/0x570 [ 7174.469413] [<ffffffff81070672>] process_one_work+0x1d2/0x570 [ 7174.469413] [<ffffffff81070610>] ? process_one_work+0x170/0x570 [ 7174.469413] [<ffffffff81071bb6>] worker_thread+0x116/0x370 [ 7174.469413] [<ffffffff81071aa0>] ? manage_workers.isra.19+0x2e0/0x2e0 [ 7174.469413] [<ffffffff81078e53>] kthread+0xf3/0x110 [ 7174.469413] [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150 [ 7174.469413] [<ffffffff816ef0ec>] ret_from_fork+0x7c/0xb0 [ 7174.469413] [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150 [ 7174.469413] Code: c0 49 bc fc 8d 76 82 ff ff ff ff e8 44 5a 5b 00 31 f6 8b 05 95 2b b3 00 48 39 c6 7d 0e 4c 8b 04 f5 20 5f c5 81 49 83 f8 ff 75 11 <0f> 0b 48 63 05 71 5a 64 01 48 29 c3 e9 d0 fd ff ff 48 8d 5e 01 [ 7174.469413] RIP [<ffffffff8112336e>] stack_trace_call+0x2de/0x340 [ 7174.469413] RSP <ffff880029518290> [ 7174.469413] ---[ end trace c97d325b36b718f3 ]--- Link: http://lkml.kernel.org/p/1401683592-1651-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-06-02 16:43:49 -04:00
Steven Rostedt (Red Hat)	4c27e756bc	tracing: Move locking of trace_cmdline_lock into start/stop seq calls With the conversion of the saved_cmdlines output to use seq_read, there is now a race between accessing the values of the saved_cmdlines and the writing to them. The trace_cmdline_lock needs to be taken at the start and stop of the seq calls. A new __trace_find_cmdline() call is created to allow for the look up to happen without taking the lock. Fixes: `42584c81c5` tracing: Have saved_cmdlines use the seq_read infrastructure Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-30 13:03:40 -04:00
Steven Rostedt (Red Hat)	379cfdac37	tracing: Try again for saved cmdline if failed due to locking In order to prevent the saved cmdline cache from being filled when tracing is not active, the comms are only recorded after a trace event is recorded. The problem is, a comm can fail to be recorded if the trace_cmdline_lock is held. That lock is taken via a trylock to allow it to happen from any context (including NMI). If the lock fails to be taken, the comm is skipped. No big deal, as we will try again later. But! Because of the code that was added to only record after an event, we may not try again later as the recording is made as a oneshot per event per CPU. Only disable the recording of the comm if the comm is actually recorded. Fixes: `7ffbd48d5c` "tracing: Cache comms only after an event occurred" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-30 09:42:39 -04:00
Yoshihiro YUNOMAE	42584c81c5	tracing: Have saved_cmdlines use the seq_read infrastructure Current tracing_saved_cmdlines_read() implementation is naive; It allocates a large buffer, constructs output data to that buffer for each read operation, and then copies a portion of the buffer to the user space buffer. This has several issues such as slow memory allocation, high CPU usage, and even corruption of the output data. The seq_read infrastructure is made to handle this type of work. By converting it to use seq_read() the code becomes smaller, simplified, as well as correct. Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-29 23:08:07 -04:00
Steven Rostedt (Red Hat)	81dc9f0ef2	tracing: Add tracepoint benchmark tracepoint In order to help benchmark the time tracepoints take, a new config option is added called CONFIG_TRACEPOINT_BENCHMARK. When this option is set a tracepoint is created called "benchmark:benchmark_event". When the tracepoint is enabled, it kicks off a kernel thread that goes into an infinite loop (calling cond_sched() to let other tasks run), and calls the tracepoint. Each iteration will record the time it took to write to the tracepoint and the next iteration that data will be passed to the tracepoint itself. That is, the tracepoint will report the time it took to do the previous tracepoint. The string written to the tracepoint is a static string of 128 bytes to keep the time the same. The initial string is simply a write of "START". The second string records the cold cache time of the first write which is not added to the rest of the calculations. As it is a tight loop, it benchmarks as hot cache. That's fine because we care most about hot paths that are probably in cache already. An example of the output: START first=3672 [COLD CACHED] last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712 last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337 last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064 last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411 last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389 last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-29 22:49:54 -04:00
Steven Rostedt	2184db46e4	tracing: Print nasty banner when trace_printk() is in use trace_printk() is used to debug fast paths within the kernel. Places that gets called in any context (interrupt or NMI) or thousands of times a second. Something you do not want to do with a printk(). In order to make it completely lockless as it needs a temporary buffer to handle some of the string formatting, a page is created per cpu for every context (four per cpu; normal, softirq, irq, NMI). Since trace_printk() should only be used for debugging purposes, there's no reason to waste memory on these buffers on a production system. That means, trace_printk() should never be used unless a developer is debugging their kernel. There's macro magic to allocate the buffers if trace_printk() is used anywhere in the kernel. To help enforce that trace_printk() isn't used outside of development, when it is used, a nasty banner is displayed on bootup (or when a module is loaded that uses trace_printk() and the kernel core does not). Here's the banner: ******************************************************** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE trace_printk() being used. Allocating extra memory. This means that this is a DEBUG kernel and it is unsafe for produciton use. If you see this message and you are not debugging the kernel, report this immediately to your vendor! NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ******************************************************** That should hopefully keep developers from trying to sneak in a trace_printk() or two. Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-29 20:13:59 -04:00
Robert Elliott	607e3a2920	tracing: Add funcgraph_tail option to print function name after closing braces In the function-graph tracer, add a funcgraph_tail option to print the function name on all } lines, not just functions whose first line is no longer in the trace buffer. If a function calls other traced functions, its total time appears on its } line. This change allows grep to be used to determine the function for which the line corresponds. Update Documentation/trace/ftrace.txt to describe this new option. Link: http://lkml.kernel.org/p/20140520221041.8359.6782.stgit@beardog.cce.hp.com Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-20 23:29:32 -04:00
Robert Elliott	ccdb594653	tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines Eliminate duplicate TRACE_GRAPH_PRINT_xx defines in trace_functions_graph.c that are already in trace.h. Add TRACE_GRAPH_PRINT_IRQS to trace.h, which is the only one that is missing. Link: http://lkml.kernel.org/p/20140520221031.8359.24733.stgit@beardog.cce.hp.com Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-20 23:28:34 -04:00
Steven Rostedt (Red Hat)	4449bf927b	tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks Being able to show a cpumask of events can be useful as some events may affect only some CPUs. There is no standard way to record the cpumask and converting it to a string is rather expensive during the trace as traces happen in hotpaths. It would be better to record the raw event mask and be able to parse it at print time. The following macros were added for use with the TRACE_EVENT() macro: __bitmask() __assign_bitmask() __get_bitmask() To test this, I added this to the sched_migrate_task event, which looked like this: TRACE_EVENT(sched_migrate_task, TP_PROTO(struct task_struct p, int dest_cpu, const struct cpumask cpus), TP_ARGS(p, dest_cpu, cpus), TP_STRUCT__entry( __array( char, comm, TASK_COMM_LEN ) __field( pid_t, pid ) __field( int, prio ) __field( int, orig_cpu ) __field( int, dest_cpu ) __bitmask( cpumask, num_possible_cpus() ) ), TP_fast_assign( memcpy(__entry->comm, p->comm, TASK_COMM_LEN); __entry->pid = p->pid; __entry->prio = p->prio; __entry->orig_cpu = task_cpu(p); __entry->dest_cpu = dest_cpu; __assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus()); ), TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s", __entry->comm, __entry->pid, __entry->prio, __entry->orig_cpu, __entry->dest_cpu, __get_bitmask(cpumask)) ); With the output of: ksmtuned-3613 [003] d..2 485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f migration/1-13 [001] d..5 485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f awk-3615 [002] d.H5 485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff migration/2-18 [002] d..5 485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home Suggested-by: Javi Merino <javi.merino@arm.com> Tested-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-15 11:29:37 -04:00
Steven Rostedt (Red Hat)	f1b2f2bd58	ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag As the decision to what needs to be done (converting a call to the ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs to ftrace_caller) can easily be determined from the rec->flags of FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes more complexity than is required. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-14 11:37:30 -04:00
Steven Rostedt (Red Hat)	7c0868e03b	ftrace: Use the ftrace_addr helper functions to find the ftrace_addr With the moving of the functions that determine what the mcount call site should be replaced with into the generic code, there is a few places in the generic code that can use them instead of hard coding it as it does. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat)	7413af1fb7	ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global Move and rename get_ftrace_addr() and get_ftrace_addr_old() to ftrace_get_addr_new() and ftrace_get_addr_curr() respectively. This moves these two helper functions in the generic code out from the arch specific code, and renames them to have a better generic name. This will allow other archs to use them as well as makes it a bit easier to work on getting separate trampolines for different functions. ftrace_get_addr_new() returns the trampoline address that the mcount call address will be converted to. ftrace_get_addr_curr() returns the trampoline address of what the mcount call address currently jumps to. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat)	68f40969f0	ftrace: Always inline ftrace_hash_empty() helper function The ftrace_hash_empty() function is a simple test: return !hash \|\| !hash->count; But gcc seems to want to make it a call. As this is in an extreme hot path of the function tracer, there's no reason it needs to be a call. I only wrote it to be a helper function anyway, otherwise it would have been inlined manually. Force gcc to inline it, as it could have also been a macro. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-14 11:37:28 -04:00
Steven Rostedt (Red Hat)	19eab4a472	ftrace: Write in missing comment from a very old commit Back in 2011 Commit `ed926f9b35` "ftrace: Use counters to enable functions to trace" changed the way ftrace accounts for enabled and disabled traced functions. There was a comment started as: /* * */ But never finished. Well, that's rather useless. I probably forgot to save the file before committing it. And it passed review from all this time. Anyway, better late than never. I updated the comment to express what is happening in that somewhat complex code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-14 11:37:27 -04:00
Steven Rostedt (Red Hat)	66209a5bd4	ftrace: Remove boolean of hash_enable and hash_disable Commit `4104d326b6` "ftrace: Remove global function list and call function directly" cleaned up the global_ops filtering and made the code simpler, but it left a variable "hash_enable" that was used to know if the hash functions should be updated or not. It was updated if the global_ops did not override them. As the global_ops are now no different than any other ftrace_ops, the hash always gets updated and there's no reason to use the hash_enable boolean. The same goes for hash_disable used in ftrace_shutdown(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-14 11:37:25 -04:00
Christoph Lameter	bdffd893a0	tracing: Replace __get_cpu_var uses with this_cpu_ptr Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr. Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-05 22:40:53 -04:00
Steven Rostedt (Red Hat)	561a4fe851	tracing: Use rcu_dereference_sched() for trace event triggers As trace event triggers are now part of the mainline kernel, I added my trace event trigger tests to my test suite I run on all my kernels. Now these tests get run under different config options, and one of those options is CONFIG_PROVE_RCU, which checks under lockdep that the rcu locking primitives are being used correctly. This triggered the following splat: =============================== [ INFO: suspicious RCU usage. ] 3.15.0-rc2-test+ #11 Not tainted ------------------------------- kernel/trace/trace_events_trigger.c:80 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 4 locks held by swapper/1/0: #0: ((&(&j_cdbs->work)->timer)){..-...}, at: [<ffffffff8104d2cc>] call_timer_fn+0x5/0x1be #1: (&(&pool->lock)->rlock){-.-...}, at: [<ffffffff81059856>] __queue_work+0x140/0x283 #2: (&p->pi_lock){-.-.-.}, at: [<ffffffff8106e961>] try_to_wake_up+0x2e/0x1e8 #3: (&rq->lock){-.-.-.}, at: [<ffffffff8106ead3>] try_to_wake_up+0x1a0/0x1e8 stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc2-test+ #11 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 0000000000000001 ffff88007e083b98 ffffffff819f53a5 0000000000000006 ffff88007b0942c0 ffff88007e083bc8 ffffffff81081307 ffff88007ad96d20 0000000000000000 ffff88007af2d840 ffff88007b2e701c ffff88007e083c18 Call Trace: <IRQ> [<ffffffff819f53a5>] dump_stack+0x4f/0x7c [<ffffffff81081307>] lockdep_rcu_suspicious+0x107/0x110 [<ffffffff810ee51c>] event_triggers_call+0x99/0x108 [<ffffffff810e8174>] ftrace_event_buffer_commit+0x42/0xa4 [<ffffffff8106aadc>] ftrace_raw_event_sched_wakeup_template+0x71/0x7c [<ffffffff8106bcbf>] ttwu_do_wakeup+0x7f/0xff [<ffffffff8106bd9b>] ttwu_do_activate.constprop.126+0x5c/0x61 [<ffffffff8106eadf>] try_to_wake_up+0x1ac/0x1e8 [<ffffffff8106eb77>] wake_up_process+0x36/0x3b [<ffffffff810575cc>] wake_up_worker+0x24/0x26 [<ffffffff810578bc>] insert_work+0x5c/0x65 [<ffffffff81059982>] __queue_work+0x26c/0x283 [<ffffffff81059999>] ? __queue_work+0x283/0x283 [<ffffffff810599b7>] delayed_work_timer_fn+0x1e/0x20 [<ffffffff8104d3a6>] call_timer_fn+0xdf/0x1be^M [<ffffffff8104d2cc>] ? call_timer_fn+0x5/0x1be [<ffffffff81059999>] ? __queue_work+0x283/0x283 [<ffffffff8104d823>] run_timer_softirq+0x1a4/0x22f^M [<ffffffff8104696d>] __do_softirq+0x17b/0x31b^M [<ffffffff81046d03>] irq_exit+0x42/0x97 [<ffffffff81a08db6>] smp_apic_timer_interrupt+0x37/0x44 [<ffffffff81a07a2f>] apic_timer_interrupt+0x6f/0x80 <EOI> [<ffffffff8100a5d8>] ? default_idle+0x21/0x32 [<ffffffff8100a5d6>] ? default_idle+0x1f/0x32 [<ffffffff8100ac10>] arch_cpu_idle+0xf/0x11 [<ffffffff8107b3a4>] cpu_startup_entry+0x1a3/0x213 [<ffffffff8102a23c>] start_secondary+0x212/0x219 The cause is that the triggers are protected by rcu_read_lock_sched() but the data is dereferenced with rcu_dereference() which expects it to be protected with rcu_read_lock(). The proper reference should be rcu_dereference_sched(). Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-02 23:12:42 -04:00
Steven Rostedt (Red Hat)	fd06a54990	ftrace: Have function graph tracer use global_ops for filtering Commit `4104d326b6` "ftrace: Remove global function list and call function directly" cleaned up the global_ops filtering and made the code simpler. But it left out function graph filtering which also depended on that code. The function graph filtering still needs to use global_ops as the filter otherwise it wont filter at all. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-05-01 23:21:16 -04:00
Oleg Nesterov	927d687480	uprobes/tracing: Fix uprobe_perf_open() on uprobe_apply() failure uprobe_perf_open()->uprobe_apply() can fail, but this error is wrongly ignored. Change uprobe_perf_open() to do uprobe_perf_close() and return the error code in this case. Change uprobe_perf_close() to propogate the error from uprobe_apply() as well, although it should not fail. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2014-04-30 19:10:42 +02:00
Oleg Nesterov	ce5f36a58f	uprobes/tracing: Make uprobe_perf_close() visible to uprobe_perf_open() Preparation. Move uprobe_perf_close() up before uprobe_perf_open() to avoid the forward declaration in the next patch and make it readable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2014-04-30 19:10:42 +02:00

1 2 3 4 5 ...

2650 Commits