linux/kernel/trace
Jiri Olsa 30fb6aa740 ftrace: Fix unregister ftrace_ops accounting
Multiple users of the function tracer can register their functions
with the ftrace_ops structure. The accounting within ftrace will
update the counter on each function record that is being traced.
When the ftrace_ops filtering adds or removes functions, the
function records will be updated accordingly if the ftrace_ops is
still registered.

When a ftrace_ops is removed, the counter of the function records,
that the ftrace_ops traces, are decremented. When they reach zero
the functions that they represent are modified to stop calling the
mcount code.

When changes are made, the code is updated via stop_machine() with
a command passed to the function to tell it what to do. There is an
ENABLE and DISABLE command that tells the called function to enable
or disable the functions. But the ENABLE is really a misnomer as it
should just update the records, as records that have been enabled
and now have a count of zero should be disabled.

The DISABLE command is used to disable all functions regardless of
their counter values. This is the big off switch and is not the
complement of the ENABLE command.

To make matters worse, when a ftrace_ops is unregistered and there
is another ftrace_ops registered, neither the DISABLE nor the
ENABLE command are set when calling into the stop_machine() function
and the records will not be updated to match their counter. A command
is passed to that function that will update the mcount code to call
the registered callback directly if it is the only one left. This
means that the ftrace_ops that is still registered will have its callback
called by all functions that have been set for it as well as the ftrace_ops
that was just unregistered.

Here's a way to trigger this bug. Compile the kernel with
CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:

 CONFIG_FUNCTION_PROFILER=y
 # CONFIG_FUNCTION_GRAPH is not set

This will force the function profiler to use the function tracer instead
of the function graph tracer.

  # cd /sys/kernel/debug/tracing
  # echo schedule > set_ftrace_filter
  # echo function > current_tracer
  # cat set_ftrace_filter
 schedule
  # cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 692/68108025   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
      kworker/0:2-909   [000] ....   531.235574: schedule <-worker_thread
           <idle>-0     [001] .N..   531.235575: schedule <-cpu_idle
      kworker/0:2-909   [000] ....   531.235597: schedule <-worker_thread
             sshd-2563  [001] ....   531.235647: schedule <-schedule_hrtimeout_range_clock

  # echo 1 > function_profile_enabled
  # echo 0 > function_porfile_enabled
  # cat set_ftrace_filter
 schedule
  # cat trace
 # tracer: function
 #
 # entries-in-buffer/entries-written: 159701/118821262   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
           <idle>-0     [002] ...1   604.870655: local_touch_nmi <-cpu_idle
           <idle>-0     [002] d..1   604.870655: enter_idle <-cpu_idle
           <idle>-0     [002] d..1   604.870656: atomic_notifier_call_chain <-enter_idle
           <idle>-0     [002] d..1   604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain

The same problem could have happened with the trace_probe_ops,
but they are modified with the set_frace_filter file which does the
update at closure of the file.

The simple solution is to change ENABLE to UPDATE and call it every
time an ftrace_ops is unregistered.

Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-21 07:09:14 -05:00
..
blktrace.c kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure 2011-10-31 19:30:05 -04:00
ftrace.c ftrace: Fix unregister ftrace_ops accounting 2011-12-21 07:09:14 -05:00
Kconfig trace events: Update version number reference to new 3.x scheme for EVENT_POWER_TRACING_DEPRECATED 2011-07-25 09:37:21 +02:00
Makefile Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 17:03:38 +02:00
power-traces.c perf: Clean up power events by introducing new, more generic ones 2011-01-04 08:16:54 +01:00
ring_buffer_benchmark.c tracing: Use NUMA allocation for per-cpu ring buffer pages 2011-06-14 22:04:39 -04:00
ring_buffer.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 17:03:38 +02:00
rpm-traces.c PM / Runtime: Introduce trace points for tracing rpm_* functions 2011-09-27 22:53:27 +02:00
trace_branch.c tracing: Allow events to share their print functions 2010-05-14 14:20:32 -04:00
trace_clock.c tracing: Add a counter clock for those that do not trust clocks 2011-09-19 11:35:58 -04:00
trace_entries.h tracing: Have dynamic size event stack traces 2011-07-14 16:36:53 -04:00
trace_event_perf.c tracing: New flag to allow non privileged users to use a trace event 2010-11-18 14:37:40 +01:00
trace_events_filter_test.h tracing/filter: Add startup tests for events filter 2011-08-19 14:35:59 -04:00
trace_events_filter.c Merge branch 'perf/urgent' into perf/core 2011-12-06 06:43:49 +01:00
trace_events.c tracing: fix event_subsystem ref counting 2011-12-05 13:28:44 -05:00
trace_export.c tracing: Replace trace_event struct array with pointer array 2011-02-02 21:37:13 -05:00
trace_functions_graph.c tracing: Still trace filtered irq functions when irq trace is disabled 2011-07-07 22:26:27 -04:00
trace_functions.c ftrace: Fix regression of :mod:module function enabling 2011-07-07 11:30:08 -04:00
trace_irqsoff.c Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core 2011-11-11 08:19:37 +01:00
trace_kdb.c kdb,ftdump: Remove reference to internal kdb include 2010-10-22 15:34:11 -05:00
trace_kprobe.c ftrace/kprobes: Fix not to delete probes if in use 2011-10-10 15:13:03 -04:00
trace_mmiotrace.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
trace_nop.c
trace_output.c tracing: Add irq, preempt-count and need resched info to default trace output 2011-11-17 09:58:48 -05:00
trace_output.h tracing: Allow events to share their print functions 2010-05-14 14:20:32 -04:00
trace_printk.c tracing: Clean up tb_fmt to not give faulty compile warning 2011-08-10 20:36:32 -04:00
trace_sched_switch.c tracing: Remove obsolete sched_switch tracer 2011-02-08 17:14:56 -05:00
trace_sched_wakeup.c tracing/latency: Fix header output for latency tracers 2011-11-07 13:48:35 -05:00
trace_selftest_dynamic.c ftrace: Add self-tests for multiple function trace users 2011-05-18 19:24:51 -04:00
trace_selftest.c ftrace: Add self-tests for multiple function trace users 2011-05-18 19:24:51 -04:00
trace_stack.c tracing: Convert to kstrtoul_from_user 2011-06-14 22:48:50 -04:00
trace_stat.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
trace_stat.h
trace_syscalls.c kernel: Add <linux/module.h> to files using it implicitly 2011-10-31 09:20:12 -04:00
trace_workqueue.c jump label: Initialize workqueue tracepoints *before* they are registered 2010-09-22 16:30:03 -04:00
trace.c tracing: Add entries in buffer and total entries to default output header 2011-11-17 11:10:43 -05:00
trace.h tracing: Add irq, preempt-count and need resched info to default trace output 2011-11-17 09:58:48 -05:00