With branch stack sampling, it is possible to filter by priv levels.
In system-wide mode, that means it is possible to capture only user
level branches. The builtin SW LBR filter needs to disassemble code
based on LBR captured addresses. For that, it needs to know the task
the addresses are associated with. Because of context switches, the
content of the branch stack buffer may contain addresses from
different tasks.
We need a callback on context switch to either flush the branch stack
or save it. This patch adds a new callback in struct pmu which is called
during context switches. The callback is called only when necessary.
That is when a system-wide context has, at least, one event which
uses PERF_SAMPLE_BRANCH_STACK. The callback is never called for
per-thread context.
In this version, the Intel x86 code simply flushes (resets) the LBR
on context switches (fills it with zeroes). Those zeroed branches are
then filtered out by the SW filter.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-11-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the ability to sample taken branches to the
perf_event interface.
The ability to capture taken branches is very useful for all
sorts of analysis. For instance, basic block profiling, call
counts, statistical call graph.
This new capability requires hardware assist and as such may
not be available on all HW platforms. On Intel x86 it is
implemented on top of the Last Branch Record (LBR) facility.
To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
bit must be set in attr->sample_type.
Sampled taken branches may be filtered by type and/or priv
levels.
The patch adds a new field, called branch_sample_type, to the
perf_event_attr structure. It contains a bitmask of filters
to apply to the sampled taken branches.
Filters may be implemented in HW. If the HW filter does not exist
or is not good enough, some arch may also implement a SW filter.
The following generic filters are currently defined:
- PERF_SAMPLE_USER
only branches whose targets are at the user level
- PERF_SAMPLE_KERNEL
only branches whose targets are at the kernel level
- PERF_SAMPLE_HV
only branches whose targets are at the hypervisor level
- PERF_SAMPLE_ANY
any type of branches (subject to priv levels filters)
- PERF_SAMPLE_ANY_CALL
any call branches (may incl. syscall on some arch)
- PERF_SAMPLE_ANY_RET
any return branches (may incl. syscall returns on some arch)
- PERF_SAMPLE_IND_CALL
indirect call branches
Obviously filter may be combined. The priv level bits are optional.
If not provided, the priv level of the associated event are used. It
is possible to collect branches at a priv level different from the
associated event. Use of kernel, hv priv levels is subject to permissions
and availability (hv).
The number of taken branch records present in each sample may vary based
on HW, the type of sampled branches, the executed code. Therefore
each sample contains the number of taken branches it contains.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the jump label enabled case, calling static_key_enabled()
results in a function call. The function returns the results of
a compare, so it really doesn't need the overhead of a full
function call. Let's make it 'static inline' for both the jump
label enabled and disabled cases.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: a.p.zijlstra@chello.nl
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/201202281849.q1SIn1p2023270@int-mx10.intmail.prod.int.phx2.redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If kzalloc() for TYPE_DATA failed on a given cpu, previous chunk
of TYPE_INST will be leaked. Fix it.
Thanks to Peter Zijlstra for suggesting this better solution. It
should work as long as the initial value of the region is all
0's and that's the case of static (per-cpu) memory allocation.
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1330391978-28070-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Handle pending irqs in irq_startup()
genirq: Unmask oneshot irqs when thread was not woken
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.
epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.
This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.
__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.
ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.
The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.
In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.
Note:
- we do not have wake_up_all_poll() but wake_up_poll()
is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.
- signalfd_cleanup() uses POLLHUP along with POLLFREE,
we need a couple of simple changes in eventpoll.c to
make sure it can't be "lost".
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.
Typical usage scenarios:
#include <linux/static_key.h>
struct static_key key = STATIC_KEY_INIT_TRUE;
if (static_key_false(&key))
do unlikely code
else
do likely code
Or:
if (static_key_true(&key))
do likely code
else
do unlikely code
The static key is modified via:
static_key_slow_inc(&key);
...
static_key_slow_dec(&key);
The 'slow' prefix makes it abundantly clear that this is an
expensive operation.
I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.
On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
added a new sched:sched_stat_sleeptime tracepoint.
It's broken: the first sample we get on a task might be bad because
of a stale sleep_start value that wasn't reset at the last task switch
because the tracepoint was not active.
It also breaks the existing schedstat samples due to the side
effects of:
- se->statistics.sleep_start = 0;
...
- se->statistics.block_start = 0;
Nor do I see means to fix it without adding overhead to the scheduler
fast path, which I'm not willing to for the sake of redundant
instrumentation.
Most importantly, sleep time information can already be constructed
by tracing context switches and wakeups, and taking the timestamp
difference between the schedule-out, the wakeup and the schedule-in.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:
perf record -e ftrace:function --filter="(ip == mm_*)" ls
The filter syntax is restricted to the the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:
ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:
perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls
The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.
The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.
The right side of the '!=', '==' operators is list of functions
or regexp. to be added to filter separated by space.
The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.
Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding FILTER_TRACE_FN event field type for function tracepoint
event, so it can be properly recognized within filtering code.
Currently all fields of ftrace subsystem events share the common
field type FILTER_OTHER. Since the function trace fields need
special care within the filtering code we need to recognize it
properly, hence adding the FILTER_TRACE_FN event type.
Adding filter parameter to the FTRACE_ENTRY macro, to specify the
filter field type for the event.
Link: http://lkml.kernel.org/r/1329317514-8131-7-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding perf registration support for the ftrace function event,
so it is now possible to register it via perf interface.
The perf_event struct statically contains ftrace_ops as a handle
for function tracer. The function tracer is registered/unregistered
in open/close actions.
To be efficient, we enable/disable ftrace_ops each time the traced
process is scheduled in/out (via TRACE_REG_PERF_(ADD|DELL) handlers).
This way tracing is enabled only when the process is running.
Intentionally using this way instead of the event's hw state
PERF_HES_STOPPED, which would not disable the ftrace_ops.
It is now possible to use function trace within perf commands
like:
perf record -e ftrace:function ls
perf stat -e ftrace:function ls
Allowed only for root.
Link: http://lkml.kernel.org/r/1329317514-8131-6-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding FTRACE_ENTRY_REG macro so particular ftrace entries
could specify registration function and thus become accesible
via perf.
This will be used in upcomming patch for function trace.
Link: http://lkml.kernel.org/r/1329317514-8131-5-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.
The add action is invoked for when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.
Link: http://lkml.kernel.org/r/1329317514-8131-4-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.
The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.
The open/close actions are invoked for each tracepoint user when
opening/closing the event.
Link: http://lkml.kernel.org/r/1329317514-8131-3-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same way as 'global' ftrace_ops are done.
Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
flag. In addition new per cpu flag called 'disabled' is also added to
ftrace_ops to provide the control information for each cpu.
When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.
The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides function which iterates ftrace_control_list
and does the check for 'disabled' flag on current cpu.
Adding 3 inline functions:
ftrace_function_local_disable/ftrace_function_local_enable
- enable/disable the ftrace_ops on current cpu
ftrace_function_local_disabled
- get disabled ftrace_ops::disabled value for current cpu
Link: http://lkml.kernel.org/r/1329317514-8131-2-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If more than one __print_*() function is used in a tracepoint
(__print_flags(), __print_symbols(), etc), then the temp seq buffer will
not be zero on entry. Using the temp seq buffer's length to know if
data has been printed or not in the current function is incorrect and
may produce incorrect results.
Currently, no in-tree tracepoint causes this bug, but new ones may
be created.
Cc: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If __print_flags() is used after another __print_*() function, the
temp seq_file buffer will not be empty on entry, and the delimiter will
be printed even though there's just one field. We get something like:
|S
instead of just:
S
This is because the length of the temp seq buffer is used to determine
if the delimiter is printed or not. But this algorithm fails when
the seq buffer is not empty on entry, and the delimiter will be printed
because it thinks that a previous field was already printed.
Link: http://lkml.kernel.org/r/1329650167-480655-1-git-send-email-avagin@openvz.org
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Assorted fixes, sat in -next for a week or so...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
vfs: fix compat_sys_stat() handling of overflows in st_nlink
quota: Fix deadlock with suspend and quotas
vfs: Provide function to get superblock and wait for it to thaw
vfs: fix panic in __d_lookup() with high dentry hashtable counts
autofs4 - fix lockdep splat in autofs
vfs: fix d_inode_lookup() dentry ref leak
An interrupt might be pending when irq_startup() is called, but the
startup code does not invoke the resend logic. In some cases this
prevents the device from issuing another interrupt which renders the
device non functional.
Call the resend function in irq_startup() to keep things going.
Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the primary handler of an interrupt which is marked IRQ_ONESHOT
returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not
woken and the unmask logic of the interrupt line is never
invoked. This keeps the interrupt masked forever.
This was not noticed as most IRQ_ONESHOT users wake the thread
unconditionally (usually because they cannot access the underlying
device from hard interrupt context). Though this behaviour was nowhere
documented and not necessarily intentional. Some drivers can avoid the
thread wakeup in certain cases and run into the situation where the
interrupt line s kept masked.
Handle it gracefully.
Reported-and-tested-by: Lothar Wassmann <lw@karo-electronics.de>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup(). Fix this in dcache_init() and similar areas.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Allow bint param accept nul values, just do same as bool param.
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.
The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107
Link: http://lkml.kernel.org/r/1322600880.1534.347.camel@localhost.localdomain
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a printk.console trace point to record any printk
messages into the trace, regardless of the current
console loglevel. This can help correlate (existing)
printk debugging with other tracing.
Link: http://lkml.kernel.org/r/1322161388.5366.54.camel@jlt3.sipsolutions.net
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Actually, sched_switch function tracer is merged into wakeup/wakeup_rt
Update 'mini-HOWTO' for ftrace(Kernel function tracer).
If we want to trace "sched:sched_switch" to trace sched_switch func,
We may utilize event option.(e.g: trace-cmd list -e | grep sched)
This patch is based on Linux-3.3.rc2-SMP-PREEMPT
Link: http://lkml.kernel.org/r/1328695537-15081-1-git-send-email-geunsik.lim@gmail.com
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Says Jens:
"Time to push off some of the pending items. I really wanted to wait
until we had the regression nailed, but alas it's not quite there yet.
But I'm very confident that it's "just" a missing expire on exit, so
fix from Tejun should be fairly trivial. I'm headed out for a week on
the slopes.
- Killing the barrier part of mtip32xx. It doesn't really support
barriers, and it doesn't need them (writes are fully ordered).
- A few fixes from Dan Carpenter, preventing overflows of integer
multiplication.
- A fixup for loop, fixing a previous commit that didn't quite solve
the partial read problem from Dave Young.
- A bio integer overflow fix from Kent Overstreet.
- Improvement/fix of the door "keep locked" part of the cdrom shared
code from Paolo Benzini.
- A few cfq fixes from Shaohua Li.
- A fix for bsg sysfs warning when removing a file it did not create
from Stanislaw Gruszka.
- Two fixes for floppy from Vivek, preventing a crash.
- A few block core fixes from Tejun. One killing the over-optimized
ioc exit path, cleaning that up nicely. Two others fixing an oops
on elevator switch, due to calling into the scheduler merge check
code without holding the queue lock."
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix lockdep warning on io_context release put_io_context()
relay: prevent integer overflow in relay_open()
loop: zero fill bio instead of return -EIO for partial read
bio: don't overflow in bio_get_nr_vecs()
floppy: Fix a crash during rmmod
floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called
cdrom: move shared static to cdrom_device_info
bsg: fix sysfs link remove warning
block: don't call elevator callbacks for plug merges
block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions
mtip32xx: removed the irrelevant argument of mtip_hw_submit_io() and the unused member of struct driver_data
block: strip out locking optimization in put_io_context()
cdrom: use copy_to_user() without the underscores
block: fix ioc locking warning
block: fix NULL icq_cache reference
block,cfq: change code order
Reflect the change in the soft and hard lockup thresholds and
their relation to the frequency of the hrtimer and NMI events in
the code comments. While at it, remove references to files that
do not exist anymore.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1328827342-6253-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix double start/stop in x86_pmu_start()
perf evsel: Fix an issue where perf report fails to show the proper percentage
perf tools: Fix prefix matching for kernel maps
perf tools: Fix perf stack to non executable on x86_64
perf: Remove deprecated WARN_ON_ONCE()
"subbuf_size" and "n_subbufs" come from the user and they need to be
capped to prevent an integer overflow.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The following patch fixes a bug introduced by the following
commit:
e050e3f0a7 ("perf: Fix broken interrupt rate throttling")
The patch caused the following warning to pop up depending on
the sampling frequency adjustments:
------------[ cut here ]------------
WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
It was caused by the following call sequence:
perf_adjust_freq_unthr_context.part() {
stop()
if (delta > 0) {
perf_adjust_period() {
if (period > 8*...) {
stop()
...
start()
}
}
}
start()
}
Which caused a double start and a double stop, thus triggering
the assert in x86_pmu_start().
The patch fixes the problem by avoiding the double calls. We
pass a new argument to perf_adjust_period() to indicate whether
or not the event is already stopped. We can't just remove the
start/stop from that function because it's called from
__perf_event_overflow where the event needs to be reloaded via a
stop/start back-toback call.
The patch reintroduces the assertion in x86_pmu_start() which
was removed by commit:
84f2b9b ("perf: Remove deprecated WARN_ON_ONCE()")
In this second version, we've added calls to disable/enable PMU
during unthrottling or frequency adjustment based on bug report
of spurious NMI interrupts from Eric Dumazet.
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: markus@trippelsdorf.de
Cc: paulus@samba.org
Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad
[ Minor edits to the changelog and to the code ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
put_io_context() performed a complex trylock dancing to avoid
deferring ioc release to workqueue. It was also broken on UP because
trylock was always assumed to succeed which resulted in unbalanced
preemption count.
While there are ways to fix the UP breakage, even the most
pathological microbench (forced ioc allocation and tight fork/exit
loop) fails to show any appreciable performance benefit of the
optimization. Strip it out. If there turns out to be workloads which
are affected by this change, simpler optimization from the discussion
thread can be applied later.
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
So that we can get the perf bench exec stack fixes and then apply the
remaining fix for the files added after what is in perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Three power management regression fixes, one for a recent regression introcuded
by the freezer changes during the 3.3 merge window and two for regressions
in cpuidle (resulting from PM QoS changes) and in the hibernate user space
interface, both introduced during the 3.2 development cycle.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJPLbiPAAoJEKhOf7ml8uNscvwQAJYYhSBL+ouK8ERS0OLkeEoB
k4O1Ap0hb5Kv54Sr85WKEm5zGRDJXUxlWeMklo9K/fvs04CU1gsBb8jhDdbZ2ovE
rnyybPjfieExQbLxX6nYIP4qKMLtnZvHhHpafuDSUz0RWq/7sCTiFI2htNj97gGu
DzXYpeePFgvzG6AaznywWkvNdXoQfmsTC0adDrXWcuKXnNrH6h8o/OIB+pO70Szw
gmU8SjVGGQjrlnuQ+Ku4WqbSyXs1bXlUkyTHJilg6CNJySrA/LUHhKPrRnP1i3Hu
LxX/rsrTqohhD1tz1qQOpnMiu86FSez+UVA65b2cF3EqZbNROY2+O1/V+OlczKYy
V9Q3rk+J4uRJtnL8DEgcniMGrRsjyle5USN5KDX50BkrC56h3mZirnEu1yaiMIJn
K8NWI/4JdK7JbA6f2hXuPuesmudSP4uo8vuUzKthEUi88QReYXYSMcz/Fy/G9z8n
JW7PimC5OmeTwYIqBcjZf+8j/1u6cHaEkvjPAJhIUgCR/ZVi6VFySnUByDD6JKTJ
bQcUSqZZ8TvEc4A6JjG18/QfmWIZMErfuG0WAKb8sqtXoPkHKR/XXjbaXof9Oppn
nRS5iJUaZGY4YivSHZZOFAk24ThqKx5ZK3qXq/dBbj9JwtJdc+++b9f0RwXUHjd9
ECoM3bFtO8ewINmZ7wRQ
=EKGs
-----END PGP SIGNATURE-----
Merge tag 'pm-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Power management fixes for 3.3-rc3
Three power management regression fixes, one for a recent regression introcuded
by the freezer changes during the 3.3 merge window and two for regressions
in cpuidle (resulting from PM QoS changes) and in the hibernate user space
interface, both introduced during the 3.2 development cycle.
They include:
* Two hibernate (s2disk) regression fixes from Srivatsa S. Bhat (for
regressions introduced during the 3.3 merge window and during the 3.2
development cycle).
* A cpuidle fix from Venki Pallipadi for a regression resulting from PM QoS
changes during the 3.2 development cycle causing cpuidle to work incorrectly
for CONFIG_PM unset.
* tag 'pm-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / QoS: CPU C-state breakage with PM Qos change
PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails
PM / Hibernate: Thaw kernel threads in SNAPSHOT_CREATE_IMAGE ioctl path
If freezing of kernel threads fails, we are expected to automatically
thaw tasks in the error recovery path. However, at times, we encounter
situations in which we would like the automatic error recovery path
to thaw only the kernel threads, because we want to be able to do
some more cleanup before we thaw userspace. Something like:
error = freeze_kernel_threads();
if (error) {
/* Do some cleanup */
/* Only then thaw userspace tasks*/
thaw_processes();
}
An example of such a situation is where we freeze/thaw filesystems
during suspend/hibernation. There, if freezing of kernel threads
fails, we would like to thaw the frozen filesystems before thawing
the userspace tasks.
So, modify freeze_kernel_threads() to thaw only kernel threads in
case of freezing failure. And change suspend_freeze_processes()
accordingly. (At the same time, let us also get rid of the rather
cryptic usage of the conditional operator (:?) in that function.)
[rjw: In fact, this patch fixes a regression introduced during the
3.3 merge window, because without it thaw_processes() may be called
before swsusp_free() in some situations and that may lead to massive
memory allocation failures.]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
In function pre_handler_kretprobe(), the allocated kretprobe_instance
object will get leaked if the entry_handler callback returns non-zero.
This may cause all the preallocated kretprobe_instance objects exhausted.
This issue can be reproduced by changing
samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix
is straightforward: just put the allocated kretprobe_instance object back
onto the free_instances list.
[akpm@linux-foundation.org: use raw_spin_lock/unlock]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The __raise_softirq_irqoff() contains a tracepoint. As tracepoints in headers
can cause issues, and not to mention, bloats the kernel when they are
in a static inline, it is best to move the function that contains the
tracepoint out of the header and into softirq.c.
Link: http://lkml.kernel.org/r/20120118120711.GB14863@elte.hu
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any return code. So there's no way for ftrace_ops
user to tell wether the filter was correctly applied.
The set_ftrace_filter interface returns error in case the filter
did not match:
# echo krava > set_ftrace_filter
bash: echo: write error: Invalid argument
Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.
Link: http://lkml.kernel.org/r/1325495060-6402-2-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This fixes the race in process_vm_core found by Oleg (see
http://article.gmane.org/gmane.linux.kernel/1235667/
for details).
This has been updated since I last sent it as the creation of the new
mm_access() function did almost exactly the same thing as parts of the
previous version of this patch did.
In order to use mm_access() even when /proc isn't enabled, we move it to
kernel/fork.c where other related process mm access functions already
are.
Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bugs, x86: Fix printk levels for panic, softlockups and stack dumps
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf top: Fix number of samples displayed
perf tools: Fix strlen() bug in perf_event__synthesize_event_type()
perf tools: Fix broken build by defining _GNU_SOURCE in Makefile
x86/dumpstack: Remove unneeded check in dump_trace()
perf: Fix broken interrupt rate throttling
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW
sched: Fix ancient race in do_exit()
sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug
sched/s390: Fix compile error in sched/core.c
sched: Fix rq->nr_uninterruptible update race
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/reboot: Remove VersaLogic Menlow reboot quirk
x86/reboot: Skip DMI checks if reboot set by user
x86: Properly parenthesize cmpxchg() macro arguments
In the SNAPSHOT_CREATE_IMAGE ioctl, if the call to hibernation_snapshot()
fails, the frozen tasks are not thawed.
And in the case of success, if we happen to exit due to a successful freezer
test, all tasks (including those of userspace) are thawed, whereas actually
we should have thawed only the kernel threads at that point. Fix both these
issues.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
Commit 2aede851dd
PM / Hibernate: Freeze kernel threads after preallocating memory
introduced a mechanism by which kernel threads were frozen after
the preallocation of hibernate image memory to avoid problems with
frozen kernel threads not responding to memory freeing requests.
However, it overlooked the s2disk code path in which the
SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
which caused freeze_workqueues_begin() to BUG(), because it saw
that worqueues had been already frozen.
Although in principle this issue might be addressed by removing
the relevant BUG_ON() from freeze_workqueues_begin(), that would
reintroduce the very problem that commit 2aede851dd
attempted to avoid into that particular code path. For this reason,
to fix the issue at hand, introduce thaw_kernel_threads() and make
the SNAPSHOT_FREE ioctl execute it.
Special thanks to Srivatsa S. Bhat for detailed analysis of the
problem.
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@kernel.org