Commit Graph

21487 Commits

Author SHA1 Message Date
Steven Rostedt (Red Hat)
b7ffffbb46 ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.

When loading a module, the updates to convert function calls to mcount is
done before the module text is set to read-only. But after it is done, the
module text is visible by the function tracer. Thus we have the following
race:

	CPU 0			CPU 1
	-----			-----
   start function tracing
   set text to read-write
			     load_module
			     add functions to ftrace
			     set module text read-only

   update all functions to callbacks
   modify module functions too
   < Can't it's read-only >

When this happens, ftrace detects the issue and disables itself till the
next reboot.

To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.

Note, this doesn't add the delay to later. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.

Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-07 15:40:01 -05:00
Chuyu Hu
05a724bd44 tracing: Fix comment to use tracing_on over tracing_enable
The file tracing_enable is obsolete and does not exist anymore. Replace
the comment that references it with the proper tracing_on file.

Link: http://lkml.kernel.org/r/1450787141-45544-1-git-send-email-chuhu@redhat.com

Signed-off-by: Chuyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:25 -05:00
Steven Rostedt (Red Hat)
97e9b4fca5 ftrace: Clean up ftrace_module_init() code
The start and end variables were only used when ftrace_module_init() was
split up into multiple functions. No need to keep them around after the
merger.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:23 -05:00
Abel Vesa
b6b71f66a1 ftrace: Join functions ftrace_module_init() and ftrace_init_module()
Simple cleanup. No need for two functions here.
The whole work can simply be done inside 'ftrace_module_init'.

Link: http://lkml.kernel.org/r/1449067197-5718-1-git-send-email-abelvesa@linux.com

Signed-off-by: Abel Vesa <abelvesa@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:22 -05:00
Julia Lawall
27dff4e041 bpf: Constify bpf_verifier_ops structure
This bpf_verifier_ops structure is never modified, like the other
bpf_verifier_ops structures, so declare it as const.

Done with the help of Coccinelle.

Link: http://lkml.kernel.org/r/1449855359-13724-1-git-send-email-Julia.Lawall@lip6.fr

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:19 -05:00
Steven Rostedt (Red Hat)
c68c0fa293 ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too
Jiri Olsa noted that the change to replace the control_ops did not update
the trampoline for when running perf on a single CPU and with CONFIG_PREEMPT
disabled (where dynamic ops, like perf, can use trampolines directly). The
result was that perf function could be called when RCU is not watching as
well as not handle the ftrace_local_disable().

Modify the ftrace_ops_get_func() to also check the RCU and PER_CPU ops flags
and use the recursive function if they are set. The recursive function is
modified to check those flags and execute the appropriate checks if they are
set.

Link: http://lkml.kernel.org/r/20151201134213.GA14155@krava.brq.redhat.com

Reported-by: Jiri Olsa <jolsa@redhat.com>
Patch-fixed-up-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:19 -05:00
Steven Rostedt (Red Hat)
ba27f2bc73 ftrace: Remove use of control list and ops
Currently perf has its own list function within the ftrace infrastructure
that seems to be used only to allow for it to have per-cpu disabling as well
as a check to make sure that it's not called while RCU is not watching. It
uses something called the "control_ops" which is used to iterate over ops
under it with the control_list_func().

The problem is that this control_ops and control_list_func unnecessarily
complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags
(FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code
that is special with the control ops and add the needed checks within the
generic ftrace_list_func().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:18 -05:00
Steven Rostedt (Red Hat)
030f4e1cb8 ftrace: Fix output of enabled_functions for showing tramp
When showing all tramps registered to a ftrace record in the file
enabled_functions, it exits the loop with ops == NULL. But then it is
suppose to show the function on the ops->trampoline and
add_trampoline_func() is called with the given ops. But because ops is now
NULL (to exit the loop), it always shows the static trampoline instead of
the one that is really registered to the record.

The call to add_trampoline_func() that shows the trampoline for the given
ops needs to be called at every iteration.

Fixes: 39daa7b9e8 "ftrace: Show all tramps registered to a record on ftrace_bug()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:27:17 -05:00
Li Bin
b8ec330a63 ftrace: Fix a typo in comment
s/ARCH_SUPPORT_FTARCE_OPS/ARCH_SUPPORTS_FTRACE_OPS/

Link: http://lkml.kernel.org/r/1448879016-8659-1-git-send-email-huawei.libin@huawei.com

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-23 14:26:51 -05:00
Steven Rostedt (Red Hat)
39daa7b9e8 ftrace: Show all tramps registered to a record on ftrace_bug()
When an anomaly is detected in the function call modification code,
ftrace_bug() is called to disable function tracing as well as give any
information that may help debug the problem. Currently, only the first found
trampoline that is attached to the failed record is reported. Instead, show
all trampolines that are hooked to it.

Also, not only show the ops pointer but also report the function it calls.

While at it, add this info to the enabled_functions debug file too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 16:04:59 -05:00
Steven Rostedt (Red Hat)
b05086c77a ftrace: Add variable ftrace_expected for archs to show expected code
When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.

Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 15:24:16 -05:00
Steven Rostedt (Red Hat)
02a392a043 ftrace: Add new type to distinguish what kind of ftrace_bug()
The ftrace function hook utility has several internal checks to make sure
that whatever it modifies is exactly what it expects to be modifying. This
is essential as modifying running code can be extremely dangerous to the
system.

When an anomaly is detected, ftrace_bug() is called which sends a splat to
the console and disables function tracing. There's some extra information
that is printed to help diagnose the issue.

One thing that is missing though is output of what ftrace was doing at the
time of the crash. Was it updating a call site or perhaps converting a call
site to a nop? A new global enum variable is created to state what ftrace
was doing at the time of the anomaly, and this is reported in ftrace_bug().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 15:24:15 -05:00
Tom Zanussi
4e4a4d7570 tracing: Update cond flag when enabling or disabling a trigger
When a trigger is enabled, the cond flag should be set beforehand,
otherwise a trigger that's expecting to process a trace record
(e.g. one with post_trigger set) could be invoked without one.

Likewise a trigger's cond flag should be reset after it's disabled,
not before.

Link: http://lkml.kernel.org/r/a420b52a67b1c2d3cab017914362d153255acb99.1448303214.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 15:24:14 -05:00
Steven Rostedt (Red Hat)
4239c38fe0 ring-buffer: Process commits whenever moving to a new page.
When crossing over to a new page, commit the current work. This will allow
readers to get data with less latency, and also simplifies the work to get
timestamps working for interrupted events.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 15:24:05 -05:00
Steven Rostedt (Red Hat)
70004986ff ring-buffer: Remove redundant update of page timestamp
The first commit of a buffer page updates the timestamp of that page. No
need to have the update to the next page add the timestamp too. It will only
be replaced by the first commit on that page anyway.

Only update to a page if it contains an event.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:29:16 -05:00
Steven Rostedt (Red Hat)
8573636ea7 ring-buffer: Use READ_ONCE() for most tail_page access
As cpu_buffer->tail_page may be modified by interrupts at almost any time,
the flow of logic is very important. Do not let gcc get smart with
re-reading cpu_buffer->tail_page by adding READ_ONCE() around most of its
accesses.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:29:15 -05:00
Steven Rostedt (Red Hat)
bd1b7cd360 ring-buffer: Put back the length if crossed page with add_timestamp
Commit fcc742eaad "ring-buffer: Add event descriptor to simplify passing
data" added a descriptor that holds various data instead of passing around
several variables through parameters. The problem was that one of the
parameters was modified in a function and the code was designed not to have
an effect on that modified  parameter. Now that the parameter is a
descriptor and any modifications to it are non-volatile, the size of the
data could be unnecessarily expanded.

Remove the extra space added if a timestamp was added and the event went
across the page.

Cc: stable@vger.kernel.org # 4.3+
Fixes: fcc742eaad "ring-buffer: Add event descriptor to simplify passing data"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:27:25 -05:00
Steven Rostedt (Red Hat)
b81f472a20 ring-buffer: Update read stamp with first real commit on page
Do not update the read stamp after swapping out the reader page from the
write buffer. If the reader page is swapped out of the buffer before an
event is written to it, then the read_stamp may get an out of date
timestamp, as the page timestamp is updated on the first commit to that
page.

rb_get_reader_page() only returns a page if it has an event on it, otherwise
it will return NULL. At that point, check if the page being returned has
events and has not been read yet. Then at that point update the read_stamp
to match the time stamp of the reader page.

Cc: stable@vger.kernel.org # 2.6.30+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:23:17 -05:00
Vitaly Kuznetsov
7625b3a000 kernel/panic.c: turn off locks debug before releasing console lock
Commit 08d78658f3 ("panic: release stale console lock to always get the
logbuf printed out") introduced an unwanted bad unlock balance report when
panic() is called directly and not from OOPS (e.g.  from out_of_memory()).
The difference is that in case of OOPS we disable locks debug in
oops_enter() and on direct panic call nobody does that.

Fixes: 08d78658f3 ("panic: release stale console lock to always get the logbuf printed out")
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20 16:17:32 -08:00
Richard Weinberger
9d8a765211 kernel/signal.c: unexport sigsuspend()
sigsuspend() is nowhere used except in signal.c itself, so we can mark it
static do not pollute the global namespace.

But this patch is more than a boring cleanup patch, it fixes a real issue
on UserModeLinux.  UML has a special console driver to display ttys using
xterm, or other terminal emulators, on the host side.  Vegard reported
that sometimes UML is unable to spawn a xterm and he's facing the
following warning:

  WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0()

It turned out that this warning makes absolutely no sense as the UML
xterm code calls sigsuspend() on the host side, at least it tries.  But
as the kernel itself offers a sigsuspend() symbol the linker choose this
one instead of the glibc wrapper.  Interestingly this code used to work
since ever but always blocked signals on the wrong side.  Some recent
kernel change made the WARN_ON() trigger and uncovered the bug.

It is a wonderful example of how much works by chance on computers. :-)

Fixes: 68f3f16d9a ("new helper: sigsuspend()")
Signed-off-by: Richard Weinberger <richard@nod.at>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>	[3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20 16:17:32 -08:00
Linus Torvalds
a3d66b5a17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
 "A fix for module handling in case kASLR has been enabled, from Zhou
  Chengming"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: x86: fix relocation computation with kASLR
2015-11-19 12:16:12 -08:00
Linus Torvalds
0ca9b67606 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Mostly updates to the perf tool plus two fixes to the kernel core code:

   - Handle tracepoint filters correctly for inherited events (Peter
     Zijlstra)

   - Prevent a deadlock in perf_lock_task_context (Paul McKenney)

   - Add missing newlines to some pr_err() calls (Arnaldo Carvalho de
     Melo)

   - Print full source file paths when using 'perf annotate --print-line
     --full-paths' (Michael Petlan)

   - Fix 'perf probe -d' when just one out of uprobes and kprobes is
     enabled (Wang Nan)

   - Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated
     tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo)

   - Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by
     the 'perf test' LLVM entries, when running it in-tree, to
     .gitignore (Yunlong Song)

   - libbpf error reporting improvements, using a strerror interface to
     more precisely tell the user about problems with the provided
     scriptlet, be it in C or as a ready made object file (Wang Nan)

   - Do not be case sensitive when searching for matching 'perf test'
     entries (Arnaldo Carvalho de Melo)

   - Inform the user about objdump failures in 'perf annotate' (Andi
     Kleen)

   - Improve the LLVM 'perf test' entry, introduce a new ones for BPF
     and kbuild tests to check the environment used by clang to compile
     .c scriptlets (Wang Nan)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
  tools include: Add compiler.h to list.h
  perf probe: Verify parameters in two functions
  perf session: Add missing newlines to some pr_err() calls
  perf annotate: Support full source file paths for srcline fix
  perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore
  perf: Fix inherited events vs. tracepoint filters
  perf: Disable IRQs across RCU RS CS that acquires scheduler lock
  perf test: Do not be case sensitive when searching for matching tests
  perf test: Add 'perf test BPF'
  perf test: Enhance the LLVM tests: add kbuild test
  perf test: Enhance the LLVM test: update basic BPF test program
  perf bpf: Improve BPF related error messages
  perf tools: Make fetch_kernel_version() publicly available
  bpf tools: Add new API bpf_object__get_kversion()
  bpf tools: Improve libbpf error reporting
  perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy
  perf annotate: Inform the user about objdump failures in --stdio
  perf stat: Make stat options global
  perf sched latency: Fix thread pid reuse issue
  ...
2015-11-15 09:36:24 -08:00
Linus Torvalds
051b29f279 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "A single fix to prevent math underflow in the numa balancing code"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Fix math underflow in task_tick_numa()
2015-11-15 09:35:33 -08:00
Linus Torvalds
511601bdbc Merge branches 'irq-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq and timer fixes from Thomas Gleixner:

 - An irq regression fix to restore the wakeup behaviour of chained
   interrupts.

 - A timer fix for a long standing race versus timers scheduled on a
   target cpu which got exposed by recent changes in the workqueue
   implementation.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/PM: Restore system wake up from chained interrupts

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Use proper base migration in add_timer_on()
2015-11-15 09:30:48 -08:00
Linus Torvalds
0e97606425 This contains three more clean up patches.
One patch is needed to make tracing work without debugfs now that tracing
 uses its own tracefs.
 
 The second is removing an unused variable.
 
 The third is fixing a warning about unused variables when MAX_TRACER is
 not configured. Note, this warning shows up in gcc 6.0, but does not show
 up in gcc 4.9, as it seems that gcc does not complain about constants
 not being used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWRKA+AAoJEKKk/i67LK/8BZ4H/0UgA8+OpbUmBHQk/RdMRZW2
 BSwO/GuErq3nSKJUjP4f3b4BGdMoFgThIOxR9dvHKwJvGfwpbTivpgR0OqUdFKAZ
 1FyUT/oNM0ySNGpZFGuMQ9hvR+deeUvvyPhosjPWlxZLSnRSsEzTlePUqikX1VY6
 hEQj+ulK8J0iLlHMDYLJ5FkQ10+owrI2vmT8BU1O8KZN8EBvhoUx4EfJewD63/pa
 Gzjm6C9RpVb6ks555NmwFzLeN1Yd5X4n5m2CQFrIpzBgRlibWIGCPMVe7OgNawMu
 wONNvhq9zLfo3pftS092lek4iPgEw7ypl6iEgBxDzqTIAjSxUqf+N+1xIbOCcOE=
 =GikR
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull trace cleanups from Steven Rostedt:
 "This contains three more clean up patches.

  One patch is needed to make tracing work without debugfs now that
  tracing uses its own tracefs.

  The second is removing an unused variable.

  The third is fixing a warning about unused variables when MAX_TRACER
  is not configured.  Note, this warning shows up in gcc 6.0, but does
  not show up in gcc 4.9, as it seems that gcc does not complain about
  constants not being used"

* tag 'trace-v4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not set
  tracing: Remove unused ftrace_cpu_disabled per cpu variable
  tracing: Make tracing work when debugfs is not configured in
2015-11-12 16:22:54 -08:00
Linus Torvalds
3370b69eb0 Four changes:
- x86: work around two nasty cases where a benign exception occurs while
 another is being delivered.  The endless stream of exceptions causes an
 infinite loop in the processor, which not even NMIs or SMIs can interrupt;
 in the virt case, there is no possibility to exit to the host either.
 
 - x86: support for Skylake per-guest TSC rate.  Long supported by AMD,
 the patches mostly move things from there to common arch/x86/kvm/ code.
 
 - generic: remove local_irq_save/restore from the guest entry and exit
 paths when context tracking is enabled.  The patches are a few months
 old, but we discussed them again at kernel summit.  Andy will pick up
 from here and, in 4.5, try to remove it from the user entry/exit paths.
 
 - PPC: Two bug fixes, see merge commit 370289756b for details.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWRFb0AAoJEL/70l94x66DjjMH/31jr8d119MW0uv2x+03+wRq
 6dbJ8tjQ8grvBRExKvLsUVjDmHlhCa1BQl5qjCsyYhX9UeAf4NQOmoEFpq+YTLxh
 Ctveyn+yiZWC7qxbQDmauiQ4JCOp+W9ial782iqw5+ouQMajGOffq5WrojCa2ZNF
 jI278JgdHJLrKj/uie//WBu3V7MJY5Apc3p4zatnSYFSQ3MA0sxl4r4zIrwOa5qs
 23ZeeoqbP4sHh4X5wL/30Y6XFSCHj0qoYHHyAgzLi0PCMvBdt4DrAFUPDG/Rhlv6
 o1WB/kcUfcz3DtBX85wfSOMuw0nF6patWhWv07R/3EIbYoz3dKvp9d6ORYgXqlY=
 =Um9M
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of kvm updates from Paolo Bonzini:
 "Four changes:

   - x86: work around two nasty cases where a benign exception occurs
     while another is being delivered.  The endless stream of exceptions
     causes an infinite loop in the processor, which not even NMIs or
     SMIs can interrupt; in the virt case, there is no possibility to
     exit to the host either.

   - x86: support for Skylake per-guest TSC rate.  Long supported by
     AMD, the patches mostly move things from there to common
     arch/x86/kvm/ code.

   - generic: remove local_irq_save/restore from the guest entry and
     exit paths when context tracking is enabled.  The patches are a few
     months old, but we discussed them again at kernel summit.  Andy
     will pick up from here and, in 4.5, try to remove it from the user
     entry/exit paths.

   - PPC: Two bug fixes, see merge commit 370289756b for details"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: x86: rename update_db_bp_intercept to update_bp_intercept
  KVM: svm: unconditionally intercept #DB
  KVM: x86: work around infinite loop in microcode when #AC is delivered
  context_tracking: avoid irq_save/irq_restore on guest entry and exit
  context_tracking: remove duplicate enabled check
  KVM: VMX: Dump TSC multiplier in dump_vmcs()
  KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
  KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
  KVM: VMX: Enable and initialize VMX TSC scaling
  KVM: x86: Use the correct vcpu's TSC rate to compute time scale
  KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
  KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
  KVM: x86: Replace call-back compute_tsc_offset() with a common function
  KVM: x86: Replace call-back set_tsc_khz() with a common function
  KVM: x86: Add a common TSC scaling function
  KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
  KVM: x86: Collect information for setting TSC scaling ratio
  KVM: x86: declare a few variables as __read_mostly
  KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
  KVM: PPC: Book3S HV: Don't dynamically split core when already split
  ...
2015-11-12 14:34:06 -08:00
Zhou Chengming
e41b104c7d livepatch: x86: fix relocation computation with kASLR
With kASLR enabled, old_addr provided by patch module is being shifted
accrodingly so that the symbol lookups work. To have module relocations
handled properly as well, the same transformation needs to be perfomed
on relocation address information.

[jkosina@suse.cz: extended / reworded changelog a bit]
Reported-by: Cyril B. <cbay@alwaysdata.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-11-11 17:36:04 +01:00
Linus Torvalds
2df4ee78d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.

 2) Several spots need to get to the original listner for SYN-ACK
    packets, most spots got this ok but some were not.  Whilst covering
    the remaining cases, create a helper to do this.  From Eric Dumazet.

 3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
    from Rasmus Villemoes.

 4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.

 5) Use after free in mvneta driver, from Justin Maggard.

 6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.

 7) Add missing ZLIB_INFLATE dependency for new qed driver.  From Arnd
    Bergmann.

 8) Fix multicast getsockopt deadlock, from WANG Cong.

 9) Fix deadlock in btusb, from Kuba Pawlak.

10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
    counter state.  From Sabrina Dubroca.

11) Fix packet_bind() race, which can cause lost notifications, from
    Francesco Ruggeri.

12) Fix MAC restoration in qlcnic driver during bonding mode changes,
    from Jarod Wilson.

13) Revert bridging forward delay change which broke libvirt and other
    userspace things, from Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  Revert "bridge: Allow forward delay to be cfgd when STP enabled"
  bpf_trace: Make dependent on PERF_EVENTS
  qed: select ZLIB_INFLATE
  net: fix a race in dst_release()
  net: mvneta: Fix memory use after free.
  net: Documentation: Fix default value tcp_limit_output_bytes
  macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
  mvneta: add FIXED_PHY dependency
  net: caif: check return value of alloc_netdev
  net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
  drivers: net: xgene: fix RGMII 10/100Mb mode
  netfilter: nft_meta: use skb_to_full_sk() helper
  net_sched: em_meta: use skb_to_full_sk() helper
  sched: cls_flow: use skb_to_full_sk() helper
  netfilter: xt_owner: use skb_to_full_sk() helper
  smack: use skb_to_full_sk() helper
  net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
  bpf: doc: correct arch list for supported eBPF JIT
  dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
  bonding: fix panic on non-ARPHRD_ETHER enslave failure
  ...
2015-11-10 18:11:41 -08:00
Steven Rostedt
a31d82d85a bpf_trace: Make dependent on PERF_EVENTS
Arnd Bergmann reported:

  In my ARM randconfig tests, I'm getting a build error for
  newly added code in bpf_perf_event_read and bpf_perf_event_output
  whenever CONFIG_PERF_EVENTS is disabled:

  kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
  kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
  if (event->oncpu != smp_processor_id() ||
           ^
  kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
        event->pmu->count)

  This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
  is disabled. I'm not sure if that is a configuration we care
  about, otherwise we could prevent this case from occuring by
  adding Kconfig dependencies.

Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS.
By just having BPF_EVENTS depend on PERF_EVENTS, then all is fine.

Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-10 15:40:14 -05:00
Linus Torvalds
264015f8a8 libnvdimm for 4.4:
1/ Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.
 
 2/ Teach the coredump implementation how to filter out DAX mappings.
 
 3/ Introduce NUMA hints for allocations made by the pmem driver, and as
    a side effect all devm allocations now hint their NUMA node by
    default.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQX2sAAoJEB7SkWpmfYgCWsEQAK7w/xM9zClVY/DDlFJxFtYq
 DZJ4faPj+E3FMTiJIEDzjtRgQvOFE+wmJtntYsCqKH/QZmpnyk9jeT/CbJzEEL2k
 WsAk+qHGLcVUlSb36blwN1RFzYqC+IDYThewJqUvxDbOwL1AbiibbX7gplzZHLhW
 +rj3ScVlSNOPRDgGGpkAeLNNsttuKvsGo7nB/JZopm0tV6g14rSK09wQbVhv6S6T
 Lu7VGYqnJlkteL9YlzRiROf9hW2ZFCMGJz1YZydPTy3aX3hGTBX4w2qvmsPwBIKP
 kW/gCNisVJGk1cZCk4joSJ8i/b3x3fE0zdZ5waivJ5jDvYbUUfyk0KtJkfw207Rl
 14yWitUC6aeVuCeOqXHgsjRi+1QVN9Pg7i49xgGiUN1igQiUYRTgQPWZxDv6Zo/s
 USrLFQBaRd+hJw+dl7A47lJ3mUF96tPCoQb4LCQ7DVsg5U4J2TvqXLH9Gek/CCZ4
 QsMkZDTQlZw4+JEDlzBgg/L7xVty8DadplTADMdjaRhFU3y8zKNJ85Ileokt7KVt
 IsBT4+S5HeZLvinZY95932DwAmFp1DtsyENd1BUXL06ddyvlQrFJ6NQaXji4fuDc
 EVQmMoTAqDujZFupMAux9vkUBDFj/hmaVD5F7j3+MWP87OCritw/IZn+2LgTaKoX
 EmttaYrDr2jJwIaGyw+H
 =a2/L
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "Outside of the new ACPI-NFIT hot-add support this pull request is more
  notable for what it does not contain, than what it does.  There were a
  handful of development topics this cycle, dax get_user_pages, dax
  fsync, and raw block dax, that need more more iteration and will wait
  for 4.5.

  The patches to make devm and the pmem driver NUMA aware have been in
  -next for several weeks.  The hot-add support has not, but is
  contained to the NFIT driver and is passing unit tests.  The coredump
  support is straightforward and was looked over by Jeff.  All of it has
  received a 0day build success notification across 107 configs.

  Summary:

   - Add support for the ACPI 6.0 NFIT hot add mechanism to process
     updates of the NFIT at runtime.

   - Teach the coredump implementation how to filter out DAX mappings.

   - Introduce NUMA hints for allocations made by the pmem driver, and
     as a side effect all devm allocations now hint their NUMA node by
     default"

* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  coredump: add DAX filtering for FDPIC ELF coredumps
  coredump: add DAX filtering for ELF coredumps
  acpi: nfit: Add support for hot-add
  nfit: in acpi_nfit_init, break on a 0-length table
  pmem, memremap: convert to numa aware allocations
  devm_memremap_pages: use numa_mem_id
  devm: make allocations numa aware by default
  devm_memremap: convert to return ERR_PTR
  devm_memunmap: use devres_release()
  pmem: kill memremap_pmem()
  x86, mm: quiet arch_add_memory()
2015-11-10 12:07:22 -08:00
Chen Gang
e428abbbf6 tracing: #ifdef out uses of max trace when CONFIG_TRACER_MAX_TRACE is not set
tracing_max_lat_fops is used only when TRACER_MAX_TRACE enabled, so also
swith the related code. The related warning with defconfig under x86_64:

    CC      kernel/trace/trace.o
  kernel/trace/trace.c:5466:37: warning: ‘tracing_max_lat_fops’ defined but not used [-Wunused-const-variable]
   static const struct file_operations tracing_max_lat_fops = {

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-10 10:16:05 -05:00
Grygorii Strashko
4717f13373 genirq/PM: Restore system wake up from chained interrupts
Commit e509bd7da1 ("genirq: Allow migration of chained interrupts
by installing default action") breaks PCS wake up IRQ behaviour on
TI OMAP based platforms (dra7-evm).

TI OMAP IRQ wake up configuration:
GIC-irqchip->PCM_IRQ
  |- omap_prcm_register_chain_handler
     |- PRCM-irqchip -> PRCM_IO_IRQ
        |- pcs_irq_chain_handler
           |- pinctrl-irqchip -> PCS_uart1_wakeup_irq

This happens because IRQ PM code (irq/pm.c) is expected to ignore
chained interrupts by default:
  static bool suspend_device_irq(struct irq_desc *desc)
  {
	if (!desc->action || desc->no_suspend_depth)
		return false;
 - it's expected !desc->action = true for chained interrupts;

but, after above change, all chained interrupt descriptors will
have default action handler installed - chained_action.
As result, chained interrupts will be silently disabled during system
suspend.

Hence, fix it by introducing helper function irq_desc_is_chained() and
use it in suspend_device_irq() for chained interrupts identification
and skip them, once detected.

Fixes: e509bd7da1 ("genirq: Allow migration of chained interrupts..")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: <nsekhar@ti.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/1447149492-20699-1-git-send-email-grygorii.strashko@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-10 09:11:31 -05:00
Paolo Bonzini
d0e536d893 context_tracking: avoid irq_save/irq_restore on guest entry and exit
guest_enter and guest_exit must be called with interrupts disabled,
since they take the vtime_seqlock with write_seq{lock,unlock}.
Therefore, it is not necessary to check for exceptions, nor to
save/restore the IRQ state, when context tracking functions are
called by guest_enter and guest_exit.

Split the body of context_tracking_entry and context_tracking_exit
out to __-prefixed functions, and use them from KVM.

Rik van Riel has measured this to speed up a tight vmentry/vmexit
loop by about 2%.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Tested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:23 +01:00
Paolo Bonzini
f70cd6b07e context_tracking: remove duplicate enabled check
All calls to context_tracking_enter and context_tracking_exit
are already checking context_tracking_is_enabled, except the
context_tracking_user_enter and context_tracking_user_exit
functions left in for the benefit of assembly calls.

Pull the check up to those functions, by making them simple
wrappers around the user_enter and user_exit inline functions.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Tested-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:22 +01:00
Linus Torvalds
bd4f203e43 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:
 "We're pretty much done over here - I'm still waiting for a nouveau
  merge so I can cleanly finish up Christoph's dma-mapping rework.

   - bunch of small misc stuff

   - fold abs64() into abs(), remove abs64()

   - new_valid_dev() cleanups

   - binfmt_elf_fdpic feature work"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries
  fs/stat.c: remove unnecessary new_valid_dev() check
  fs/reiserfs/namei.c: remove unnecessary new_valid_dev() check
  fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
  fs/ncpfs/dir.c: remove unnecessary new_valid_dev() check
  fs/jfs: remove unnecessary new_valid_dev() checks
  fs/hpfs/namei.c: remove unnecessary new_valid_dev() check
  fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
  fs/ext2/namei.c: remove unnecessary new_valid_dev() check
  fs/exofs/namei.c: remove unnecessary new_valid_dev() check
  fs/btrfs/inode.c: remove unnecessary new_valid_dev() check
  fs/9p: remove unnecessary new_valid_dev() checks
  include/linux/kdev_t.h: old/new_valid_dev() can return bool
  include/linux/kdev_t.h: remove unused huge_valid_dev()
  kmap_atomic_to_page() has no users, remove it
  drivers/scsi/cxgbi: fix build with EXTRA_CFLAGS
  dma: remove external references to dma_supported
  Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory
  remove abs64()
  kernel.h: make abs() work with 64-bit types
  ...
2015-11-09 21:05:13 -08:00
Linus Torvalds
50c36504fc Nothing exciting, minor tweaks and cleanups.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWP91+AAoJENkgDmzRrbjxBAAQAJDFEKrMmdhyX56R058RFW1q
 pYK3XrHrVMCrJRa80UH6MStvCzkR5yYU7q81XAOfl+/9TR3IIi6EPMIC6wYSyiXC
 JpfnUISEJAuDMYOT19xeFDt2c7oknJnkOM7QWQt6ypY5sGWVHQ3KQUmkqlzaxQ5C
 Oen9CfFttugmmpO6KDCfIxtMvxkQ1LM6SoTAKTu7LamcVsBCp5It2Me9UwGUxADj
 1Phq14U8heJ9ScNYkroutEkWgyZLFJOZExUuNEIMwyooXmWQmZzBiwVwQ72WjstG
 2jj3ZiLucVYvBM4k8qnGnlMR4IkymcYlXD1YJ0X7tvBFnp7UGXFKLt2NSqfOskLC
 2fRPETf4PLHebZeNN/J/WKJ7qKzsBsS49KjFjJ2vm4+P6sScmcDGXw4eMyLTYfnJ
 dRbuRtZpnJV4S1vss/STjehOA8A8/fURXQwb80AUzzEEfmjujZWCMYVhfqO91+kx
 XsbtSciek+Abxyh9Ow9xHgVnMcsXgmZMkpODv4Gjc/4R6Uu6XRSVK04jvkuoLVi5
 t4VC00NK0WY2PFVK3qGYE5ZejPTOu59UGRLwxDqZ0QmXF36Yun9f//hSDWpM10BO
 Ah92OybEnny4tij7/0xz7Krg7u8BQ+at0TAmxrw4Xu9VqbnRqcpJy9Q04e52mwTu
 G6ztYV2tOGMEh5lK2k0S
 =WtDm
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Nothing exciting, minor tweaks and cleanups"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  scripts: [modpost] add new sections to white list
  modpost: Add flag -E for making section mismatches fatal
  params: don't ignore the rest of cmdline if parse_one() fails
  modpost: abort if a module symbol is too long
2015-11-09 15:53:39 -08:00
Andrew Morton
79211c8ed1 remove abs64()
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Dan Williams
85ce230051 Merge branch 'for-4.4/hotplug' into libnvdimm-for-next 2015-11-09 13:29:39 -05:00
Rik van Riel
25b3e5a334 sched/numa: Fix math underflow in task_tick_numa()
The NUMA balancing code implements delays in scanning by
advancing curr->node_stamp beyond curr->se.sum_exec_runtime.

With unsigned math, that creates an underflow, which results
in task_numa_work being queued all the time, even when we
don't want to.

Avoiding the math underflow makes it possible to reduce CPU
overhead in the NUMA balancing code.

Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mgorman@suse.de
Link: http://lkml.kernel.org/r/1446756983-28173-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-09 16:13:27 +01:00
Peter Zijlstra
b71b437eed perf: Fix inherited events vs. tracepoint filters
Arnaldo reported that tracepoint filters seem to misbehave (ie. not
apply) on inherited events.

The fix is obvious; filters are only set on the actual (parent)
event, use the normal pattern of using this parent event for filters.
This is safe because each child event has a reference to it.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151102095051.GN17308@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-09 16:13:11 +01:00
Paul E. McKenney
2fd5907775 perf: Disable IRQs across RCU RS CS that acquires scheduler lock
The perf_lock_task_context() function disables preemption across its
RCU read-side critical section because that critical section acquires
a scheduler lock.  If there was a preemption during that RCU read-side
critical section, the rcu_read_unlock() could attempt to acquire scheduler
locks, resulting in deadlock.

However, recent optimizations to expedited grace periods mean that IPI
handlers that execute during preemptible RCU read-side critical sections
can now cause the subsequent rcu_read_unlock() to acquire scheduler locks.
Disabling preemption does nothiing to prevent these IPI handlers from
executing, so these optimizations introduced a deadlock.  In theory,
this deadlock could be avoided by pulling all wakeups and printk()s out
from rnp->lock critical sections, but in practice this would re-introduce
some RCU CPU stall warning bugs.

Given that acquiring scheduler locks entails disabling interrupts, these
deadlocks can be avoided by disabling interrupts (instead of disabling
preemption) across any RCU read-side critical that acquires scheduler
locks and holds them across the rcu_read_unlock().  This commit therefore
makes this change for perf_lock_task_context().

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151104134838.GR29027@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-09 16:13:11 +01:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Dmitry Safonov
03e88ae6b3 tracing: Remove unused ftrace_cpu_disabled per cpu variable
Since the ring buffer is lockless, there is no need to disable ftrace on
CPU. And no one doing so: after commit 68179686ac ("tracing: Remove
ftrace_disable/enable_cpu()") ftrace_cpu_disabled stays the same after
initialization, nothing changes it.
ftrace_cpu_disabled shouldn't be used by any external module since it
disables only function and graph_function tracers but not any other
tracer.

Link: http://lkml.kernel.org/r/1446836846-22239-1-git-send-email-0x7f454c46@gmail.com

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-07 13:25:14 -05:00
Vitaly Kuznetsov
08d78658f3 panic: release stale console lock to always get the logbuf printed out
In some cases we may end up killing the CPU holding the console lock
while still having valuable data in logbuf. E.g. I'm observing the
following:

- A crash is happening on one CPU and console_unlock() is being called on
  some other.

- console_unlock() tries to print out the buffer before releasing the lock
  and on slow console it takes time.

- in the meanwhile crashing CPU does lots of printk()-s with valuable data
  (which go to the logbuf) and sends IPIs to all other CPUs.

- console_unlock() finishes printing previous chunk and enables interrupts
  before trying to print out the rest, the CPU catches the IPI and never
  releases console lock.

This is not the only possible case: in VT/fb subsystems we have many other
console_lock()/console_unlock() users.  Non-masked interrupts (or
receiving NMI in case of extreme slowness) will have the same result.
Getting the whole console buffer printed out on crash should be top
priority.

[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ben Segall
8639b46139 pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
the current pid namespace.  This is in contrast to both the other modes of
setpriority and the example of kill(-1).  Fix this.  getpriority and
ioprio have the same failure mode, fix them too.

Eric said:

: After some more thinking about it this patch sounds justifiable.
:
: My goal with namespaces is not to build perfect isolation mechanisms
: as that can get into ill defined territory, but to build well defined
: mechanisms.  And to handle the corner cases so you can use only
: a single namespace with well defined results.
:
: In this case you have found the two interfaces I am aware of that
: identify processes by uid instead of by pid.  Which quite frankly is
: weird.  Unfortunately the weird unexpected cases are hard to handle
: in the usual way.
:
: I was hoping for a little more information.  Changes like this one we
: have to be careful of because someone might be depending on the current
: behavior.  I don't think they are and I do think this make sense as part
: of the pid namespace.

Signed-off-by: Ben Segall <bsegall@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ambrose Feinstein <ambrose@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Minfei Huang
de90a6bcae kexec: use file name as the output message prefix
kexec output message misses the prefix "kexec", when Dave Young split the
kexec code.  Now, we use file name as the output message prefix.

Currently, the format of output message:
[  140.290795] SYSC_kexec_load: hello, world
[  140.291534] kexec: sanity_check_segment_list: hello, world

Ideally, the format of output message:
[   30.791503] kexec: SYSC_kexec_load, Hello, world
[   79.182752] kexec_core: sanity_check_segment_list, Hello, world

Remove the custom prefix "kexec" in output message.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
5fa534c987 coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
task_will_free_mem() is wrong in many ways, and in particular the
SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
coredumping without SIGNAL_GROUP_COREDUMP bit set.

change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
other CLONE_VM processes can't react to SIGKILL.  Fortunately, at least
oom-kill case if fine; it kills all tasks sharing the same mm, so it
should also kill the process which actually dumps the core.

The change in prepare_signal() is not strictly necessary, it just ensures
that the patch does not bring another subtle behavioural change.  But it
reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Kyle Walker <kwalker@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Stanislav Kozina <skozina@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
2e01fabe67 signals: kill block_all_signals() and unblock_all_signals()
It is hardly possible to enumerate all problems with block_all_signals()
and unblock_all_signals().  Just for example,

1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
   multithreaded. Another thread can dequeue the signal and force the
   group stop.

2. Even is the caller is single-threaded, it will "stop" anyway. It
   will not sleep, but it will spin in kernel space until SIGCONT or
   SIGKILL.

And a lot more. In short, this interface doesn't work at all, at least
the last 10+ years.

Daniel said:

  Yeah the only times I played around with the DRM_LOCK stuff was when
  old drivers accidentally deadlocked - my impression is that the entire
  DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
  purging where this leaks out of the drm subsystem.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mathias Krause
3824657c52 printk: prevent userland from spoofing kernel messages
The following statement of ABI/testing/dev-kmsg is not quite right:

   It is not possible to inject messages from userspace with the
   facility number LOG_KERN (0), to make sure that the origin of the
   messages can always be reliably determined.

Userland actually can inject messages with a facility of 0 by abusing the
fact that the facility is stored in a u8 data type.  By using a facility
which is a multiple of 256 the assignment of msg->facility in log_store()
implicitly truncates it to 0, i.e.  LOG_KERN, allowing users of /dev/kmsg
to spoof kernel messages as shown below:

The following call...
   # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
...leads to the following log entry (dmesg -x | tail -n 1):
   user  :emerg : [   66.137758] Kernel panic - not syncing: beer empty

However, this call...
   # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
...leads to the slightly different log entry (note the kernel facility):
   kern  :emerg : [   74.177343] Kernel panic - not syncing: beer empty

Fix that by limiting the user provided facility to 8 bit right from the
beginning and catch the truncation early.

Fixes: 7ff9554bb5 ("printk: convert byte-buffer to variable-length...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Alex Elder <elder@linaro.org>
Cc: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Dan Streetman
3d9c637f4a module: export param_free_charp()
Change the param_free_charp() function from static to exported.

It is used by zswap in the next patch ("zswap: use charp for zswap param
strings").

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00