Dump the flags which denote we have detected and/or have applied bug
workarounds to the CPU we're executing on, in a similar manner to the
feature flags.
The advantage is that those are not accumulating over time like the CPU
features.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403609105-8332-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Both the 32-bit and 64-bit cmpxchg.h header define __HAVE_ARCH_CMPXCHG
and there's ifdeffery which checks it. But since both bitness define it,
we can just as well move it up to the main cmpxchg header and simpify a
bit of code in doing that.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140711104338.GB17083@pd.tnic
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Flipping the LSB doesn't require four lines of code. This shaves a few
bytes of the generated code, including a branch.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1403183731-15402-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
For example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.
This commit deals with the issue simply by ignoring CondChgd bit.
Here is explanation in detail.
On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.
intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.
The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.
As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.
I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.
On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.
I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:
In 18.9.1:
"The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
indicate changes to the state of performancmonitoring hardware"
In Table 35-2 IA-32 Architectural MSRs
63 CondChg: status bits of this register has changed.
These are different from the bahviour I see on the actual system as I
explained above.
At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver
locked up when doing cpu hotplug.
Because we called set_cyc2ns_scale() from the time_cpufreq_notifier()
unconditionally, it gets called multiple times for each freq change,
instead of only the once, when the tsc_khz value actually changes.
Because it gets called more than once, we run out of cyc2ns data slots
and stall, waiting for a free one, but because we're half way offline,
there's no consumers to free slots.
By placing the call inside the condition that actually changes tsc_khz
we avoid superfluous calls and avoid the problem.
Reported-by: Mauro <registosites@hotmail.com>
Tested-by: Mauro <registosites@hotmail.com>
Fixes: 20d1c86a57 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
Cc: <stable@vger.kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Bin Gao <bin.gao@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registeres.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is no where to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull x86 fixes from Peter Anvin:
"A pile of fixes related to the VDSO, EFI and 32-bit badsys handling.
It turns out that removing the section headers from the VDSO breaks
gdb, so this puts back most of them. A very simple typo broke
rt_sigreturn on some versions of glibc, with obviously disastrous
results. The rest is pretty much fixes for the corresponding fallout.
The EFI fixes fixes an arithmetic overflow on 32-bit systems and
quiets some build warnings.
Finally, when invoking an invalid system call number on x86-32, we
bypass a bunch of handling, which can make the audit code oops"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi-pstore: Fix an overflow on 32-bit builds
x86/vdso: Error out in vdso2c if DT_RELA is present
x86/vdso: Move DISABLE_BRANCH_PROFILING into the vdso makefile
x86_32, signal: Fix vdso rt_sigreturn
x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)
x86/vdso: Create .build-id links for unstripped vdso files
x86/vdso: Remove some redundant in-memory section headers
x86/vdso: Improve the fake section headers
x86/vdso2c: Use better macros for ELF bitness
x86/vdso: Discard the __bug_table section
efi: Fix compiler warnings (unused, const, type)
BorisO reports that misc_register() fails often on xen. The current code
unregisters the CPU hotplug notifier in that case. If then a CPU is
offlined and onlined back again, we end up with a second timer running
on that CPU, leading to soft lockups and system hangs.
So let's leave the hotcpu notifier always registered - even if
mce_device_create failed for some cores and never unreg it so that we
can deal with the timer handling accordingly.
Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.
This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.
Patch x86 and sparc to support new routine.
[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit:
commit 6f121e548f
Author: Andy Lutomirski <luto@amacapital.net>
Date: Mon May 5 12:19:34 2014 -0700
x86, vdso: Reimplement vdso.so preparation in build-time C
Contained this obvious typo:
- restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn);
+ restorer = current->mm->context.vdso +
+ selected_vdso32->sym___kernel_sigreturn;
Note the missing 'rt_' in the new code. Fix it.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1eb40ad923acde2e18357ef2832867432e70ac42.1403361010.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The bad syscall nr paths are their own incomprehensible route
through the entry control flow. Rearrange them to work just like
syscalls that return -ENOSYS.
This fixes an OOPS in the audit code when fast-path auditing is
enabled and sysenter gets a bad syscall nr (CVE-2014-4508).
This has probably been broken since Linux 2.6.27:
af0575bba0 i386 syscall audit fast-path
Cc: stable@vger.kernel.org
Cc: Roland McGrath <roland@redhat.com>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When invoced for positive values, DIV_ROUND macro defined in
arch/x86/kernel/tsc.c behaves exactly like DIV_ROUND_CLOSEST from
include/linux/kernel.h file, so remove the custom macro in favour
of the shared one.
[ hpa: changed line breaks ]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Link: http://lkml.kernel.org/r/1403143116-21755-1-git-send-email-mina86@mina86.com
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
X86_FEATURE_FXSAVE_LEAK, X86_FEATURE_11AP and
X86_FEATURE_CLFLUSH_MONITOR are not really features but synthetic bits
we use for applying different bug workarounds. Call them what they
really are, and make sure they get the proper cross-CPU behavior (OR
rather than AND).
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403042783-23278-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This essentially reverts commit:
ecd50f714c ("kprobes, x86: Call exception_enter after kprobes handled")
since it causes build errors with CONFIG_CONTEXT_TRACKING and
that has been made from misunderstandings;
context_track_user_*() don't involve much in interrupt context,
it just returns if in_interrupt() is true.
Instead of changing the do_debug/int3(), this just adds
context_track_user_*() to kprobes blacklist, since those are
still can be called right before kprobes handles int3 and debug
exceptions, and probing those will cause an infinite loop.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140614064711.7865.45957.stgit@kbuild-fedora.novalocal
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 irq fixes from Ingo Molnar:
"Two changes: a cpu-hotplug/irq race fix, plus a HyperV related fix"
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Fix fixup_irqs() error handling
x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately
Pull more perf updates from Ingo Molnar:
"A second round of perf updates:
- wide reaching kprobes sanitization and robustization, with the hope
of fixing all 'probe this function crashes the kernel' bugs, by
Masami Hiramatsu.
- uprobes updates from Oleg Nesterov: tmpfs support, corner case
fixes and robustization work.
- perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo
et al:
* Add support to accumulate hist periods (Namhyung Kim)
* various fixes, refactorings and enhancements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
perf: Differentiate exec() and non-exec() comm events
perf: Fix perf_event_comm() vs. exec() assumption
uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
perf/documentation: Add description for conditional branch filter
perf/x86: Add conditional branch filtering support
perf/tool: Add conditional branch filter 'cond' to perf record
perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
uprobes: Teach copy_insn() to support tmpfs
uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
perf/x86: Use common PMU interrupt disabled code
perf/ARM: Use common PMU interrupt disabled code
perf: Disable sampled events if no PMU interrupt
perf: Fix use after free in perf_remove_from_context()
perf tools: Fix 'make help' message error
perf record: Fix poll return value propagation
perf tools: Move elide bool into perf_hpp_fmt struct
perf tools: Remove elide setup for SORT_MODE__MEMORY mode
perf tools: Fix "==" into "=" in ui_browser__warning assignment
perf tools: Allow overriding sysfs and proc finding with env var
perf tools: Consider header files outside perf directory in tags target
...
Pull drm updates from Dave Airlie:
"This is the main drm merge window pull request, changes all over the
place, mostly normal levels of churn.
Highlights:
Core drm:
More cleanups, fix race on connector/encoder naming, docs updates,
object locking rework in prep for atomic modeset
i915:
mipi DSI support, valleyview power fixes, cursor size fixes,
execlist refactoring, vblank improvements, userptr support, OOM
handling improvements
radeon:
GPUVM tuning and large page size support, gart fixes, deep color
HDMI support, HDMI audio cleanups
nouveau:
- displayport rework should fix lots of issues
- initial gk20a support
- gk110b support
- gk208 fixes
exynos:
probe order fixes, HDMI changes, IPP consolidation
msm:
debugfs updates, misc fixes
ast:
ast2400 support, sync with UMS driver
tegra:
cleanups, hdmi + hw cursor for Tegra 124.
panel:
fixes existing panels add some new ones.
ipuv3:
moved from staging to drivers/gpu"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (761 commits)
drm/nouveau/disp/dp: fix tmds passthrough on dp connector
drm/nouveau/dp: probe dpcd to determine connectedness
drm/nv50-: trigger update after all connectors disabled
drm/nv50-: prepare for attaching a SOR to multiple heads
drm/gf119-/disp: fix debug output on update failure
drm/nouveau/disp/dp: make use of postcursor when its available
drm/g94-/disp/dp: take max pullup value across all lanes
drm/nouveau/bios/dp: parse lane postcursor data
drm/nouveau/dp: fix support for dpms
drm/nouveau: register a drm_dp_aux channel for each dp connector
drm/g94-/disp: add method to power-off dp lanes
drm/nouveau/disp/dp: maintain link in response to hpd signal
drm/g94-/disp: bash and wait for something after changing lane power regs
drm/nouveau/disp/dp: split link config/power into two steps
drm/nv50/disp: train PIOR-attached DP from second supervisor
drm/nouveau/disp/dp: make use of existing output data for link training
drm/gf119/disp: start removing direct vbios parsing from supervisor
drm/nv50/disp: start removing direct vbios parsing from supervisor
drm/nouveau/disp/dp: maintain receiver caps in response to hpd signal
drm/nouveau/disp/dp: create subclass for dp outputs
...
to help out the rest of the kernel to ease their use of trace events.
The big change for this release is the allowing of other tracers,
such as the latency tracers, to be used in the trace instances and allow
for function or function graph tracing to be in the top level
simultaneously.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTlbUqAAoJEKQekfcNnQGuP+8H+wTBG06beHsqe6XcaeXcKNkt
Mimm0O04oQdw89CBWeJvXyOwRTtiN4M/4hxHXBTDtChxM9oUyWw1o0IpSMMuQ16O
w9r3DfC8e1air+ufEuYWM0QNtyzHi8EfDSNia55ON5jvtkCZTXOEKZD+n8M9w28p
I7PVgr0PDztsCpethCpg0M8beK9zuQPWMzsHAQCsKI06Xl5z33kPIJR15Exh+Kr1
uVVTZW7JFVAPuSnteLSIx9pN6OjsVGzOZCljg+O+9/v/02u5nkMiS2nURxae86kg
RTSiRYT6Hvl/MCBhdss/w5kgSk6BYiZ0hXbLtwetvre+vQrOR5CnDw2DxZ7e+gU=
=oudH
-----END PGP SIGNATURE-----
Merge tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Lots of tweaks, small fixes, optimizations, and some helper functions
to help out the rest of the kernel to ease their use of trace events.
The big change for this release is the allowing of other tracers, such
as the latency tracers, to be used in the trace instances and allow
for function or function graph tracing to be in the top level
simultaneously"
* tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
tracing: Fix memory leak on instance deletion
tracing: Fix leak of ring buffer data when new instances creation fails
tracing/kprobes: Avoid self tests if tracing is disabled on boot up
tracing: Return error if ftrace_trace_arrays list is empty
tracing: Only calculate stats of tracepoint benchmarks for 2^32 times
tracing: Convert stddev into u64 in tracepoint benchmark
tracing: Introduce saved_cmdlines_size file
tracing: Add __get_dynamic_array_len() macro for trace events
tracing: Remove unused variable in trace_benchmark
tracing: Eliminate double free on failure of allocation on boot up
ftrace/x86: Call text_ip_addr() instead of the duplicated code
tracing: Print max callstack on stacktrace bug
tracing: Move locking of trace_cmdline_lock into start/stop seq calls
tracing: Try again for saved cmdline if failed due to locking
tracing: Have saved_cmdlines use the seq_read infrastructure
tracing: Add tracepoint benchmark tracepoint
tracing: Print nasty banner when trace_printk() is in use
tracing: Add funcgraph_tail option to print function name after closing braces
tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
...
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.
* accumulated work in next: (6809 commits)
ufs: sb mutex merge + mutex_destroy
powerpc: update comments for generic idle conversion
cris: update comments for generic idle conversion
idle: remove cpu_idle() forward declarations
nbd: zero from and len fields in NBD_CMD_DISCONNECT.
mm: convert some level-less printks to pr_*
MAINTAINERS: adi-buildroot-devel is moderated
MAINTAINERS: add linux-api for review of API/ABI changes
mm/kmemleak-test.c: use pr_fmt for logging
fs/dlm/debug_fs.c: replace seq_printf by seq_puts
fs/dlm/lockspace.c: convert simple_str to kstr
fs/dlm/config.c: convert simple_str to kstr
mm: mark remap_file_pages() syscall as deprecated
mm: memcontrol: remove unnecessary memcg argument from soft limit functions
mm: memcontrol: clean up memcg zoneinfo lookup
mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
mm/mempool.c: update the kmemleak stack trace for mempool allocations
lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
mm: introduce kmemleak_update_trace()
mm/kmemleak.c: use %u to print ->checksum
...
This reverts commit 3e1a878b7c.
It came in very late, and already has one reported failure: Sitsofe
reports that the current tree fails to boot on his EeePC, and bisected
it down to this. Rather than waste time trying to figure out what's
wrong, just revert it.
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 cdso updates from Peter Anvin:
"Vdso cleanups and improvements largely from Andy Lutomirski. This
makes the vdso a lot less ''special''"
* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso, build: Make LE access macros clearer, host-safe
x86/vdso, build: Fix cross-compilation from big-endian architectures
x86/vdso, build: When vdso2c fails, unlink the output
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
x86, mm: Improve _install_special_mapping and fix x86 vdso naming
mm, fs: Add vm_ops->name as an alternative to arch_vma_name
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
x86, vdso: Move the 32-bit vdso special pages after the text
x86, vdso: Reimplement vdso.so preparation in build-time C
x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
x86, vdso: Clean up 32-bit vs 64-bit vdso params
x86, mm: Ensure correct alignment of the fixmap
Pull x86-64 espfix changes from Peter Anvin:
"This is the espfix64 code, which fixes the IRET information leak as
well as the associated functionality problem. With this code applied,
16-bit stack segments finally work as intended even on a 64-bit
kernel.
Consequently, this patchset also removes the runtime option that we
added as an interim measure.
To help the people working on Linux kernels for very small systems,
this patchset also makes these compile-time configurable features"
* 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
x86, espfix: Make it possible to disable 16-bit support
x86, espfix: Make espfix64 a Kconfig option, fix UML
x86, espfix: Fix broken header guard
x86, espfix: Move espfix definitions into a separate header file
x86-32, espfix: Remove filter for espfix32 due to race
x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).
It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:
https://lkml.org/lkml/2014/3/6/257
If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.
To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If system is running without debug level logging,
it will not log error if do_boot_cpu() failed to
wakeup AP. It may lead to silent AP bringup
failures at boot time.
Change message level to KERN_ERR to make error
visible to user as it's done on other architectures.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
currently if AP wake up is failed, master CPU marks AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to following list corruption on the next physical CPU
hotplug:
[ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[ 418.178445] Call Trace:
[ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0
[ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550
[ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30
[ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
Which is caused by the fact that generic_processor_info() allocates
logical CPU id by calling:
cpu = cpumask_next_zero(-1, cpu_present_mask);
which returns id of previously failed to wake up CPU, since its
bit is cleared by do_boot_cpu() and as result register_cpu()
tries to register another CPU with the same id as already
present but failed to be onlined CPU.
Taking in account that AP will not do anything if master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break next cpu hotplug attempts. As a side effect of
not marking AP as not present, user would be allowed to online
it again later.
Also fix memory corruption in acpi_unmap_lsapic()
if during CPU hotplug master CPU failed to wake up AP
it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.
However following attempt to unplug that CPU will lead to
out of bound write access to __apicid_to_node[] which is
32768 items long on x86_64 kernel.
So with above fix of cpu_present_mask make sure that a present
CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure and allow
acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Purely cosmetic, no changes in .o,
1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to
->defparam.
2. Add the comment into default_post_xol_op() to explain "regs->sp +=".
3. Remove the stale part of the comment in arch_uprobe_analyze_insn().
Suggested-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Make the x86 perf code use the new common PMU interrupt disabled code.
Typically most x86 machines have working PMU interrupts, although
some older p6-class machines had this problem.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1405161715560.11099@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge drm-fixes into drm-next.
Both i915 and radeon need this done for later patches.
Conflicts:
drivers/gpu/drm/drm_crtc_helper.c
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/gpu/drm/i915/i915_gem_gtt.c
Merge misc updates from Andrew Morton:
- a few fixes for 3.16. Cc'ed to stable so they'll get there somehow.
- various misc fixes and cleanups
- most of the ocfs2 queue. Review is slow...
- most of MM. The MM queue is pretty huge this time, but not much in
the way of feature work.
- some tweaks under kernel/
- printk maintenance work
- updates to lib/
- checkpatch updates
- tweaks to init/
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (276 commits)
fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
init/main.c: remove an ifdef
kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
init/main.c: add initcall_blacklist kernel parameter
init/main.c: don't use pr_debug()
fs/binfmt_flat.c: make old_reloc() static
fs/binfmt_elf.c: fix bool assignements
fs/efs: convert printk(KERN_DEBUG to pr_debug
fs/efs: add pr_fmt / use __func__
fs/efs: convert printk to pr_foo()
scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
checkpatch: check stable email address
checkpatch: warn on unnecessary void function return statements
checkpatch: prefer kstrto<foo> to sscanf(buf, "%<lhuidx>", &bar);
checkpatch: add warning for kmalloc/kzalloc with multiply
checkpatch: warn on #defines ending in semicolon
checkpatch: make --strict a default for files in drivers/net and net/
checkpatch: always warn on missing blank line after variable declaration block
checkpatch: fix wildcard DT compatible string checking
...
... instead of naked numbers.
Stuff in sysrq.c used to set it to 8 which is supposed to mean above
default level so set it to DEBUG instead as we're terminating/killing all
tasks and we want to be verbose there.
Also, correct the check in x86_64_start_kernel which should be >= as
we're clearly issuing the string there for all debug levels, not only
the magical 10.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_generic_alloc_coherent() firstly attempts to allocate by
dma_alloc_from_contiguous() if CONFIG_DMA_CMA is enabled. But the
memory region allocated by it may not fit within the device's DMA mask.
This change makes it fall back to usual alloc_pages_node() allocation
for such cases.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, "cma=" kernel parameter is used to specify the size of CMA,
but we can't specify where it is located. We want to locate CMA below
4GB for devices only supporting 32-bit addressing on 64-bit systems
without iommu.
This enables to specify the placement of CMA by extending "cma=" kernel
parameter.
Examples:
1. locate 64MB CMA below 4GB by "cma=64M@0-4G"
2. locate 64MB CMA exact at 512MB by "cma=64M@512M"
Note that the DMA contiguous memory allocator on x86 assumes that
page_address() works for the pages to allocate. So this change requires
to limit end address of contiguous memory area upto max_pfn_mapped to
prevent from locating it on highmem area by the argument of
dma_contiguous_reserve().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The DMA Contiguous Memory Allocator support on x86 is disabled when
swiotlb config option is enabled. So DMA CMA is always disabled on
x86_64 because swiotlb is always enabled. This attempts to support for
DMA CMA with enabling swiotlb config option.
The contiguous memory allocator on x86 is integrated in the function
dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops
for dma_alloc_coherent().
x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops
tries to allocate with dma_generic_alloc_coherent() firstly and then
swiotlb_alloc_coherent() is called as a fallback.
The main part of supporting DMA CMA with swiotlb is that changing
x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops
for dma_free_coherent() so that it can distinguish memory allocated by
dma_generic_alloc_coherent() from one allocated by
swiotlb_alloc_coherent() and release it with dma_generic_free_coherent()
which can handle contiguous memory. This change requires making
is_swiotlb_buffer() global function.
This also needs to change .free callback in the dma_map_ops for amd_gart
and sta2x11, because these dma_ops are also using
dma_generic_alloc_coherent().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset enhances the DMA Contiguous Memory Allocator on x86.
Currently the DMA CMA is only supported with pci-nommu dma_map_ops and
furthermore it can't be enabled on x86_64. But I would like to allocate
big contiguous memory with dma_alloc_coherent() and tell it to the device
that requires it, regardless of which dma mapping implementation is
actually used in the system.
So this makes it work with swiotlb and intel-iommu dma_map_ops, too. And
this also extends "cma=" kernel parameter to specify placement constraint
by the physical address range of memory allocations. For example, CMA
allocates memory below 4GB by "cma=64M@0-4G", it is required for the
devices only supporting 32-bit addressing on 64-bit systems without iommu.
This patch (of 5):
Calling dma_alloc_coherent() with __GFP_ZERO must return zeroed memory.
But when the contiguous memory allocator (CMA) is enabled on x86 and the
memory region is allocated by dma_alloc_from_contiguous(), it doesn't
return zeroed memory. Because dma_generic_alloc_coherent() forgot to fill
the memory region with zero if it was allocated by
dma_alloc_from_contiguous()
Most implementations of dma_alloc_coherent() return zeroed memory
regardless of whether __GFP_ZERO is specified. So this fixes it by
unconditionally zeroing the allocated memory region.
Alternatively, we could fix dma_alloc_from_contiguous() to return zeroed
out memory and remove memset() from all caller of it. But we can't simply
remove the memset on arm because __dma_clear_buffer() is used there for
ensuring cache flushing and it is used in many places. Of course we can
do redundant memset in dma_alloc_from_contiguous(), but I think this patch
is less impact for fixing this problem.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core irq updates from Thomas Gleixner:
"The irq department delivers:
- Another tree wide update to get rid of the horrible create_irq
interface along with its even more horrible variants. That also
gets rid of the last leftovers of the initial sparse irq hackery.
arch/driver specific changes have been either acked or ignored.
- A fix for the spurious interrupt detection logic with threaded
interrupts.
- A new ARM SoC interrupt controller
- The usual pile of fixes and improvements all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding
irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller
genirq: Improve documentation to match current implementation
ARM: iop13xx: fix msi support with sparse IRQ
genirq: Provide !SMP stub for irq_set_affinity_notifier()
irqchip: armada-370-xp: Move the devicetree binding documentation
irqchip: gic: Use mask field in GICC_IAR
genirq: Remove dynamic_irq mess
ia64: Use irq_init_desc
genirq: Replace dynamic_irq_init/cleanup
genirq: Remove irq_reserve_irq[s]
genirq: Replace reserve_irqs in core code
s390: Avoid call to irq_reserve_irqs()
s390: Remove pointless arch_show_interrupts()
s390: pci: Check return value of alloc_irq_desc() proper
sh: intc: Remove pointless irq_reserve_irqs() invocation
x86, irq: Remove pointless irq_reserve_irqs() call
genirq: Make create/destroy_irq() ia64 private
tile: Use SPARSE_IRQ
tile: pci: Use irq_alloc/free_hwirq()
...
- Another round of clean-up of FDT related code in architecture code.
This removes knowledge of internal FDT details from most architectures
except powerpc.
- Conversion of kernel's custom FDT parsing code to use libfdt.
- DT based initialization for generic serial earlycon. The introduction
of generic serial earlycon support went in thru tty tree.
- Improve the platform device naming for DT probed devices to ensure
unique naming and use parent names instead of a global index.
- Fix a race condition in of_update_property.
- Unify the various linker section OF match tables and fix several
function prototype errors.
- Update platform_get_irq_byname to work in deferred probe cases.
- 2 binding doc updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTjzgyAAoJEMhvYp4jgsXiFsUH/1PMTGo8CyD62VQD5ZKdAoW+
Fq6vCiRQ8assF5i5ZLcW1DqhjtoRaCKYhVbRKa5lj7cZdjlSpacI/qQPrF5Br2Ii
bTE3Ff/AQwipQaz/Bj7HqJCgGwfWK8xdfgW0abKsyXMWDN86Bov/zzeu8apmws0x
H1XjJRgnc/rzM4m9ny6+lss0iq6YL54SuTYNzHR33+Ywxls69SfHXIhCW0KpZcBl
5U3YUOomt40GfO46sxFA4xApAhypEK4oVq7asyiA2ArTZ/c2Pkc9p5CBqzhDLmlq
yioWTwHIISv0q+yMLCuQrVGIsbUDkQyy7RQ15z6U+/e/iGO/M+j3A5yxMc3qOi4=
=Onff
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux into next
Pull DeviceTree updates from Rob Herring:
- Another round of clean-up of FDT related code in architecture code.
This removes knowledge of internal FDT details from most
architectures except powerpc.
- Conversion of kernel's custom FDT parsing code to use libfdt.
- DT based initialization for generic serial earlycon. The
introduction of generic serial earlycon support went in through the
tty tree.
- Improve the platform device naming for DT probed devices to ensure
unique naming and use parent names instead of a global index.
- Fix a race condition in of_update_property.
- Unify the various linker section OF match tables and fix several
function prototype errors.
- Update platform_get_irq_byname to work in deferred probe cases.
- 2 binding doc updates
* tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (58 commits)
of: handle NULL node in next_child iterators
of/irq: provide more wrappers for !CONFIG_OF
devicetree: bindings: Document micrel vendor prefix
dt: bindings: dwc2: fix required value for the phy-names property
of_pci_irq: kill useless variable in of_irq_parse_pci()
of/irq: do irq resolution in platform_get_irq_byname()
of: Add a testcase for of_find_node_by_path()
of: Make of_find_node_by_path() handle /aliases
of: Create unlocked version of for_each_child_of_node()
lib: add glibc style strchrnul() variant
of: Handle memory@0 node on PPC32 only
pci/of: Remove dead code
of: fix race between search and remove in of_update_property()
of: Use NULL for pointers
of: Stop naming platform_device using dcr address
of: Ensure unique names without sacrificing determinism
tty/serial: pl011: add DT based earlycon support
of/fdt: add FDT serial scanning for earlycon
of/fdt: add FDT address translation support
serial: earlycon: add DT support
...
was a pretty active cycle for KVM. Changes include:
- a lot of s390 changes: optimizations, support for migration,
GDB support and more
- ARM changes are pretty small: support for the PSCI 0.2 hypercall
interface on both the guest and the host (the latter acked by Catalin)
- initial POWER8 and little-endian host support
- support for running u-boot on embedded POWER targets
- pretty large changes to MIPS too, completing the userspace interface
and improving the handling of virtualized timer hardware
- for x86, a larger set of changes is scheduled for 3.17. Still,
we have a few emulator bugfixes and support for running nested
fully-virtualized Xen guests (para-virtualized Xen guests have
always worked). And some optimizations too.
The only missing architecture here is ia64. It's not a coincidence
that support for KVM on ia64 is scheduled for removal in 3.17.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
bAknaZpPiOrW11Et1htY
=j5Od
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next
Pull KVM updates from Paolo Bonzini:
"At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM. Changes include:
- a lot of s390 changes: optimizations, support for migration, GDB
support and more
- ARM changes are pretty small: support for the PSCI 0.2 hypercall
interface on both the guest and the host (the latter acked by
Catalin)
- initial POWER8 and little-endian host support
- support for running u-boot on embedded POWER targets
- pretty large changes to MIPS too, completing the userspace
interface and improving the handling of virtualized timer hardware
- for x86, a larger set of changes is scheduled for 3.17. Still, we
have a few emulator bugfixes and support for running nested
fully-virtualized Xen guests (para-virtualized Xen guests have
always worked). And some optimizations too.
The only missing architecture here is ia64. It's not a coincidence
that support for KVM on ia64 is scheduled for removal in 3.17"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
KVM: add missing cleanup_srcu_struct
KVM: PPC: Book3S PR: Rework SLB switching code
KVM: PPC: Book3S PR: Use SLB entry 0
KVM: PPC: Book3S HV: Fix machine check delivery to guest
KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
KVM: PPC: Book3S HV: Fix dirty map for hugepages
KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
KVM: PPC: Book3S: Add ONE_REG register names that were missed
KVM: PPC: Add CAP to indicate hcall fixes
KVM: PPC: MPIC: Reset IRQ source private members
KVM: PPC: Graciously fail broken LE hypercalls
PPC: ePAPR: Fix hypercall on LE guest
KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
KVM: PPC: BOOK3S: Always use the saved DAR value
PPC: KVM: Make NX bit available with magic page
KVM: PPC: Disable NX for old magic page using guests
KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
...
check_irq_vectors_for_cpu_disable() can overestimate the number of
available interrupt vectors, so the check for cpu down succeeds, but
the actual cpu removal fails.
It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong
because the systems vectors are not taken into account.
Limit the search to first_system_vector instead of NR_VECTORS.
The second indicator for vector availability the used_vectors bitmap
is not taken into account at all. So system vectors,
e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20),
are accounted as available.
Add a check for the used_vectors bitmap and do not account vectors
which are marked there.
[ tglx: Simplified code. Rewrote changelog and code comments. ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I just went over this when looking at some Xen-related ftrace initialization
problems. They were related to Xen code that is not upstream but this clean up
would make sense here.
I think that this was already the intention when text_ip_addr() was introduced
in the commit 87fbb2ac60 (ftrace/x86: Use breakpoints for converting
function graph caller). Anyway, better do it now before it shots people into
their leg ;-)
Link: http://lkml.kernel.org/p/1401812601-2359-1-git-send-email-pmladek@suse.cz
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull x86/UV changes from Ingo Molnar:
"Continued updates for SGI UV 3 hardware support"
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/UV: Fix conditional in gru_exit()
x86/UV: Set n_lshift based on GAM_GR_CONFIG MMR for UV3
Pull x86 RAS changes from Ingo Molnar:
"Improve mcheck device initialization and bootstrap robustness"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mce: Panic when a core has reached a timeout
x86/mce: Improve mcheck_init_device() error handling
Pull x86 IOSF platform updates from Ingo Molnar:
"IOSF (Intel OnChip System Fabric) updates:
- generalize the IOSF interface to allow mixed mode drivers: non-IOSF
drivers to utilize of IOSF features on IOSF platforms.
- add 'Quark X1000' IOSF/MBI support
- clean up BayTrail and Quark PCI ID enumeration"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, iosf: Add PCI ID macros for better readability
x86, iosf: Add Quark X1000 PCI ID
x86, iosf: Added Quark MBI identifiers
x86, iosf: Make IOSF driver modular and usable by more drivers
Pull x86 microcode changes from Ingo Molnar:
"A microcode-debugging boot flag plus related refactoring"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode: Add a disable chicken bit
x86, boot: Carve out early cmdline parsing function
Pull scheduler updates from Ingo Molnar:
"The main scheduling related changes in this cycle were:
- various sched/numa updates, for better performance
- tree wide cleanup of open coded nice levels
- nohz fix related to rq->nr_running use
- cpuidle changes and continued consolidation to improve the
kernel/sched/idle.c high level idle scheduling logic. As part of
this effort I pulled cpuidle driver changes from Rafael as well.
- standardized idle polling amongst architectures
- continued work on preparing better power/energy aware scheduling
- sched/rt updates
- misc fixlets and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
sched/numa: Decay ->wakee_flips instead of zeroing
sched/numa: Update migrate_improves/degrades_locality()
sched/numa: Allow task switch if load imbalance improves
sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
sched: Initialize rq->age_stamp on processor start
sched, nohz: Change rq->nr_running to always use wrappers
sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
sched: Use clamp() and clamp_val() to make sys_nice() more readable
sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
sched/numa: Fix initialization of sched_domain_topology for NUMA
sched: Call select_idle_sibling() when not affine_sd
sched: Simplify return logic in sched_read_attr()
sched: Simplify return logic in sched_copy_attr()
sched: Fix exec_start/task_hot on migrated tasks
arm64: Remove TIF_POLLING_NRFLAG
metag: Remove TIF_POLLING_NRFLAG
sched/idle: Make cpuidle_idle_call() void
sched/idle: Reflow cpuidle_idle_call()
sched/idle: Delay clearing the polling bit
...
Pull perf updates from Ingo Molnar:
"The tooling changes maintained by Jiri Olsa until Arnaldo is on
vacation:
User visible changes:
- Add -F option for specifying output fields (Namhyung Kim)
- Propagate exit status of a command line workload for record command
(Namhyung Kim)
- Use tid for finding thread (Namhyung Kim)
- Clarify the output of perf sched map plus small sched command
fixes (Dongsheng Yang)
- Wire up perf_regs and unwind support for ARM64 (Jean Pihet)
- Factor hists statistics counts processing which in turn also fixes
several bugs in TUI report command (Namhyung Kim)
- Add --percentage option to control absolute/relative percentage
output (Namhyung Kim)
- Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by
completion scripts (Ramkumar Ramachandra)
Development/infrastructure changes and fixes:
- Android related fixes for pager and map dso resolving (Michael
Lentine)
- Add libdw DWARF post unwind support for ARM (Jean Pihet)
- Consolidate types.h for ARM and ARM64 (Jean Pihet)
- Fix possible null pointer dereference in session.c (Masanari Iida)
- Cleanup, remove unused variables in map_switch_event() (Dongsheng
Yang)
- Remove nr_state_machine_bugs in perf latency (Dongsheng Yang)
- Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra)
- Cleanups for perf.h header (Jiri Olsa)
- Consolidate types.h and export.h within tools (Borislav Petkov)
- Move u64_swap union to its single user's header, evsel.h (Borislav
Petkov)
- Fix for s390 to properly parse tracepoints plus test code
(Alexander Yarygin)
- Handle EINTR error for readn/writen (Namhyung Kim)
- Add a test case for hists filtering (Namhyung Kim)
- Share map_groups among threads of the same group (Arnaldo Carvalho
de Melo, Jiri Olsa)
- Making some code (cpu node map and report parse callchain callback)
global to be usable by upcomming changes (Don Zickus)
- Fix pmu object compilation error (Jiri Olsa)
Kernel side changes:
- intrusive uprobes fixes from Oleg Nesterov. Since the interface is
admin-only, and the bug only affects user-space ("any probed
jmp/call can kill the application"), we queued these fixes via the
development tree, as a special exception.
- more fuzzer motivated race fixes and related refactoring and
robustization.
- allow PMU drivers to be built as modules. (No actual module yet,
because the x86 Intel uncore module wasn't ready in time for this)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
perf tools: Add automatic remapping of Android libraries
perf tools: Add cat as fallback pager
perf tests: Add a testcase for histogram output sorting
perf tests: Factor out print_hists_*()
perf tools: Introduce reset_output_field()
perf tools: Get rid of obsolete hist_entry__sort_list
perf hists: Reset width of output fields with header length
perf tools: Skip elided sort entries
perf top: Add --fields option to specify output fields
perf report/tui: Fix a bug when --fields/sort is given
perf tools: Add ->sort() member to struct sort_entry
perf report: Add -F option to specify output fields
perf tools: Call perf_hpp__init() before setting up GUI browsers
perf tools: Consolidate management of default sort orders
perf tools: Allow hpp fields to be sort keys
perf ui: Get rid of callback from __hpp__fmt()
perf tools: Consolidate output field handling to hpp format routines
perf tools: Use hpp formats to sort final output
perf tools: Support event grouping in hpp ->sort()
perf tools: Use hpp formats to sort hist entries
...
There is very little and maybe practically nothing we can do to recover
from a system where at least one core has reached a timeout during the
whole monarch cores gathering. So panic when that happens.
Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended
state enumberation sub-leaf (eax=0x0d, ecx=1):
Bit 00: XSAVEOPT is available
Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set
Bit 02: Supports XGETBV with ECX = 1 if set
Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set
The above features are defined in the new word 10 in cpu features.
The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies
the state components that software has enabled xsaves and xrstors to manage.
If the bit corresponding to a state component is clear in XCR0 | IA32_XSS,
xsaves and xrstors will not operate on that state component, regardless of
the value of the instruction mask.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull perf fixes from Ingo Molnar:
"The biggest changes are fixes for races that kept triggering Trinity
crashes, plus liblockdep build fixes and smaller misc fixes.
The liblockdep bits in perf/urgent are a pull mistake - they should
have been in locking/urgent - but by the time I noticed other commits
were added and testing was done :-/ Sorry about that"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()
perf: Prevent false warning in perf_swevent_add
perf: Limit perf_event_attr::sample_period to 63 bits
tools/liblockdep: Remove all build files when doing make clean
tools/liblockdep: Build liblockdep from tools/Makefile
perf/x86/intel: Fix Silvermont's event constraints
perf: Fix perf_event_init_context()
perf: Fix race in removing an event
Print the AGP bridge info the same way as the rest of the kernel, e.g.,
"0000:00:04.0" instead of "00:04:00".
Also print the AGP aperture address range the same way we print resources,
and label it explicitly as a bus address range.
No functional change except the message changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace printk() with pr_info(), pr_err(), etc. Define pr_fmt() to prefix
output with "AGP: ".
No functional change except the addition of "AGP: " prefix in dmesg output.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
I noticed on some of my systems that page fault tracing doesn't
work:
cd /sys/kernel/debug/tracing
echo 1 > events/exceptions/enable
cat trace;
# nothing shows up
I eventually traced it down to CONFIG_KVM_GUEST. At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it. I tried on some old kernels and this does
not appear to be a regression: it never worked.
There are two page-fault entry functions today. One when tracing
is on and another when it is off. The KVM code calls do_page_fault()
directly instead of calling the traced version:
> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
> enum ctx_state prev_state;
>
> switch (kvm_read_and_reset_pf_reason()) {
> default:
> do_page_fault(regs, error_code);
> break;
> case KVM_PV_REASON_PAGE_NOT_PRESENT:
I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output). I'm unsure if it's
related.
Steven had an alternative to this which has zero overhead when
tracing is off where this includes the standard noops even when
tracing is disabled. I'm unconvinced that the extra complexity
of his apporach:
http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
is worth it, expecially considering that the KVM code is already
making page fault entry slower here. This solution is
dirt-simple.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code
that otherwise cause conflicts.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
I haven't touched the device interrupt code, which is different
enough that it's probably not worth merging, and I haven't done
anything about paranoidzeroentry_ist yet.
This appears to produce an entry_64.o file that differs only in the
debug info line numbers.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e7a6acfb130471700370e77af9e4b4b6ed46f5ef.1400709717.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Merge in Linus' tree with:
fa81511bb0 x86-64, modify_ldt: Make support for 16-bit segments a runtime option
... reverted, to avoid a conflict. This commit is no longer necessary
with the proper fix in place.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add a cmdline param which disables the microcode loader. This is useful
mostly in debugging situations where we want to turn off microcode
loading, both early from the initrd and late, as a means to be able to
rule out its influence on the machine.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch fixes a bug in precise_store_data_hsw() whereby
it would set the data source memory level to the wrong value.
As per the the SDM Vol 3b Table 18-41 (Layout of Data Linear
Address Information in PEBS Record), when status bit 0 is set
this is a L1 hit, otherwise this is a L1 miss.
This patch encodes the memory level according to the specification.
In V2, we added the filtering on the store events.
Only the following events produce L1 information:
* MEM_UOPS_RETIRED.STLB_MISS_STORES
* MEM_UOPS_RETIRED.LOCK_STORES
* MEM_UOPS_RETIRED.SPLIT_STORES
* MEM_UOPS_RETIRED.ALL_STORES
Cc: mingo@elte.hu
Cc: acme@ghostprotocols.net
Cc: jolsa@redhat.com
Cc: jmario@redhat.com
Cc: ak@linux.intel.com
Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140515155644.GA3884@quad
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That's a leftover from the time where x86 supported SPARSE_IRQ=n.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154338.967285614@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No more users. Remove the cruft
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154336.760446122@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No need to expose this outside of the ioapic code. The dynamic
allocations are guaranteed not to happen in the gsi space. See commit
62a08ae2a.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No functional change just less crap.
This does not replace the requirement to move x86 to irq domains, but
it limits the mess to some degree.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154335.749579081@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the new interfaces. No functional change.
This does not replace the requirement to move x86 to irq domains, but
it limits the mess to some degree.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154334.991589924@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is just a cleanup to get rid of the create/destroy_irq variants
which were designed in hell.
The long term solution for x86 is to switch over to irq domains and
cleanup the whole vector allocation mess.
The generic irq_alloc_hwirqs() interface deliberately prevents
multi-MSI vector allocation to further enforce the irq domain
conversion (aside of the desire to support ioapic hotplug).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154334.482904047@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Checkin:
b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
disabled 16-bit segments on 64-bit kernels due to an information
leak. However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.
A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.
It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do
echo 1 > /proc/sys/abi/ldt16
as root (add it to your startup scripts), and you should be ok.
The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Cc: <stable@vger.kernel.org>
As the mcount code gets more complex, it really does not belong
in the entry.S file. By moving it into its own file "mcount.S"
keeps things a bit cleaner.
Link: http://lkml.kernel.org/p/20140508152152.2130e8cf@gandalf.local.home
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.
This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.
ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.
ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The add_breakpoint() code in the ftrace updating gets the address
of what the call will become, but if the mcount address is changing
from regs to non-regs ftrace_caller or vice versa, it will use what
the record currently is.
This is rather silly as the code should always use what is currently
there regardless of if it's changing the regs function or just converting
to a nop.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If the probed insn triggers a trap, ->si_addr = regs->ip is technically
correct, but this is not what the signal handler wants; we need to pass
the address of the probed insn, not the address of xol slot.
Add the new arch-agnostic helper, uprobe_get_trap_addr(), and change
fill_trap_info() and math_error() to use it. !CONFIG_UPROBES case in
uprobes.h uses a macro to avoid include hell and ensure that it can be
compiled even if an architecture doesn't define instruction_pointer().
Test-case:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
extern void probe_div(void);
void sigh(int sig, siginfo_t *info, void *c)
{
int passed = (info->si_addr == probe_div);
printf(passed ? "PASS\n" : "FAIL\n");
_exit(!passed);
}
int main(void)
{
struct sigaction sa = {
.sa_sigaction = sigh,
.sa_flags = SA_SIGINFO,
};
sigaction(SIGFPE, &sa, NULL);
asm (
"xor %ecx,%ecx\n"
".globl probe_div; probe_div:\n"
"idiv %ecx\n"
);
return 0;
}
it fails if probe_div() is probed.
Note: show_unhandled_signals users should probably use this helper too,
but we need to cleanup them first.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Move the callsite of fill_trap_info() into do_error_trap() and remove
the "siginfo_t *info" argument.
This obviously breaks DO_ERROR() which passed info == NULL, we simply
change fill_trap_info() to return "siginfo_t *" and add the "default"
case which returns SEND_SIG_PRIV.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Extract the fill-siginfo code from DO_ERROR_INFO() into the new helper,
fill_trap_info().
It can calculate si_code and si_addr looking at trapnr, so we can remove
these arguments from DO_ERROR_INFO() and simplify the source code. The
generated code is the same, __builtin_constant_p(trapnr) == T.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Move the common code from DO_ERROR() and DO_ERROR_INFO() into the new
helper, do_error_trap(). This simplifies define's and shaves 527 bytes
from traps.o.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
force_sig() is just force_sig_info(SEND_SIG_PRIV). Imho it should die,
we have too many ugly "send signal" helpers.
And do_trap() looks just ugly because it uses force_sig_info() or
force_sig() depending on info != NULL.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Before this patch, instructions such as div, mul, shifts with count
in CL, cmpxchg are mishandled.
This patch adds vex prefix handling. In particular, it avoids colliding
with register operand encoded in vex.vvvv field.
Since we need to avoid two possible register operands, the selection of
scratch register needs to be from at least three registers.
After looking through a lot of CPU docs, it looks like the safest choice
is SI,DI,BX. Selecting BX needs care to not collide with implicit use of
BX by cmpxchg8b.
Test-case:
#include <stdio.h>
static const char *const pass[] = { "FAIL", "pass" };
long two = 2;
void test1(void)
{
long ax = 0, dx = 0;
asm volatile("\n"
" xor %%edx,%%edx\n"
" lea 2(%%edx),%%eax\n"
// We divide 2 by 2. Result (in eax) should be 1:
" probe1: .globl probe1\n"
" divl two(%%rip)\n"
// If we have a bug (eax mangled on entry) the result will be 2,
// because eax gets restored by probe machinery.
: "=a" (ax), "=d" (dx) /*out*/
: "0" (ax), "1" (dx) /*in*/
: "memory" /*clobber*/
);
dprintf(2, "%s: %s\n", __func__,
pass[ax == 1]
);
}
long val2 = 0;
void test2(void)
{
long old_val = val2;
long ax = 0, dx = 0;
asm volatile("\n"
" mov val2,%%eax\n" // eax := val2
" lea 1(%%eax),%%edx\n" // edx := eax+1
// eax is equal to val2. cmpxchg should store edx to val2:
" probe2: .globl probe2\n"
" cmpxchg %%edx,val2(%%rip)\n"
// If we have a bug (eax mangled on entry), val2 will stay unchanged
: "=a" (ax), "=d" (dx) /*out*/
: "0" (ax), "1" (dx) /*in*/
: "memory" /*clobber*/
);
dprintf(2, "%s: %s\n", __func__,
pass[val2 == old_val + 1]
);
}
long val3[2] = {0,0};
void test3(void)
{
long old_val = val3[0];
long ax = 0, dx = 0;
asm volatile("\n"
" mov val3,%%eax\n" // edx:eax := val3
" mov val3+4,%%edx\n"
" mov %%eax,%%ebx\n" // ecx:ebx := edx:eax + 1
" mov %%edx,%%ecx\n"
" add $1,%%ebx\n"
" adc $0,%%ecx\n"
// edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3:
" probe3: .globl probe3\n"
" cmpxchg8b val3(%%rip)\n"
// If we have a bug (edx:eax mangled on entry), val3 will stay unchanged.
// If ecx:edx in mangled, val3 will get wrong value.
: "=a" (ax), "=d" (dx) /*out*/
: "0" (ax), "1" (dx) /*in*/
: "cx", "bx", "memory" /*clobber*/
);
dprintf(2, "%s: %s\n", __func__,
pass[val3[0] == old_val + 1 && val3[1] == 0]
);
}
int main(int argc, char **argv)
{
test1();
test2();
test3();
return 0;
}
Before this change all tests fail if probe{1,2,3} are probed.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
It is possible to replace rip-relative addressing mode with addressing
mode of the same length: (reg+disp32). This eliminates the need to fix
up immediate and correct for changing instruction length.
And we can kill arch_uprobe->def.riprel_target.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
gen8_stolen_size() is missing __init, so add it.
Also all the intel_stolen_funcs structures can be marked
__initconst.
intel_stolen_ids[] can also be made const if we replace the
__initdata with __initconst.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
CHV uses the same bits as SNB/VLV to code the Graphics Mode Select field
(GFX stolen memory size) with the addition of finer granularity modes:
4MB increments from 0x11 (8MB) to 0x1d.
Values strictly above 0x1d are either reserved or not supported.
v2: 4MB increments, not 8MB. 32MB has been omitted from the list of new
values (Ville Syrjälä)
v3: Also correctly interpret GGMS (GTT Graphics Memory Size) (Ville
Syrjälä)
v4: Don't assign a value that needs 20bits or more to a u16 (Rafael
Barbalho)
[vsyrjala: v5: Split from i915 changes and add chv_stolen_funcs]
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Tested-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
One can logically expect that when the user has specified "nordrand",
the user doesn't want any use of the CPU random number generator,
neither RDRAND nor RDSEED, so disable both.
Reported-by: Stephan Mueller <smueller@chronox.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF
driver on SOC's without selecting it which forces an unnecessary and limiting
dependency. Provides dummy functions to allow these modules to conditionally
use the driver on IOSF equipped platforms without impacting their ability to
compile and load on non-IOSF platforms. Build default m to ensure availability
on x86 SOC's.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)
So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.
We need an explicit cast.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: H. Peter Anvin <hpa@zytor.com>