Commit Graph

20570 Commits

Bryan O'Donoghue
38a1dfda8e x86/apic: Re-enable PCI_MSI support for non-SMP X86_32
Commit 0dbc6078c0 ('x86, build, pci: Fix PCI_MSI build on !SMP')
introduced the dependency that X86_UP_APIC is only available when
PCI_MSI is false. This effectively prevents PCI_MSI support on 32bit
UP systems because it disables both APIC and IO-APIC. But APIC support
is architecturally required for PCI_MSI.

The intention of the patch was to enforce APIC support when PCI_MSI is
enabled, but it failed to do so.

Remove the !PCI_MSI dependency from X86_UP_APIC and enforce
X86_UP_APIC when PCI_MSI support is enabled on 32bit UP systems.

[ tglx: Massaged changelog ]

Fixes: 0dbc6078c0 ("x86, build, pci: Fix PCI_MSI build on !SMP")
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1421967529-9037-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-23 10:20:30 +01:00
Juergen Gross
31bb772370 x86, mm: Change cachemode exports to non-gpl
Commit 281d4078be ("x86: Make page cache mode a real type")
introduced the symbols __cachemode2pte_tbl and __pte2cachemode_tbl and
exported them via EXPORT_SYMBOL_GPL.  The exports are part of a
replacement of code which has been EXPORT_SYMBOL before these changes
resulting in build breakage of out-of-tree non-gpl modules.

Change EXPORT_SYMBOL_GPL to EXPORT_SYMBOL for these two symbols.

Fixes: 281d4078be ("x86: Make page cache mode a real type")
Reported-and-tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1421926997-28615-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:50:14 +01:00
Andy Lutomirski
3669ef9fa7 x86, tls: Interpret an all-zero struct user_desc as "no segment"
The Witcher 2 did something like this to allocate a TLS segment index:

        struct user_desc u_info;
        bzero(&u_info, sizeof(u_info));
        u_info.entry_number = (uint32_t)-1;

        syscall(SYS_set_thread_area, &u_info);

Strictly speaking, this code was never correct.  It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real.  The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.

The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.

This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working.  In
particular, using the code above to allocate two segments will
allocate the same segment both times.

According to FrostbittenKing on Github, this fixes The Witcher 2.

If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.
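
For contrast, a minimal userspace sketch (helper name illustrative) of
the allocation pattern described above as strictly correct: ask
set_thread_area() for a free slot without installing a usable segment.

        /*
         * Userspace sketch: request a free TLS slot with an explicitly
         * empty, not-present descriptor instead of an all-zero one.
         */
        #include <asm/ldt.h>            /* struct user_desc */
        #include <string.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        int alloc_tls_slot(void)
        {
                struct user_desc u_info;

                memset(&u_info, 0, sizeof(u_info));
                u_info.entry_number    = -1;  /* let the kernel pick a slot */
                u_info.read_exec_only  = 1;   /* explicitly empty ...       */
                u_info.seg_not_present = 1;   /* ... and not present        */

                if (syscall(SYS_set_thread_area, &u_info) != 0)
                        return -1;
                return u_info.entry_number;   /* slot chosen by the kernel */
        }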

Fixes: 41bdc78544 ("x86/tls: Validate TLS entries to protect espfix")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:45:07 +01:00
Andy Lutomirski
e30ab185c4 x86, tls, ldt: Stop checking lm in LDT_empty
32-bit programs don't have an lm bit in their ABI, so they can't
reliably cause LDT_empty to return true without resorting to memset.
They shouldn't need to do this.
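
For reference, a hedged sketch of what an lm-agnostic emptiness check
looks like, with field names from struct user_desc; the exact upstream
macro may differ:

        #include <asm/ldt.h>
        #include <stdbool.h>

        /*
         * Sketch: a descriptor is "empty" when it encodes a read-only,
         * not-present segment with everything else zero. Deliberately
         * no lm test: 32-bit callers have no lm field in their ABI.
         */
        static bool ldt_desc_empty(const struct user_desc *info)
        {
                return info->base_addr       == 0 &&
                       info->limit           == 0 &&
                       info->contents        == 0 &&
                       info->read_exec_only  == 1 &&
                       info->seg_32bit       == 0 &&
                       info->limit_in_pages  == 0 &&
                       info->seg_not_present == 1 &&
                       info->useable         == 0;
        }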

This should fix a longstanding, if minor, issue in all 64-bit kernels
as well as a potential regression in the TLS hardening code.

Fixes: 41bdc78544 ("x86/tls: Validate TLS entries to protect espfix")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:11:06 +01:00
Dave Hansen
c922228efe x86, mpx: Fix potential performance issue on unmaps
The 3.19 merge window saw some TLB modifications merged which caused a
performance regression. They were fixed in commit 045bbb9fa.

Once that fix was applied, I also noticed that there was a small
but intermittent regression still present.  It was not present
consistently enough to bisect reliably, but I'm fairly confident
that it came from (my own) MPX patches.  The source was reading
a relatively unused field in the mm_struct via arch_unmap.

I also noted that this code was in the main instruction flow of
do_munmap() and probably had more icache impact than we want.

This patch does two things:
1. Adds a static (via Kconfig) and dynamic (via cpuid) check
   for MPX with cpu_feature_enabled().  This keeps us from
   reading that cacheline in the mm and trades it for a check
   of the global CPUID variables at least on CPUs without MPX.
2. Adds an unlikely() to ensure that the MPX call ends up out
   of the main instruction flow in do_munmap().  I've added
   a detailed comment about why this was done and why we want
   it even on systems where MPX is present.
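
A kernel-context sketch of the resulting shape of the check (call site
per the description above; details illustrative):

        /* cpu_feature_enabled() folds to a compile-time 0 when MPX is
         * configured out, so the mm cacheline is never touched there,
         * and unlikely() keeps the call out of do_munmap()'s main
         * instruction flow. */
        static inline void arch_unmap(struct mm_struct *mm,
                                      struct vm_area_struct *vma,
                                      unsigned long start,
                                      unsigned long end)
        {
                if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
                        mpx_notify_unmap(mm, vma, start, end);
        }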

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: luto@amacapital.net
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:11:06 +01:00
Dave Hansen
814564a0a1 x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels
We had originally planned on submitting MPX support in one patch
set.  We eventually broke it up in to two pieces for easier
review.  One of the features that didn't make the first round
was supporting 32-bit binaries on 64-bit kernels.

Once we split the set up, we never added code to restrict 32-bit
binaries from _using_ MPX on 64-bit kernels.

The 32-bit bounds tables are a different format than the 64-bit
ones.  Without this patch, the kernel will try to read a 32-bit
binary's tables as if they were the 64-bit version.  They will
likely be noticed as being invalid rather quickly and the app
will get killed, but that's kinda mean.

This patch adds an explicit check, and will make a 64-bit kernel
essentially behave as if it has no MPX support when called from
a 32-bit binary.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20150108223020.9E9AA511@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:11:06 +01:00
K. Y. Srinivasan
32c6590d12 x86, hyperv: Mark the Hyper-V clocksource as being continuous
The Hyper-V clocksource is continuous; mark it accordingly.
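
A minimal kernel-context sketch, assuming the usual clocksource
registration; everything but the flag is illustrative:

        static struct clocksource hyperv_cs = {
                .name   = "hyperv_clocksource",
                .rating = 400,
                .read   = read_hv_clock,        /* assumed read callback */
                .mask   = CLOCKSOURCE_MASK(64),
                .flags  = CLOCK_SOURCE_IS_CONTINUOUS,   /* the actual fix */
        };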

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: jasowang@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: devel@linuxdriverproject.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 14:36:25 +01:00
Juergen Gross
9d34cfdf47 x86: Don't rely on VMWare emulating PAT MSR correctly
VMWare seems not to emulate the PAT MSR correctly: reading
MSR_IA32_CR_PAT returns 0 even after writing another value to it.

Commit bd809af16e triggers this VMWare bug when the kernel is
booted as a VMWare guest.

Detect this bug and don't use the read value if it is 0.
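
A kernel-context sketch of the described detection, assuming a
hypothetical fallback value; a correctly emulated PAT MSR can never
read back as 0 because the power-on default is a non-zero encoding:

        u64 pat;

        rdmsrl(MSR_IA32_CR_PAT, pat);
        if (!pat) {
                /* Broken emulation (e.g. VMWare): keep a known-good value. */
                pat = default_pat;      /* hypothetical boot-default PAT */
        }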

Fixes: bd809af16e ("x86: Enable PAT to use cache mode translation tables")
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: http://lkml.kernel.org/r/1421039745-14335-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 14:33:45 +01:00
Jan Beulich
4a0d3107d6 x86, irq: Properly tag virtualization entry in /proc/interrupts
The mis-naming was likely a copy-and-paste effect.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/54B9408B0200007800055E8B@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:37:23 +01:00
Kees Cook
f285f4a21c x86, boot: Skip relocs when load address unchanged
On 64-bit, relocation is not required unless the load address gets
changed. Without this, relocations do unexpected things when the kernel
is above 4G.
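
A kernel-context sketch of the check (variable names per the
compressed-boot code, but illustrative):

        unsigned long delta = min_addr - LOAD_PHYSICAL_ADDR;

        if (!delta) {
                /* Loaded at the link-time address: nothing to relocate. */
                debug_putstr("No relocation needed... ");
                return;
        }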

Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Thomas D. <whissi@whissi.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20150116005146.GA4212@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:37:23 +01:00
Jiang Liu
8abb850a03 x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi
Xen overrides __acpi_register_gsi and leaves __acpi_unregister_gsi as is.
That means, an IRQ allocated by acpi_register_gsi_xen_hvm() or
acpi_register_gsi_xen() will be freed by acpi_unregister_gsi_ioapic(),
which may cause undesired effects. So override __acpi_unregister_gsi to
NULL for safety.
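
A kernel-context sketch of the override pair (function-pointer names
from the changelog; surrounding init code omitted):

        __acpi_register_gsi   = acpi_register_gsi_xen;
        __acpi_unregister_gsi = NULL;   /* never let the ioapic variant
                                           free a Xen-allocated IRQ */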

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Tony Luck <tony.luck@intel.com>
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Graeme Gregory <graeme.gregory@linaro.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Link: http://lkml.kernel.org/r/1421720467-7709-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 11:44:41 +01:00
Jiang Liu
b568b8601f x86/xen: Treat SCI interrupt as normal GSI interrupt
Currently Xen Domain0 has special treatment for ACPI SCI interrupt,
that is initialize irq for ACPI SCI at early stage in a special way as:
xen_init_IRQ()
	->pci_xen_initial_domain()
		->xen_setup_acpi_sci()
			Allocate and initialize irq for ACPI SCI

Function xen_setup_acpi_sci() calls acpi_gsi_to_irq() to get an irq
number for ACPI SCI. But unfortunately acpi_gsi_to_irq() depends on
IOAPIC irqdomains through following path
acpi_gsi_to_irq()
	->mp_map_gsi_to_irq()
		->mp_map_pin_to_irq()
			->check IOAPIC irqdomain

For PV domains, Xen uses event-based interrupt management and doesn't
make use of the native IOAPIC, so no irqdomains are created for the
IOAPIC. This causes Xen domain0 to fail to install an interrupt handler
for ACPI SCI, and all ACPI events will be lost. Please refer to:
https://lkml.org/lkml/2014/12/19/178

So the fix is to get rid of special treatment for ACPI SCI, just treat
ACPI SCI as normal GSI interrupt as:
acpi_gsi_to_irq()
	->acpi_register_gsi()
		->acpi_register_gsi_xen()
			->xen_register_gsi()

With above change, there's no need for xen_setup_acpi_sci() anymore.
The above change also works with bare metal kernels.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Tony Luck <tony.luck@intel.com>
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1421720467-7709-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 11:44:40 +01:00
Linus Torvalds
59b2858f57 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also two PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools powerpc: Use dwfl_report_elf() instead of offline.
  perf tools: Fix segfault for symbol annotation on TUI
  perf test: Fix dwarf unwind using libunwind.
  perf tools: Avoid build splat for syscall numbers with uclibc
  perf tools: Elide strlcpy warning with uclibc
  perf tools: Fix statfs.f_type data type mismatch build error with uclibc
  tools: Remove bitops/hweight usage of bits in tools/perf
  perf machine: Fix __machine__findnew_thread() error path
  perf tools: Fix building error in x86_64 when dwarf unwind is on
  perf probe: Propagate error code when write(2) failed
  perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM
  perf/rapl: Fix sysfs_show() initialization for RAPL PMU
2015-01-18 06:24:30 +12:00
Linus Torvalds
23aa4b416a This holds a few fixes to the ftrace infrastructure as well as
the mixture of function graph tracing and kprobes.
 
 When jprobes and function graph tracing is enabled at the same time
 it will crash the system.
 
   # modprobe jprobe_example
   # echo function_graph > /sys/kernel/debug/tracing/current_tracer
 
 After the first fork (jprobe_example probes it), the system will crash.
 This is due to the way jprobes copies the stack frame and does not
 do a normal function return. This messes up with the function graph
 tracing accounting which hijacks the return address from the stack
 and replaces it with a hook function. It saves the return addresses in
 a separate stack to put back the correct return address when done.
 But because the jprobe functions do not do a normal return, their
 stack addresses are not put back until the function they probe is called,
 which means that the probed function will get the return address of
 the jprobe handler instead of its own.
 
 The simple fix here was to disable function graph tracing while the
 jprobe handler is being called.
 
 While debugging this I found two minor bugs with the function graph
 tracing.
 
 The first was about the function graph tracer sharing its function hash
 with the function tracer (they both get filtered by the same input).
 The changing of the set_ftrace_filter would not sync the function recording
 records after a change if the function tracer was disabled but the
 function graph tracer was enabled. This was due to the update only checking
 one of the ops instead of the shared ops to see if they were enabled and
 should perform the sync. This caused the ftrace accounting to break and
 a ftrace_bug() would be triggered, disabling ftrace until a reboot.
 
 The second was that the check to update records only checked one of the
 filter hashes. It needs to test both the "filter" and "notrace" hashes.
 The "filter" hash determines what functions to trace where as the "notrace"
 hash determines what functions not to trace (trace all but these).
 Both hashes need to be passed to the update code to find out what change
 is being done during the update. This also broke the ftrace record
 accounting and triggered a ftrace_bug().
 
 This patch set also includes two more fixes that were reported separately
 from the kprobe issue.
 
 One was that init_ftrace_syscalls() was called twice at boot up.
 This is not a major bug, but that call performed a rather large kmalloc
 (NR_syscalls * sizeof(*syscalls_metadata)). The second call made the first
 one a memory leak, wasting memory.
 
 The other fix is a regression caused by an update in the v3.19 merge window.
 The moving to enable events early, moved the enabling before PID 1 was
 created. The syscall events require setting the TIF_SYSCALL_TRACEPOINT
 for all tasks. But for_each_process_thread() does not include the swapper
 task (PID 0), and ended up being a nop. A suggested fix was to add
 the init_task() to have its flag set, but I didn't really want to mess
 with PID 0 for this minor bug. Instead I disable and re-enable events again
 at early_initcall() where it used to be enabled. This also handles any other
 event that might have its own reg function that could break at early
 boot up.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUt9vmAAoJEEjnJuOKh9ldLHEIAJ9XrPW2xMIY5yI69jT1F7pv
 PkSRqENnOK0l4UulD52SvIBecQTTBcEEjao4yVGkc7DCJBOws/1LZ5gW8OfNlKjq
 rMB8yaosL1tXJ1ARVPMjcQVy+228zkgTXznwEZCjku1g7LuScQ28qyXsXO7B6yiK
 xKoHqKjygmM/a2aVn+8tdiVKiDp6jdmkbYicbaFT4xP7XB5DaMmIiXRHxdvW6xdR
 azKrVfYiMyJqTZNt/EVSWUk2WjeaYhoXyNtvgPx515wTo/llCnzhjcsocXBtH2P/
 YOtwl+1L7Z89ukV9oXqrtrUJZ6Ps7+g7I1flJuL7/1FlNGnklcP9JojD+t6HeT8=
 =vkec
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
 "This holds a few fixes to the ftrace infrastructure as well as the
  mixture of function graph tracing and kprobes.

  When jprobes and function graph tracing is enabled at the same time it
  will crash the system:

      # modprobe jprobe_example
      # echo function_graph > /sys/kernel/debug/tracing/current_tracer

  After the first fork (jprobe_example probes it), the system will
  crash.

  This is due to the way jprobes copies the stack frame and does not do
  a normal function return.  This messes up with the function graph
  tracing accounting which hijacks the return address from the stack and
  replaces it with a hook function.  It saves the return addresses in a
  separate stack to put back the correct return address when done.  But
  because the jprobe functions do not do a normal return, their stack
  addresses are not put back until the function they probe is called,
  which means that the probed function will get the return address of
  the jprobe handler instead of its own.

  The simple fix here was to disable function graph tracing while the
  jprobe handler is being called.

  While debugging this I found two minor bugs with the function graph
  tracing.

  The first was about the function graph tracer sharing its function
  hash with the function tracer (they both get filtered by the same
  input).  The changing of the set_ftrace_filter would not sync the
  function recording records after a change if the function tracer was
  disabled but the function graph tracer was enabled.  This was due to
  the update only checking one of the ops instead of the shared ops to
  see if they were enabled and should perform the sync.  This caused the
  ftrace accounting to break and a ftrace_bug() would be triggered,
  disabling ftrace until a reboot.

  The second was that the check to update records only checked one of
  the filter hashes.  It needs to test both the "filter" and "notrace"
  hashes.  The "filter" hash determines what functions to trace where as
  the "notrace" hash determines what functions not to trace (trace all
  but these).  Both hashes need to be passed to the update code to find
  out what change is being done during the update.  This also broke the
  ftrace record accounting and triggered a ftrace_bug().

  This patch set also includes two more fixes that were reported
  separately from the kprobe issue.

  One was that init_ftrace_syscalls() was called twice at boot up.  This
  is not a major bug, but that call performed a rather large kmalloc
  (NR_syscalls * sizeof(*syscalls_metadata)).  The second call made the
  first one a memory leak, wasting memory.

  The other fix is a regression caused by an update in the v3.19 merge
  window.  The moving to enable events early, moved the enabling before
  PID 1 was created.  The syscall events require setting the
  TIF_SYSCALL_TRACEPOINT for all tasks.  But for_each_process_thread()
  does not include the swapper task (PID 0), and ended up being a nop.

  A suggested fix was to add the init_task() to have its flag set, but I
  didn't really want to mess with PID 0 for this minor bug.  Instead I
  disable and re-enable events again at early_initcall() where it used to
  be enabled.  This also handles any other event that might have its own
  reg function that could break at early boot up"

* tag 'trace-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix enabling of syscall events on the command line
  tracing: Remove extra call to init_ftrace_syscalls()
  ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
  ftrace: Check both notrace and filter for old hash
  ftrace: Fix updating of filters for shared global_ops filters
2015-01-17 07:55:52 +13:00
Kan Liang
33636732dc perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM
cycles:p and cycles:pp do not work on SLM since commit:

   86a04461a9 ("perf/x86: Revamp PEBS event selection")

UOPS_RETIRED.ALL is not a PEBS capable event, so it should not be used
to count cycle number.

Actually SLM calls intel_pebs_aliases_core2() which uses INST_RETIRED.ANY_P
to count the number of cycles. It's a PEBS capable event. But inv and
cmask must be set to count cycles.

Considering SLM allows all events as PEBS with no flags, only
INST_RETIRED.ANY_P, inv=1, cmask=16 needs to be handled specially.
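
A kernel-context sketch of that encoding, using X86_CONFIG() as in the
x86 perf code: event 0xc0 with inv=1 and cmask=16 counts cycles in
which fewer than 16 instructions retire, i.e. every cycle.

        u64 cycles_alias = X86_CONFIG(.event = 0xc0, /* INST_RETIRED.ANY_P */
                                      .inv   = 1,
                                      .cmask = 16);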

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1421084541-31639-1-git-send-email-kan.liang@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-16 09:06:59 +01:00
Stephane Eranian
433678bdc6 perf/rapl: Fix sysfs_show() initialization for RAPL PMU
This patch fixes a problem with the initialization of the
sysfs_show() routine for the RAPL PMU.

The current code was wrongly relying on the EVENT_ATTR_STR()
macro which uses the events_sysfs_show() function in the x86
PMU code. That function itself was relying on the x86_pmu data
structure. Yet RAPL and the core PMU (x86_pmu) have nothing to
do with each other. They should therefore not interact with
each other.

The x86_pmu structure is initialized at boot time based on
the host CPU model. When the host CPU is not supported, the
x86_pmu remains uninitialized and some of the callbacks it
contains are NULL.

The false dependency with x86_pmu could potentially cause crashes
in case the x86_pmu is not initialized while the RAPL PMU is. This
may, for instance, be the case in virtualized environments.

This patch fixes the problem by using a private sysfs_show()
routine for exporting the RAPL PMU events.
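
A kernel-context sketch of such a private routine; it touches only its
own attribute, never x86_pmu state (names illustrative):

        static ssize_t rapl_sysfs_show(struct device *dev,
                                       struct device_attribute *attr,
                                       char *page)
        {
                struct perf_pmu_events_attr *pmu_attr =
                        container_of(attr, struct perf_pmu_events_attr, attr);

                return sprintf(page, "%s", pmu_attr->event_str);
        }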

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150113225953.GA21525@thinkpad
Cc: vincent.weaver@maine.edu
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-16 09:06:58 +01:00
Steven Rostedt (Red Hat)
237d28db03 ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
If the function graph tracer traces a jprobe callback, the system will
crash. This can easily be demonstrated by compiling the jprobe
sample module that is in the kernel tree, loading it and running the
function graph tracer.

 # modprobe jprobe_example.ko
 # echo function_graph > /sys/kernel/debug/tracing/current_tracer
 # ls

The first two commands end up in a nice crash after the first fork.
(do_fork has a jprobe attached to it, so "ls" just triggers that fork)

The problem is caused by the jprobe_return() that all jprobe callbacks
must end with. The way jprobes works is that the function a jprobe
is attached to has a breakpoint placed at the start of it (or it uses
ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
will copy the stack frame and change the ip address to return to the
jprobe handler instead of the function. The jprobe handler must end
with jprobe_return() which swaps the stack and does an int3 (breakpoint).
This breakpoint handler will then put back the saved stack frame,
simulate the instruction at the beginning of the function it added
a breakpoint to, and then continue on.

For function tracing to work, it hijacks the return address from the
stack frame, and replaces it with a hook function that will trace
the end of the call. This hook function will restore the return
address of the function call.

If the function tracer traces the jprobe handler, the hook function
for that handler will not be called, and its saved return address
will be used for the next function. This will result in a kernel crash.

To solve this, pause function tracing before the jprobe handler is called
and unpause it before it returns back to the function it probed.
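
A kernel-context sketch of the placement (the helpers are the real
ftrace API; the surrounding kprobes code is omitted):

        pause_graph_tracing();
        /* ... jprobe entry handler runs, ending in jprobe_return() ... */
        unpause_graph_tracing();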

Some other updates:

Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
code look a bit cleaner and easier to understand (various tries to fix
this bug required this change).

Note, if fentry is being used, jprobes will change the ip address before
the function graph tracer runs and it will not be able to trace the
function that the jprobe is probing.

Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org

Cc: stable@vger.kernel.org # 2.6.30+
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-15 09:39:18 -05:00
Linus Torvalds
613d4cefbb xen: bug fixes for 3.19-rc4
- Several critical linear p2m fixes that prevented some hosts from
   booting.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUtQHNAAoJEFxbo/MsZsTR/qgH/iiW4k2T8dBGZ7TPyzt88iyT
 4caWjuujp2OUaRqhBQdY7z05uai6XxgJLwDyqiO+qHaRUj+ZWCrjh/ZFPU1+09hK
 GdwPMWU7xMRs/7F2ANO03jJ/ktvsYXtazcVrV89Q3t+ZZJIQ/THovDkaoa+dF2lh
 W8d5H7N2UNCJLe9w2fm5iOq4SKoTsJOq6pVQ6gUBqJcgkSDWavd6bowXnTlcepZN
 tNaSMZsOt4CAvYQIa0nKPJo6Q4QN3buRQMWEOAOmGVT/RkVi68wirwk59uNzcS7E
 HjhqxFjhXYamNTuwHYZlchBrZutdbymSlucVucb1wAoxRAX+Wd1jk5EPl6zLv4w=
 =kFSE
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:
 "Several critical linear p2m fixes that prevented some hosts from
  booting"

* tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: properly retrieve NMI reason
  xen: check for zero sized area when invalidating memory
  xen: use correct type for physical addresses
  xen: correct race in alloc_p2m_pmd()
  xen: correct error for building p2m list on 32 bits
  x86/xen: avoid freeing static 'name' when kasprintf() fails
  x86/xen: add extra memory for remapped frames during setup
  x86/xen: don't count how many PFNs are identity mapped
  x86/xen: Free bootmem in free_p2m_page() during early boot
  x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()
2015-01-14 08:07:42 +13:00
Jan Beulich
f221b04fe0 x86/xen: properly retrieve NMI reason
Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.

Note that the hook can (and should) be used irrespective of whether we
are in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-13 09:39:50 +00:00
Juergen Gross
9a17ad7f3d xen: check for zero sized area when invalidating memory
With the introduction of the linear mapped p2m list, setting memory
areas to "invalid" had to be delayed. When doing the invalidation,
make sure no zero sized areas are processed.
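
A kernel-context sketch of the guard, assuming the usual extra-memory
array (names illustrative):

        int i;

        for (i = 0; i < XEN_EXTRA_MEM_MAX_REGIONS; i++) {
                unsigned long size = xen_extra_mem[i].size;

                if (!size)      /* zero-sized area: nothing to do */
                        continue;
                /* ... invalidate the p2m entries for this region ... */
        }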

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-12 10:09:55 +00:00
Juergen Gross
e86f949667 xen: use correct type for physical addresses
When converting a pfn to a physical address be sure to use 64 bit
wide types or convert the physical address to a pfn if possible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-12 10:09:48 +00:00
Juergen Gross
f241b0b891 xen: correct race in alloc_p2m_pmd()
When allocating a new pmd for the linear mapped p2m list a check is
done for not introducing another pmd when this just happened on
another cpu. In this case the old pte pointer was returned which
points to the p2m_missing or p2m_identity page. The correct value
would be the pointer to the found new page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-12 10:09:40 +00:00
Juergen Gross
82c92ed135 xen: correct error for building p2m list on 32 bits
In xen_rebuild_p2m_list() for large areas of invalid or identity
mapped memory the pmd entries on 32 bit systems are initialized
wrong. Correct this error.

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-12 10:09:34 +00:00
Linus Torvalds
505569d208 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: two vdso fixes, two kbuild fixes and a boot failure fix
  with certain odd memory mappings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Use asm volatile in __getcpu
  x86/build: Clean auto-generated processor feature files
  x86: Fix mkcapflags.sh bash-ism
  x86: Fix step size adjustment during initial memory mapping
  x86_64, vdso: Fix the vdso address randomization algorithm
2015-01-11 11:53:46 -08:00
Linus Torvalds
ddb321a8dd Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also some kernel side fixes: uncore PMU
  driver fix, user regs sampling fix and an instruction decoder fix that
  unbreaks PEBS precise sampling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes
  perf/x86_64: Improve user regs sampling
  perf: Move task_pt_regs sampling into arch code
  x86: Fix off-by-one in instruction decoder
  perf hists browser: Fix segfault when showing callchain
  perf callchain: Free callchains when hist entries are deleted
  perf hists: Fix children sort key behavior
  perf diff: Fix to sort by baseline field by default
  perf list: Fix --raw-dump option
  perf probe: Fix crash in dwarf_getcfi_elf
  perf probe: Fix to fall back to find probe point in symbols
  perf callchain: Append callchains only when requested
  perf ui/tui: Print backtrace symbols when segfault occurs
  perf report: Show progress bar for output resorting
2015-01-11 11:47:45 -08:00
Andi Kleen
5306c31c57 perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes
There was another report of a boot failure with a #GP fault in the
uncore SBOX initialization. The earlier work around was not enough
for this system.

The boot was failing while trying to initialize the third SBOX.

This patch detects parts with only two SBOXes and limits the number
of SBOX units to two there.

Stable material, as it affects boot problems on 3.18.

Tested-by: Andreas Oehler <andreas@oehler-net.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1420583675-9163-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:30 +01:00
Andy Lutomirski
86c269fea3 perf/x86_64: Improve user regs sampling
Perf reports user regs for kernel-mode samples so that samples can
be backtraced through user code.  The old code was very broken in
syscall context, resulting in useless backtraces.

The new code, in contrast, is still dangerously racy, but it should
at least work most of the time.

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/243560c26ff0f739978e2459e203f6515367634d.1420396372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:29 +01:00
Andy Lutomirski
88a7c26af8 perf: Move task_pt_regs sampling into arch code
On x86_64, at least, task_pt_regs may be only partially initialized
in many contexts, so x86_64 should not use it without extra care
from interrupt context, let alone NMI context.

This will allow x86_64 to override the logic and will supply some
scratch space to use to make a cleaner copy of user regs.

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/e431cd4c18c2e1c44c774f10758527fb2d1025c4.1420396372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:28 +01:00
Peter Zijlstra
0f363b250b x86: Fix off-by-one in instruction decoder
Stephane reported that the PEBS fixup was broken by the recent commit to
the instruction decoder. The thing had an off-by-one which resulted in
not being able to decode the last instruction and always bailing.

Reported-by: Stephane Eranian <eranian@google.com>
Fixes: 6ba48ff46f ("x86: Remove arbitrary instruction size limit in instruction decoder")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 3.18
Cc: <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Liang Kan <kan.liang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20141216104614.GV3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:26 +01:00
Linus Torvalds
74e59ea05c Power management and ACPI material for 3.19-rc4
- Fix ACPI power management initialization for device objects
    corresponding to devices that are not present at the init time
    (the _STA control method returns 0 for them) and therefore should
    not be regarded as power manageable (Rafael J Wysocki).
 
  - Rename a structure field and two functions used by the ACPI
    processor driver to make them less tied to architectures that
    use APICs (both x86 and ia64) and more suitable for ARM64
    processors (Hanjun Guo).
 
 - Add a disable_native_backlight quirk for the Dell XPS15 L521X,
   which is designed in an unusual way that prevents native backlight
   from working on that machine (Hans de Goede).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUrcCTAAoJEILEb/54YlRxeMUQAKOHS8rlq0XtOxieufCks0Rq
 e96ZExiTdLI/KYZCgqwkSJxR7w2981hbFVIcFccPpo1k9Z1YowSOvWcHvmn1RwHQ
 lXDfCEqWssIXMn0zctH3Ob/uBggvWFx9g5y8slOZ6W2LaUzzNdA+ZqxIbNsgwNSN
 vji2E1m2ZhcwgThPYeXNsvvWJzYyC3LI4nZ8UIKE8kUMM5oxYoIlrW/qjoHc8KNV
 AlUv+e0Z43XHy5jHaS8sQrKGrAPrdUroDRcByw1MG/V8r4vSXepvNjOXXwEZ9SF9
 xPp/bzLLRPK8FW4MQ2VEhTcO0FcjblDTQDhenDHKyjEonWOse5A9VbAyZ2dORfZ+
 juDsO3xalYk+OoHjRDdzxkQLIyQOATTcxkyGxyJf+HjcD6nfsdibWzisM6nl7E9h
 hpwGRhb5sgbWV9QRwmkkvFBP/uCsF3rHA/qCZQCFsxs0n3Ty5wiAg2JEHEL/kPF9
 6vPsX4ttukw5MbNzTyHfXOm41Ula3u4EfJvTdQjWNdx9uifBjsosetw3ho6q4XmP
 e6mCBFIU0Fxxfnmgij5x6ufGCBCrkpbLuZsC63HyaRDiRDN4DuftDLesVEySfJix
 +FiqilOyiOmSXHHC17cF8YNYsxMdFo1mGJDpV1PS1gnh/xwmEgfzwdWrjso7Y76L
 9MilW/AXspuXxegu6JDu
 =NEY0
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are an ACPI device power management initialization fix (-stable
  material), two commits renaming stuff in the ACPI processor driver to
  make it more suitable for ARM64 processors and a new ACPI backlight
  blacklist entry.

  Specifics:

   - Fix ACPI power management initialization for device objects
     corresponding to devices that are not present at the init time (the
     _STA control method returns 0 for them) and therefore should not be
     regarded as power manageable (Rafael J Wysocki).

   - Rename a structure field and two functions used by the ACPI
     processor driver to make them less tied to architectures that use
     APICs (both x86 and ia64) and more suitable for ARM64 processors
     (Hanjun Guo).

   - Add a disable_native_backlight quirk for the Dell XPS15 L521X,
     which is designed in an unusual way that prevents native backlight
     from working on that machine (Hans de Goede)"

* tag 'pm+acpi-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: Add disable_native_backlight quirk for Dell XPS15 L521X
  ACPI / processor: Rename acpi_(un)map_lsapic() to acpi_(un)map_cpu()
  ACPI / processor: Convert apic_id to phys_id to make it arch agnostic
  ACPI / PM: Fix PM initialization for devices that are not present
2015-01-08 14:11:03 -08:00
Linus Torvalds
716c13a817 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a build problem with sha-mb with old toolchains and an
  implementation bug in the ctr(aes)/by8 branch of aesni-intel that's
  enabled when AVX is available"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha-mb - Add avx2_supported check.
  crypto: aesni - fix "by8" variant for 128 bit keys
2015-01-08 11:33:51 -08:00
Vitaly Kuznetsov
7be0772d19 x86/xen: avoid freeing static 'name' when kasprintf() fails
In case kasprintf() fails in xen_setup_timer() we assign name to the
static string "<timer kasprintf failed>". We, however, don't check
that fact before issuing kfree() in xen_teardown_timer(), kernel is
supposed to crash with 'kernel BUG at mm/slub.c:3341!'

Solve the issue by making name a fixed length string inside struct
xen_clock_event_device. 16 bytes should be enough.
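
A kernel-context sketch of the resulting layout (struct name from the
changelog):

        struct xen_clock_event_device {
                struct clock_event_device evt;
                char name[16];  /* embedded: teardown never kfree()s it */
        };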

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-08 13:55:25 +00:00
David Vrabel
a97dae1a2e x86/xen: add extra memory for remapped frames during setup
If the non-RAM regions in the e820 memory map are larger than the size
of the initial balloon, a BUG is triggered as the frames are remapped
beyond the limit of the linear p2m.  The frames are remapped into the
initial balloon area (xen_extra_mem) but not enough of this is
available.

Ensure enough extra memory regions are added for these remapped
frames.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
2015-01-08 13:52:37 +00:00
David Vrabel
bc7142cf79 x86/xen: don't count how many PFNs are identity mapped
This accounting is just used to print a diagnostic message that isn't
very useful.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
2015-01-08 13:52:36 +00:00
Boris Ostrovsky
701a261ad6 x86/xen: Free bootmem in free_p2m_page() during early boot
With recent changes in p2m we now have legitimate cases when
p2m memory needs to be freed during early boot (i.e. before
slab is initialized).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-08 13:52:36 +00:00
Rafael J. Wysocki
794c3a0a93 Merge branches 'acpi-pm', 'acpi-processor' and 'acpi-video'
* acpi-pm:
  ACPI / PM: Fix PM initialization for devices that are not present

* acpi-processor:
  ACPI / processor: Rename acpi_(un)map_lsapic() to acpi_(un)map_cpu()
  ACPI / processor: Convert apic_id to phys_id to make it arch agnostic

* acpi-video:
  ACPI / video: Add disable_native_backlight quirk for Dell XPS15 L521X
2015-01-06 23:35:43 +01:00
Hanjun Guo
d02dc27db0 ACPI / processor: Rename acpi_(un)map_lsapic() to acpi_(un)map_cpu()
acpi_map_lsapic() will allocate a logical CPU number and map it to
physical CPU id (such as APIC id) for the hot-added CPU; it will also
set up mappings for the NUMA node id etc. acpi_unmap_lsapic() will
do the reverse.

The function names are a little bit confusing and arch (IA64)
dependent, so rename them to acpi_(un)map_cpu() to make them arch
agnostic and explicit.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-05 23:34:26 +01:00
Vinson Lee
0b8c960cf6 crypto: sha-mb - Add avx2_supported check.
This patch fixes this allyesconfig target build error with older
binutils.

  LD      arch/x86/crypto/built-in.o
ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Vinson Lee <vlee@twitter.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-05 21:35:02 +11:00
Mathias Krause
0b1e95b2fa crypto: aesni - fix "by8" variant for 128 bit keys
The "by8" counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.

The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) be preserved for the latter ones. The
implementation for 128 bit keys does not comply with (a) nor (b).

Fix this by bringing the implementation back to its original source
and correctly load the key registers and preserve their values by
*not* re-using the registers for other purposes.

Kudos to James for reporting the issue and providing a test case
showing the discrepancies.

Reported-by: James Yonan <james@openvpn.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-05 21:35:02 +11:00
Daniel Borkmann
b485342bd7 x86, um: actually mark system call tables readonly
Commit a074335a37 ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.

Before:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size

After:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                 U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size
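
A kernel-context sketch of the fixed declaration (entry type and size
macro illustrative); with the four-underscore variant the const table
can stay out of .data..cacheline_aligned and be placed read-only:

        const sys_call_ptr_t sys_call_table[__NR_syscall_max + 1]
        ____cacheline_aligned = {
                [0 ... __NR_syscall_max] = &sys_ni_syscall,
        };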

Fixes: a074335a37 ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-01-04 14:21:25 +01:00
Ingo Molnar
2aba73a614 This is hopefully the last vdso fix for 3.19. It should be very
safe (it just adds a volatile).
 
 I don't think it fixes an actual bug (the __getcpu calls in the
 pvclock code may not have been needed in the first place), but
 discussion on that point is ongoing.
 
 It also fixes a big performance issue in 3.18 and earlier in which
 the lsl instructions in vclock_gettime got hoisted so far up the
 function that they happened even when the function they were in was
 never called.  In 3.19, the performance issue seems to be gone due to
 the whims of my compiler and some interaction with a branch that's
 now gone.
 
 I'll hopefully have a much bigger overhaul of the pvclock code
 for 3.20, but it needs careful review.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUmduGAAoJEK9N98ZeDfrk874H/RkkP+y6/DmdKVR1dTOUQW4u
 f1wPU0/sc5xywGjNcfR3XwUuyBJyd3s81WVaE5XXHfCHnbjG2Z4CNTqga27hL1D0
 io01Q2s3dh1Y5c0cccVmJmyw//YVzMUOzGTNM9R0NKQNXmYUz6jgQaqk+wWORdD6
 JXCU3/LI5VT0fjNPLj1M9l59eC2Qg/V4GqY2xRJ1AfbwkX1CFZTcWUPb+4FScVYv
 9gds/vOoFg54MypVJD4SeIC9I8U0qcim9gV7gGFdzyDNCXS5J4P+02sEOFNu8oYy
 HVK1B0LXhswT08Ho1yRxXUhFxpqEGeGJvTlDTvwy+r/yuKE2AVBtlhLQBqMPhnY=
 =u3d2
 -----END PGP SIGNATURE-----

Merge tag 'pr-20141223-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/urgent

Pull VDSO fix from Andy Lutomirski:

 "This is hopefully the last vdso fix for 3.19.  It should be very
  safe (it just adds a volatile).

  I don't think it fixes an actual bug (the __getcpu calls in the
  pvclock code may not have been needed in the first place), but
  discussion on that point is ongoing.

  It also fixes a big performance issue in 3.18 and earlier in which
  the lsl instructions in vclock_gettime got hoisted so far up the
  function that they happened even when the function they were in was
  never called.  In 3.19, the performance issue seems to be gone due to
  the whims of my compiler and some interaction with a branch that's
  now gone.

  I'll hopefully have a much bigger overhaul of the pvclock code
  for 3.20, but it needs careful review."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-01 22:21:22 +01:00
Paolo Bonzini
a629df7ead kvm: x86: drop severity of "generation wraparound" message
Since most virtual machines raise this message once, it is a bit annoying.
Make it KERN_DEBUG severity.
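
A kernel-context sketch (exact call site illustrative):

        printk_ratelimited(KERN_DEBUG
                "kvm: zapping shadow pages for mmio generation wraparound\n");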

Cc: stable@vger.kernel.org
Fixes: 7a2e8aaf0f
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-27 21:52:28 +01:00
Tiejun Chen
baa035227b kvm: x86: vmx: reorder some msr writing
The commit 34a1cd60d1, "x86: vmx: move some vmx setting from
vmx_init() to hardware_setup()", tried to refactor some codes
specific to vmx hardware setting into hardware_setup(), but some
msr writing should depend on our previous setting condition like
enable_apicv, enable_ept and so on.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-27 21:52:10 +01:00
Andy Lutomirski
1ddf0b1b11 x86, vdso: Use asm volatile in __getcpu
In Linux 3.18 and below, GCC hoists the lsl instructions in the
pvclock code all the way to the beginning of __vdso_clock_gettime,
slowing the non-paravirt case significantly.  For unknown reasons,
presumably related to the removal of a branch, the performance issue
is gone as of

e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu

but I don't trust GCC enough to expect the problem to stay fixed.

There should be no correctness issue, because the __getcpu calls in
__vdso_clock_gettime were never necessary in the first place.

Note to stable maintainers: In 3.18 and below, depending on
configuration, gcc 4.9.2 generates code like this:

     9c3:       44 0f 03 e8             lsl    %ax,%r13d
     9c7:       45 89 eb                mov    %r13d,%r11d
     9ca:       0f 03 d8                lsl    %ax,%ebx

This patch won't apply as is to any released kernel, but I'll send a
trivial backported version if needed.
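
A kernel-context sketch of the one-word fix (segment selector name
assumed from the vgtod code): the volatile qualifier keeps GCC from
hoisting the LSL above the branch that guards its use.

        static inline unsigned int __getcpu(void)
        {
                unsigned int p;

                /* CPU number is encoded in the limit of a magic segment. */
                asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
                return p;
        }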

Fixes: 51c19b4f59 ("x86: vdso: pvclock gettime support")
Cc: stable@vger.kernel.org # 3.8+
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2014-12-23 13:05:30 -08:00
Bjørn Mork
280dbc5723 x86/build: Clean auto-generated processor feature files
Commit 9def39be4e ("x86: Support compiling out human-friendly
processor feature names") made two source file targets
conditional. Such conditional targets will not be cleaned
automatically by make mrproper.

Fix by adding explicit clean-files targets for the two files.

Fixes: 9def39be4e ("x86: Support compiling out human-friendly processor feature names")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/1419335863-10608-1-git-send-email-bjorn@mork.no
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-23 15:37:06 +01:00
Sylvain BERTRAND
ea174f4c4f x86: Fix mkcapflags.sh bash-ism
Choked while compiling Linux with the dash shell instead of the bash
shell. See:

   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html

Signed-off-by: Sylvain BERTRAND <sylvain.bertrand@gmail.com>
Link: http://lkml.kernel.org/r/20141223123912.GA1386@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-23 15:34:57 +01:00
Jan Beulich
132978b94e x86: Fix step size adjustment during initial memory mapping
The old scheme can lead to failure in certain cases - the
problem is that after bumping step_size the next (non-final)
iteration is only guaranteed to make available a memory block
the size of what step_size was before. E.g. for a memory block
[0,3004600000) we'd have:

 iter	start		end		step		amount
 1	3004400000	30045fffff	 2M		  2M
 2	3004000000	30043fffff	64M		  4M
 3	3000000000	3003ffffff	 2G		 64M
 4	2000000000	2fffffffff	64G		 64G

Yet to map 64G with 4k pages (as happens e.g. under PV Xen) we
need slightly over 128M, but the first three iterations made
only about 70M available.

The condition (new_mapped_ram_size > mapped_ram_size) for
bumping step_size is just not suitable. Instead we want to bump
it when we know we have enough memory available to cover a block
of the new step_size. And rather than making that condition more
complicated than needed, simply adjust step_size by the largest
possible factor we know we can cover at that point - which is
shifting it left by one less than the difference between page
table level shifts. (Interestingly the original STEP_SIZE_SHIFT
definition had a comment hinting at that having been the
intention, just that it should have been PUD_SHIFT-PMD_SHIFT-1
instead of (PUD_SHIFT-PMD_SHIFT)/2, and of course for non-PAE
32-bit we can't really use these two constants as they're equal
there.)

Furthermore the comment in get_new_step_size() didn't get
updated when the bottom-up mapping logic got added. Yet while
an overflow (flushing step_size to zero) of the shift doesn't
matter for the top-down method, it does for bottom-up because
round_up(x, 0) = 0, and an upper range boundary of zero can't
really work well.
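
A kernel-context sketch of the adjustment described above, where
PMD_SHIFT - PAGE_SHIFT is the number of page-table-entry bits per
level:

        static unsigned long get_new_step_size(unsigned long step_size)
        {
                /* Grow by the largest factor the already-mapped block
                 * is known to cover: one less than the bits per level. */
                return step_size << (PMD_SHIFT - PAGE_SHIFT - 1);
        }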

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/54945C1E020000780005114E@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-23 11:39:34 +01:00
Boris Ostrovsky
8b8cd8a367 x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()
There is no reason for having it and, with commit 250a1ac685 ("x86,
smpboot: Remove pointless preempt_disable() in
native_smp_prepare_cpus()"), it prevents HVM guests from booting.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-23 10:31:55 +00:00
Ingo Molnar
fbe1bf1406 One vdso fix for a longstanding ASLR bug that's been in the news lately.
The vdso base address has always been randomized, and I don't think there's
 anything particularly wrong with the range over which it's randomized,
 but the implementation seems to have been buggy since the very beginning.
 
 This fixes the implementation to remove a large bias that caused a small
 fraction of possible vdso load addresses to be vastly more likely than
 the rest of the possible addresses.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUlhwwAAoJEK9N98ZeDfrklIcH/2/4P/ffRcCp0qYo7gFpXwPh
 bWem3ygB4xmBgiivwGJOx/GgE6/QGmedZmD6EDLoLmpuOvGjdp4iRmvU1dCSDWOE
 bMOBa1cxC4TPGzqlbGgjmyHgMPuihJq6GInAqmpJk/hZJ8W7JfnXyoZbt9pj4UBW
 gYKMLa0gSF/rMTZ5hkDQ6mVH65M7jJnmHLRydTpK8Ryfap2lu01MIr6mC6xaobVc
 5NfbI8bZexQECGLmPsFRnFWrYXNz86PKtN0j4YnPRVPRbBSTi8nHGu+zK2CJXpgu
 9hyZ0eOdqSEi4wwQgI9g9lZSsa1C+5QBc6BwiDy5XjToPJOx5EnT6+m9O+fy9Q8=
 =BVMi
 -----END PGP SIGNATURE-----

Merge tag 'pr-20141220-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/urgent

Pull a VDSO fix from Andy Lutomirski:

  "One vdso fix for a longstanding ASLR bug that's been in the news lately.

   The vdso base address has always been randomized, and I don't think there's
   anything particularly wrong with the range over which it's randomized,
   but the implementation seems to have been buggy since the very beginning.

   This fixes the implementation to remove a large bias that caused a small
   fraction of possible vdso load addresses to be vastly more likely than
   the rest of the possible addresses."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-21 11:16:49 +01:00
Andy Lutomirski
394f56fe48 x86_64, vdso: Fix the vdso address randomization algorithm
The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack.  To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit.  Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.

The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD.  The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.

This fixes the implementation.  The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
don't know what the paxtest code is actually calculating.)

It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning.  Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.

In the mean time, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.

As an easy test, doing this:

for i in `seq 10000`
  do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d

used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:

7fffdfffe000
7fffe01fe000
7fffe05fe000
7fffe07fe000
7fffe09fe000
7fffe0bfe000
7fffe0dfe000

Note the suspicious fe000 endings.  With the fix, I get a much more
palatable 76 repeated addresses.

Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2014-12-20 16:56:57 -08:00