Commit Graph

20100 Commits

Author SHA1 Message Date
Mathias Krause
8266e31ed0 x86, ptdump: Add section for EFI runtime services
In commit 3891a04aaf ("x86-64, espfix: Don't leak bits 31:16 of %esp
returning..") the "ESPFix Area" was added to the page table dump special
sections. That area, though, has a limited amount of entries printed.

The EFI runtime services are, unfortunately, located in-between the
espfix area and the high kernel memory mapping. Due to the enforced
limitation for the espfix area, the EFI mappings won't be printed in the
page table dump.

To make the ESP runtime service mappings visible again, provide them a
dedicated entry.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:28:57 +00:00
Ard Biesheuvel
243b6754cd efi/x86: Move x86 back to libstub
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.

The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)

Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.

Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:23:11 +00:00
Linus Torvalds
96971e9aa9 This is a pretty large update. I think it is roughly as big
as what I usually had for the _whole_ rc period.
 
 There are a few bad bugs where the guest can OOPS or crash the host.  We
 have also started looking at attack models for nested virtualization;
 bugs that usually result in the guest ring 0 crashing itself become
 more worrisome if you have nested virtualization, because the nested
 guest might bring down the non-nested guest as well.  For current
 uses of nested virtualization these do not really have a security
 impact, but you never know and bugs are bugs nevertheless.
 
 A lot of these bugs are in 3.17 too, resulting in a large number of
 stable@ Ccs.  I checked that all the patches apply there with no
 conflicts.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUSjmSAAoJEL/70l94x66D2cYH/3JKWsTzhXjHGxZcXQQ85CwR
 49hp/crCLWJ2YRKzyAOkvwPI0/SgYKM5wJ8kgtKlpLxrPZKYwhGd1S9tKf6EdAib
 5gc/SDDAgHmkqL3IrXmkyKzUVeUWvgD/IFi1Sqalko1blpRlaN/JyJV0mjjGCbA+
 yH3Qi5tD0X00u00ycuZCB6mrFH0PH87BmKFiz6bSSJ43tsgD9AVD64BZid6c6hwm
 iaIfNcIuShavlv1TKG80cSez2qtNXjRLeTN8A10gVZo3hof/wP8aRm+LxF/1JEZX
 OsoNCjOhhL29qafcZOg3j/atbiAzWtSGV3vjU+iWh5mnN5oFZHcPgIGucQsuFec=
 =9oQY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "This is a pretty large update.  I think it is roughly as big as what I
  usually had for the _whole_ rc period.

  There are a few bad bugs where the guest can OOPS or crash the host.
  We have also started looking at attack models for nested
  virtualization; bugs that usually result in the guest ring 0 crashing
  itself become more worrisome if you have nested virtualization,
  because the nested guest might bring down the non-nested guest as
  well.  For current uses of nested virtualization these do not really
  have a security impact, but you never know and bugs are bugs
  nevertheless.

  A lot of these bugs are in 3.17 too, resulting in a large number of
  stable@ Ccs.  I checked that all the patches apply there with no
  conflicts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: vfio: fix unregister kvm_device_ops of vfio
  KVM: x86: Wrong assertion on paging_tmpl.h
  kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
  KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
  KVM: x86: Emulator does not decode clflush well
  KVM: emulate: avoid accessing NULL ctxt->memopp
  KVM: x86: Decoding guest instructions which cross page boundary may fail
  kvm: x86: don't kill guest on unknown exit reason
  kvm: vmx: handle invvpid vm exit gracefully
  KVM: x86: Handle errors when RIP is set during far jumps
  KVM: x86: Emulator fixes for eip canonical checks on near branches
  KVM: x86: Fix wrong masking on relative jump/call
  KVM: x86: Improve thread safety in pit
  KVM: x86: Prevent host from panicking on shared MSR writes.
  KVM: x86: Check non-canonical addresses upon WRMSR
2014-10-24 12:42:55 -07:00
Linus Torvalds
20ca57cde5 xen: bug fixes for 3.18-rc1
- Fix regression in xen_clocksource_read() which caused all Xen guests
   to crash early in boot.
 - Several fixes for super rare race conditions in the p2m.
 - Assorted other minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUSh3nAAoJEFxbo/MsZsTRw6IH/imL2J++b8cafVvHjmVRt1T/
 P7KuFYPh/Tym+LISDBfk7MeOXZWsffvUDP653cGQiIMgmumEgVrU1+vR2Z0qRiRe
 95ZDIuQBmyGNBG9MiB0+zB7+STsvLECkPVWYDJCNbGVgrlHL6UHne06edrSpfr30
 13PyZeJAojezrt2hzLO43V7bu9acRmLo6WNdh6N2stfJv8QSQYSQO87baRdRB+rO
 I1r2jP7TJp9ZRtzSTsYLfpyhCGLcvXY58bci+Tz9x6xWMJ/HH5HvfJjxO17HzbdD
 2se6MKFVbOXT7DQK+BvQBDIO52t731DWZs4t7SJg24kDoINL7XiC/qSHC0vHJJM=
 =Cs0b
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Fix regression in xen_clocksource_read() which caused all Xen guests
   to crash early in boot.
 - Several fixes for super rare race conditions in the p2m.
 - Assorted other minor fixes.

* tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pci: Allocate memory for physdev_pci_device_add's optarr
  x86/xen: panic on bad Xen-provided memory map
  x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read()
  x86/xen: avoid race in p2m handling
  x86/xen: delay construction of mfn_list_list
  x86/xen: avoid writing to freed memory after race in p2m handling
  xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered
2014-10-24 12:41:50 -07:00
Nadav Amit
1715d0dcb0 KVM: x86: Wrong assertion on paging_tmpl.h
Even after the recent fix, the assertion on paging_tmpl.h is triggered.
Apparently, the assertion wants to check that the PAE is always set on
long-mode, but does it in incorrect way.  Note that the assertion is not
enabled unless the code is debugged by defining MMU_DEBUG.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:37 +02:00
Nadav Amit
3f6f1480d8 KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
The decode phase of the x86 emulator assumes that every instruction with the
ModRM flag, and which can be used with RIP-relative addressing, has either
SrcMem or DstMem.  This is not the case for several instructions - prefetch,
hint-nop and clflush.

Adding SrcMem|NoAccess for prefetch and hint-nop and SrcMem for clflush.

This fixes CVE-2014-8480.

Fixes: 41061cdb98
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:36 +02:00
Nadav Amit
13e457e0ee KVM: x86: Emulator does not decode clflush well
Currently, all group15 instructions are decoded as clflush (e.g., mfence,
xsave).  In addition, the clflush instruction requires no prefix (66/f2/f3)
would exist. If prefix exists it may encode a different instruction (e.g.,
clflushopt).

Creating a group for clflush, and different group for each prefix.

This has been the case forever, but the next patch needs the cflush group
in order to fix a bug introduced in 3.17.

Fixes: 41061cdb98
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:36 +02:00
Paolo Bonzini
a430c91663 KVM: emulate: avoid accessing NULL ctxt->memopp
A failure to decode the instruction can cause a NULL pointer access.
This is fixed simply by moving the "done" label as close as possible
to the return.

This fixes CVE-2014-8481.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Fixes: 41061cdb98
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:30:35 +02:00
Nadav Amit
08da44aedb KVM: x86: Decoding guest instructions which cross page boundary may fail
Once an instruction crosses a page boundary, the size read from the second page
disregards the common case that part of the operand resides on the first page.
As a result, fetch of long insturctions may fail, and thereby cause the
decoding to fail as well.

Cc: stable@vger.kernel.org
Fixes: 5cfc7e0f5e
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:18 +02:00
Michael S. Tsirkin
2bc19dc375 kvm: x86: don't kill guest on unknown exit reason
KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application.  Let's not kill the guest: WARN
and inject #UD instead.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:17 +02:00
Petr Matousek
a642fc3050 kvm: vmx: handle invvpid vm exit gracefully
On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.

Fix this by installing an invvpid vm exit handler.

This is CVE-2014-3646.

Cc: stable@vger.kernel.org
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:17 +02:00
Nadav Amit
d1442d85cc KVM: x86: Handle errors when RIP is set during far jumps
Far jmp/call/ret may fault while loading a new RIP.  Currently KVM does not
handle this case, and may result in failed vm-entry once the assignment is
done.  The tricky part of doing so is that loading the new CS affects the
VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
unconsistent state.  Therefore, this patch saves on 64-bit the old CS
descriptor and restores it if loading RIP failed.

This fixes CVE-2014-3647.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:16 +02:00
Nadav Amit
234f3ce485 KVM: x86: Emulator fixes for eip canonical checks on near branches
Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be canonical one, as real CPUs do.  During sysret, both target rsp and rip
should be canonical. If any of these values is noncanonical, a #GP exception
should occur.  The exception to this rule are syscall and sysenter instructions
in which the assigned rip is checked during the assignment to the relevant
MSRs.

This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.

This fixes CVE-2014-3647.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:16 +02:00
Nadav Amit
05c83ec9b7 KVM: x86: Fix wrong masking on relative jump/call
Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.

This patch fixes KVM behavior.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:15 +02:00
Andy Honig
2febc83913 KVM: x86: Improve thread safety in pit
There's a race condition in the PIT emulation code in KVM.  In
__kvm_migrate_pit_timer the pit_timer object is accessed without
synchronization.  If the race condition occurs at the wrong time this
can crash the host kernel.

This fixes CVE-2014-3611.

Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:14 +02:00
Andy Honig
8b3c3104c3 KVM: x86: Prevent host from panicking on shared MSR writes.
The previous patch blocked invalid writes directly when the MSR
is written.  As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.

Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:08 +02:00
Nadav Amit
854e8bb1aa KVM: x86: Check non-canonical addresses upon WRMSR
Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD.  To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers.  If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:08 +02:00
Linus Torvalds
816fb4175c The "weak" attribute is commonly used for the default version of a
function, where an architecture can override it by providing a strong
 version.
 
 Some header file declarations included the "weak" attribute.  That's
 error-prone because it causes every implementation to be weak, with no
 strong version at all, and the linker chooses one based on link order.
 
 What we want is the "weak" attribute only on the *definition* of the
 default implementation.  These changes remove "weak" from the declarations,
 leaving it on the default definitions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUSDD6AAoJEFmIoMA60/r8QQUP/1IYJwOZ1GfJs3mRVa8qrdlS
 qx0X96DOBQtPqA+fFgzuDnVWaRAsIGRYy/dHHZm4HZZW5QLsRXMrsFe9KXf/rHcu
 u5BXAqE7n7CPekmL9zuqNFzO7NgtF9PfnWnX/ckVWwUAhUnR36BbUxD3MPMiRhZA
 re1ao/hXSgrlyUOpC/Wfp6j+fzMoG4IIOwY+uKj7xWAkCQA3DUAtJr9kmIpyEz9n
 2I3WVERjZ4fiZLwqrafX2hA8NbI4hD4/MkxBL9GkSOslPm8aL8QzMfj0ty5Z10rr
 nqc+2+kU5H5sk8dTiUp3CMBoBmOSwkvbaQ/hl10qGbKLHSFyGlHpISDgICsGC/jI
 x7081DkRcTJQ1UjUSa+56R9ZJDogMrTRDwMSFmV/OsGXKkW4IEaq+GoQaXMGOMW/
 3Lu5IuzyoaWTh+2zaCL2ngmnsJLlwrYF8vN5be1j6k6QsW50smnArgFGQ9zihry0
 8ewyfnm7nHheu1v2+qwXphxVO7sKTDqO4R1Gxli3VUJ8CngK121Auc9Y02GNluP6
 QP8zaieGTyNJ/Cxo0vy9DEm0RqLcSsncjfGG3eC2a0ruMp8Ynju2/u8QjwHPDPFo
 2PX6FDjT36I+Fo5dKzK6sNIyiELeWvxqsb3yj+N57CIvZuH6/FjhvEEBcGaJ9bZ3
 JQEOMXbVJ5oaNux25CRz
 =P4fi
 -----END PGP SIGNATURE-----

Merge tag 'remove-weak-declarations' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull weak function declaration removal from Bjorn Helgaas:
 "The "weak" attribute is commonly used for the default version of a
  function, where an architecture can override it by providing a strong
  version.

  Some header file declarations included the "weak" attribute.  That's
  error-prone because it causes every implementation to be weak, with no
  strong version at all, and the linker chooses one based on link order.

  What we want is the "weak" attribute only on the *definition* of the
  default implementation.  These changes remove "weak" from the
  declarations, leaving it on the default definitions"

* tag 'remove-weak-declarations' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  uprobes: Remove "weak" from function declarations
  memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration
  kgdb: Remove "weak" from kgdb_arch_pc() declaration
  ARC: kgdb: generic kgdb_arch_pc() suffices
  vmcore: Remove "weak" from function declarations
  clocksource: Remove "weak" from clocksource_default_clock() declaration
  x86, intel-mid: Remove "weak" from function declarations
  audit: Remove "weak" from audit_classify_compat_syscall() declaration
2014-10-23 15:04:27 -07:00
Linus Torvalds
8c81f48e16 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI updates from Peter Anvin:
 "This patchset falls under the "maintainers that grovel" clause in the
  v3.18-rc1 announcement.  We had intended to push it late in the merge
  window since we got it into the -tip tree relatively late.

  Many of these are relatively simple things, but there are a couple of
  key bits, especially Ard's and Matt's patches"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  rtc: Disable EFI rtc for x86
  efi: rtc-efi: Export platform:rtc-efi as module alias
  efi: Delete the in_nmi() conditional runtime locking
  efi: Provide a non-blocking SetVariable() operation
  x86/efi: Adding efi_printks on memory allocationa and pci.reads
  x86/efi: Mark initialization code as such
  x86/efi: Update comment regarding required phys mapped EFI services
  x86/efi: Unexport add_efi_memmap variable
  x86/efi: Remove unused efi_call* macros
  efi: Resolve some shadow warnings
  arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  efi: Introduce efi_md_typeattr_format()
  efi: Add macro for EFI_MEMORY_UCE memory attribute
  x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode
  arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi
  arm64/efi: uefi_init error handling fix
  efi: Add kernel param efi=noruntime
  lib: Add a generic cmdline parse function parse_option_str
  ...
2014-10-23 14:45:09 -07:00
Martin Kelly
1ea644c8f9 x86/xen: panic on bad Xen-provided memory map
Panic if Xen provides a memory map with 0 entries. Although this is
unlikely, it is better to catch the error at the point of seeing the map
than later on as a symptom of some other crash.

Signed-off-by: Martin Kelly <martkell@amazon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23 16:24:02 +01:00
Boris Ostrovsky
3251f20b89 x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read()
Commit 89cbc76768 ("x86: Replace __get_cpu_var uses") replaced
__get_cpu_var() with this_cpu_ptr() in xen_clocksource_read() in such a
way that instead of accessing a structure pointed to by a per-cpu pointer
we are trying to get to a per-cpu structure.

__this_cpu_read() of the pointer is the more appropriate accessor.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23 16:24:02 +01:00
Juergen Gross
3a0e94f8ea x86/xen: avoid race in p2m handling
When a new p2m leaf is allocated this leaf is linked into the p2m tree
via cmpxchg. Unfortunately the compare value for checking the success
of the update is read after checking for the need of a new leaf. It is
possible that a new leaf has been linked into the tree concurrently
in between. This could lead to a leaked memory page and to the loss of
some p2m entries.

Avoid the race by using the read compare value for checking the need
of a new p2m leaf and use ACCESS_ONCE() to get it.

There are other places which seem to need ACCESS_ONCE() to ensure
proper operation. Change them accordingly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23 16:24:02 +01:00
Juergen Gross
2c185687ab x86/xen: delay construction of mfn_list_list
The 3 level p2m tree for the Xen tools is constructed very early at
boot by calling xen_build_mfn_list_list(). Memory needed for this tree
is allocated via extend_brk().

As this tree (other than the kernel internal p2m tree) is only needed
for domain save/restore, live migration and crash dump analysis it
doesn't matter whether it is constructed very early or just some
milliseconds later when memory allocation is possible by other means.

This patch moves the call of xen_build_mfn_list_list() just after
calling xen_pagetable_p2m_copy() simplifying this function, too, as it
doesn't have to bother with two parallel trees now. The same applies
for some other internal functions.

While simplifying code, make early_can_reuse_p2m_middle() static and
drop the unused second parameter. p2m_mid_identity_mfn can be removed
as well, it isn't used either.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23 16:24:02 +01:00
Juergen Gross
239af7c713 x86/xen: avoid writing to freed memory after race in p2m handling
In case a race was detected during allocation of a new p2m tree
element in alloc_p2m() the new allocated mid_mfn page is freed without
updating the pointer to the found value in the tree. This will result
in overwriting the just freed page with the mfn of the p2m leaf.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23 16:24:01 +01:00
Bjorn Helgaas
754f0da0dc x86, intel-mid: Remove "weak" from function declarations
For the following interfaces:

  get_penwell_ops()
  get_cloverview_ops()
  get_tangier_ops()

there is only one implementation, so they do not need to be marked "weak".

Remove the "weak" attribute from their declarations.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
CC: David Cohen <david.a.cohen@linux.intel.com>
CC: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
CC: x86@kernel.org
2014-10-22 16:14:03 -06:00
Linus Torvalds
ab074ade9c Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris:
 "So this change across a whole bunch of arches really solves one basic
  problem.  We want to audit when seccomp is killing a process.  seccomp
  hooks in before the audit syscall entry code.  audit_syscall_entry
  took as an argument the arch of the given syscall.  Since the arch is
  part of what makes a syscall number meaningful it's an important part
  of the record, but it isn't available when seccomp shoots the
  syscall...

  For most arch's we have a better way to get the arch (syscall_get_arch)
  So the solution was two fold: Implement syscall_get_arch() everywhere
  there is audit which didn't have it.  Use syscall_get_arch() in the
  seccomp audit code.  Having syscall_get_arch() everywhere meant it was
  a useless flag on the stack and we could get rid of it for the typical
  syscall entry.

  The other changes inside the audit system aren't grand, fixed some
  records that had invalid spaces.  Better locking around the task comm
  field.  Removing some dead functions and structs.  Make some things
  static.  Really minor stuff"

* git://git.infradead.org/users/eparis/audit: (31 commits)
  audit: rename audit_log_remove_rule to disambiguate for trees
  audit: cull redundancy in audit_rule_change
  audit: WARN if audit_rule_change called illegally
  audit: put rule existence check in canonical order
  next: openrisc: Fix build
  audit: get comm using lock to avoid race in string printing
  audit: remove open_arg() function that is never used
  audit: correct AUDIT_GET_FEATURE return message type
  audit: set nlmsg_len for multicast messages.
  audit: use union for audit_field values since they are mutually exclusive
  audit: invalid op= values for rules
  audit: use atomic_t to simplify audit_serial()
  kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
  audit: reduce scope of audit_log_fcaps
  audit: reduce scope of audit_net_id
  audit: arm64: Remove the audit arch argument to audit_syscall_entry
  arm64: audit: Add audit hook in syscall_trace_enter/exit()
  audit: x86: drop arch from __audit_syscall_entry() interface
  sparc: implement is_32bit_task
  sparc: properly conditionalize use of TIF_32BIT
  ...
2014-10-19 16:25:56 -07:00
Linus Torvalds
1f6075f990 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
 "A second (and last) round of late coming fixes and changes, almost all
  of them in perf tooling:

  User visible tooling changes:

   - Add period data column and make it default in 'perf script' (Jiri
     Olsa)

   - Add a visual cue for toggle zeroing of samples in 'perf top'
     (Taeung Song)

   - Improve callchains when using libunwind (Namhyung Kim)

  Tooling fixes and infrastructure changes:

   - Fix for double free in 'perf stat' when using some specific invalid
     command line combo (Yasser Shalabi)

   - Fix off-by-one bugs in map->end handling (Stephane Eranian)

   - Fix off-by-one bug in maps__find(), also related to map->end
     handling (Namhyung Kim)

   - Make struct symbol->end be the first addr after the symbol range,
     to make it match the convention used for struct map->end.  (Arnaldo
     Carvalho de Melo)

   - Fix perf_evlist__add_pollfd() error handling in 'perf kvm stat
     live' (Jiri Olsa)

   - Fix python test build by moving callchain_param to an object linked
     into the python binding (Jiri Olsa)

   - Document sysfs events/ interfaces (Cody P Schafer)

   - Fix typos in perf/Documentation (Masanari Iida)

   - Add missing 'struct option' forward declaration (Arnaldo Carvalho
     de Melo)

   - Add option to copy events when queuing for sorting across cpu
     buffers and enable it for 'perf kvm stat live', to avoid having
     events left in the queue pointing to the ring buffer be rewritten
     in high volume sessions.  (Alexander Yarygin, improving work done
     by David Ahern):

   - Do not include a struct hists per perf_evsel, untangling the
     histogram code from perf_evsel, to pave the way for exporting a
     minimalistic tools/lib/api/perf/ library usable by tools/perf and
     initially by the rasd daemon being developed by Borislav Petkov,
     Robert Richter and Jean Pihet.  (Arnaldo Carvalho de Melo)

   - Make perf_evlist__open(evlist, NULL, NULL), i.e. without cpu and
     thread maps mean syswide monitoring, reducing the boilerplate for
     tools that only want system wide mode.  (Arnaldo Carvalho de Melo)

   - Move exit stuff from perf_evsel__delete to perf_evsel__exit, delete
     should be just a front end for exit + free (Arnaldo Carvalho de
     Melo)

   - Add support to new style format of kernel PMU event.  (Kan Liang)

  and other misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  perf script: Add period as a default output column
  perf script: Add period data column
  perf evsel: No need to drag util/cgroup.h
  perf evlist: Add missing 'struct option' forward declaration
  perf evsel: Move exit stuff from __delete to __exit
  kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define
  perf kvm stat live: Enable events copying
  perf session: Add option to copy events when queueing
  perf Documentation: Fix typos in perf/Documentation
  perf trace: Use thread_{,_set}_priv helpers
  perf kvm: Use thread_{,_set}_priv helpers
  perf callchain: Create an address space per thread
  perf report: Set callchain_param.record_mode for future use
  perf evlist: Fix for double free in tools/perf stat
  perf test: Add test case for pmu event new style format
  perf tools: Add support to new style format of kernel PMU event
  perf tools: Parse the pmu event prefix and suffix
  Revert "perf tools: Default to cpu// for events v5"
  perf Documentation: Remove Ruplicated docs for powerpc cpu specific events
  perf Documentation: sysfs events/ interfaces
  ...
2014-10-19 11:55:41 -07:00
Linus Torvalds
857b50f5d0 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the MIPS pull request for the next kernel:

   - Zubair's patch series adds CMA support for MIPS.  Doing so it also
     touches ARM64 and x86.
   - remove the last instance of IRQF_DISABLED from arch/mips
   - updates to two of the MIPS defconfig files.
   - cleanup of how cache coherency bits are handled on MIPS and
     implement support for write-combining.
   - platform upgrades for Alchemy
   - move MIPS DTS files to arch/mips/boot/dts/"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (24 commits)
  MIPS: ralink: remove deprecated IRQF_DISABLED
  MIPS: pgtable.h: Implement the pgprot_writecombine function for MIPS
  MIPS: cpu-probe: Set the write-combine CCA value on per core basis
  MIPS: pgtable-bits: Define the CCA bit for WC writes on Ingenic cores
  MIPS: pgtable-bits: Move the CCA bits out of the core's ifdef blocks
  MIPS: DMA: Add cma support
  x86: use generic dma-contiguous.h
  arm64: use generic dma-contiguous.h
  asm-generic: Add dma-contiguous.h
  MIPS: BPF: Add new emit_long_instr macro
  MIPS: ralink: Move device-trees to arch/mips/boot/dts/
  MIPS: Netlogic: Move device-trees to arch/mips/boot/dts/
  MIPS: sead3: Move device-trees to arch/mips/boot/dts/
  MIPS: Lantiq: Move device-trees to arch/mips/boot/dts/
  MIPS: Octeon: Move device-trees to arch/mips/boot/dts/
  MIPS: Add support for building device-tree binaries
  MIPS: Create common infrastructure for building built-in device-trees
  MIPS: SEAD3: Enable DEVTMPFS
  MIPS: SEAD3: Regenerate defconfigs
  MIPS: Alchemy: DB1300: Add touch penirq support
  ...
2014-10-18 14:24:36 -07:00
Andy Lutomirski
d974baa398 x86,kvm,vmx: Preserve CR4 across VM entry
CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry.  Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-18 10:09:03 -07:00
Linus Torvalds
2e923b0251 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Include fixes for netrom and dsa (Fabian Frederick and Florian
    Fainelli)

 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
    ip_tunnel, fou), from Li ROngQing.

 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

 5) Use after free in virtio_net, from Michael S Tsirkin.

 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
    Shelar.

 7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
    Wang and Eric Dumazet.

11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
    chunks, from Daniel Borkmann.

12) Don't call sock_kfree_s() with NULL pointers, this function also has
    the side effect of adjusting the socket memory usage.  From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  r8152: return -EBUSY for runtime suspend
  ipv4: fix a potential use after free in fou.c
  ipv4: fix a potential use after free in ip_tunnel_core.c
  hyperv: Add handling of IP header with option field in netvsc_set_hash()
  openvswitch: Create right mask with disabled megaflows
  vxlan: fix a free after use
  openvswitch: fix a use after free
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ipv4: clean up cookie_v4_check()
  ipv4: share tcp_v4_save_options() with cookie_v4_check()
  ipv4: call __ip_options_echo() in cookie_v4_check()
  atm: simplify lanai.c by using module_pci_driver
  ...
2014-10-18 09:31:37 -07:00
Anton Blanchard
691286b556 kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define
Commit e7dbfe349d ("kprobes/x86: Move ftrace-based kprobe code
into kprobes-ftrace.c") switched from using
ARCH_SUPPORTS_KPROBES_ON_FTRACE to CONFIG_KPROBES_ON_FTRACE but
missed removing the define.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: a.p.zijlstra@chello.nl
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-17 07:18:34 +02:00
Linus Torvalds
0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Alexei Starovoitov
e0ee9c1215 x86: bpf_jit: fix two bugs in eBPF JIT compiler
1.
JIT compiler using multi-pass approach to converge to final image size,
since x86 instructions are variable length. It starts with large
gaps between instructions (so some jumps may use imm32 instead of imm8)
and iterates until total program size is the same as in previous pass.
This algorithm works only if program size is strictly decreasing.
Programs that use LD_ABS insn need additional code in prologue, but it
was not emitted during 1st pass, so there was a chance that 2nd pass would
adjust imm32->imm8 jump offsets to the same number of bytes as increase in
prologue, which may cause algorithm to erroneously decide that size converged.
Fix it by always emitting largest prologue in the first pass which
is detected by oldproglen==0 check.
Also change error check condition 'proglen != oldproglen' to fail gracefully.

2.
while staring at the code realized that 64-byte buffer may not be enough
when 1st insn is large, so increase it to 128 to avoid buffer overflow
(theoretical maximum size of prologue+div is 109) and add runtime check.

Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 13:13:14 -04:00
Linus Torvalds
dfe2c6dcc8 Merge branch 'akpm' (patches from Andrew Morton)
Merge second patch-bomb from Andrew Morton:
 - a few hotfixes
 - drivers/dma updates
 - MAINTAINERS updates
 - Quite a lot of lib/ updates
 - checkpatch updates
 - binfmt updates
 - autofs4
 - drivers/rtc/
 - various small tweaks to less used filesystems
 - ipc/ updates
 - kernel/watchdog.c changes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
  mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
  kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
  ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
  frv: remove unused declarations of __start___ex_table and __stop___ex_table
  kvm: ensure hard lockup detection is disabled by default
  kernel/watchdog.c: control hard lockup detection default
  staging: rtl8192u: use %*pEn to escape buffer
  staging: rtl8192e: use %*pEn to escape buffer
  staging: wlan-ng: use %*pEhp to print SN
  lib80211: remove unused print_ssid()
  wireless: hostap: proc: print properly escaped SSID
  wireless: ipw2x00: print SSID via %*pE
  wireless: libertas: print esaped string via %*pE
  lib/vsprintf: add %*pE[achnops] format specifier
  lib / string_helpers: introduce string_escape_mem()
  lib / string_helpers: refactoring the test suite
  lib / string_helpers: move documentation to c-file
  include/linux: remove strict_strto* definitions
  arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
  fs: check bh blocknr earlier when searching lru
  ...
2014-10-14 03:54:50 +02:00
Linus Torvalds
31003e3a9d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML update from Richard Weinberger:
 "Besides of fixes this contains also support for CONFIG_STACKTRACE by
  Daniel Walter"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: net: Eliminate NULL test after alloc_bootmem
  um: Add support for CONFIG_STACKTRACE
  um: ubd: Fix for processes stuck in D state forever
  um: delete unnecessary bootmem struct page array
  um: remove csum_partial_copy_generic_i386 to clean up exception table
2014-10-14 03:49:02 +02:00
Linus Torvalds
77654908ff Merge branches 'x86-ras-for-linus', 'x86-uv-for-linus' and 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ras, uv and vdso fixlets from Ingo Molnar:
 "ras: tone down a kernel message to only occur during initial bootup,
    not during suspend/resume cycles.

  uv: a cleanup commit

  vdso: a fix to error checking"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Avoid showing repetitive message from intel_init_thermal()

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/uv: Remove unnecessary #ifdef

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix vdso2c's special_pages[] error checking
2014-10-14 02:31:22 +02:00
Linus Torvalds
2fd7476de9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc smaller fixes that missed the v3.17 cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Add arch/x86/purgatory/ make generated files to gitignore
  x86: Fix section conflict for numachip
  x86: Reject x32 executables if x32 ABI not supported
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14 02:28:16 +02:00
Linus Torvalds
ba1a96fc7d Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 seccomp changes from Ingo Molnar:
 "This tree includes x86 seccomp filter speedups and related preparatory
  work, which touches core seccomp facilities as well.

  The main idea is to split seccomp into two phases, to be able to enter
  a simple fast path for syscalls with ptrace side effects.

  There's no substantial user-visible (and ABI) effects expected from
  this, except a change in how we emit a better audit record for
  SECCOMP_RET_TRACE events"

* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
  x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
  x86: Split syscall_trace_enter into two phases
  x86, entry: Only call user_exit if TIF_NOHZ
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  seccomp: Document two-phase seccomp and arch-provided seccomp_data
  seccomp: Allow arch code to provide seccomp_data
  seccomp: Refactor the filter callback and the API
  seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
2014-10-14 02:27:06 +02:00
Linus Torvalds
f1bfbd984b Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this tree are:

   - fix and update Intel Quark [Galileo] SoC platform support

   - update IOSF chipset side band interface and make it available via
     debugfs

   - enable HPETs on Soekris net6501 and other e6xx based systems"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
  x86: Quark: Comment setup_arch() to document TLB/PGE bug
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  x86/platform/intel/iosf: Add debugfs config option for IOSF
  x86/platform/intel/iosf: Add better description of IOSF driver in config
  x86/platform/intel/iosf: Add Braswell PCI ID
  x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n
  x86: HPET force enable for e6xx based systems
  x86/iosf: Add debugfs support
  x86/iosf: Add Kconfig prompt for IOSF_MBI selection
2014-10-14 02:23:55 +02:00
Linus Torvalds
df133e8fa8 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "This tree includes the following changes:

   - fix memory hotplug
   - fix hibernation bootup memory layout assumptions
   - fix hyperv numa guest kernel messages
   - remove dead code
   - update documentation"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Update memory map description to list hypervisor-reserved area
  x86/mm, hibernate: Do not assume the first e820 area to be RAM
  x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
  x86/mm/hotplug: Modify PGD entry when removing memory
  x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
  x86: Remove set_pmd_pfn
2014-10-14 02:22:41 +02:00
Linus Torvalds
e3438330f5 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "Misc smaller cleanups"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, intel: Fix total_size computation
  x86, microcode, intel: Rename apply_microcode and declare it static
  x86, microcode, intel: Fix typos
  x86, microcode, intel: Add missing static declarations
  x86, microcode, amd: Fix missing static declaration
2014-10-14 02:21:51 +02:00
Linus Torvalds
c7b228adca Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "x86 FPU handling fixes, cleanups and enhancements from Oleg.

  The signal handling race fix and the __restore_xstate_sig() preemption
  fix for eager-mode is marked for -stable as well"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: copy_thread: Don't nullify ->ptrace_bps twice
  x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
  x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
  x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
  x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
2014-10-14 02:20:50 +02:00
Linus Torvalds
708d0b41a2 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "This tree includes the following changes:

   - Introduce DISABLED_MASK to list disabled CPU features, to simplify
     CPU feature handling and avoid excessive #ifdefs

   - Remove the lightly used cpu_has_pae() primitive"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add more disabled features
  x86: Introduce disabled-features
  x86: Axe the lightly-used cpu_has_pae
2014-10-14 02:19:47 +02:00
Ulrich Obergfell
9919e39a17 kvm: ensure hard lockup detection is disabled by default
Use watchdog_enable_hardlockup_detector() to set hard lockup detection's
default value to false.  It's risky to run this detection in a guest, as
false positives are easy to trigger, especially if the host is
overcommitted.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:27 +02:00
Xishi Qiu
bd5cfb8977 arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
If all the nodes are marked hotpluggable, alloc node data will fail.
Because __next_mem_range_rev() will skip the hotpluggable memory
regions.  numa_clear_kernel_node_hotplug() is called after alloc node
data.

numa_init()
    ...
    ret = init_func();  // this will mark hotpluggable flag from SRAT
    ...
    memblock_set_bottom_up(false);
    ...
    ret = numa_register_memblks(&numa_meminfo);  // this will alloc node data(pglist_data)
    ...
    numa_clear_kernel_node_hotplug();  // in case all the nodes are hotpluggable
    ...

numa_register_memblks()
    setup_node_data()
        memblock_find_in_range_node()
            __memblock_find_range_top_down()
                for_each_mem_range_rev()
                    __next_mem_range_rev()

This patch moves numa_clear_kernel_node_hotplug() into
numa_register_memblks(), clear kernel node hotpluggable flag before
alloc node data, then alloc node data won't fail even all the nodes
are hotpluggable.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:26 +02:00
Andrew Morton
e48510f451 arch/x86/kernel/cpu/common.c: fix unused symbol warning
x86_64 allnoconfig:

arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:23 +02:00
Mike Travis
906e36c5c7 x86: use optimized ioresource lookup in ioremap function
Use the optimized ioresource lookup, "region_is_ram", for the ioremap
function.  If the region is not found, it falls back to the
"page_is_ram" function.  If it is found and it is RAM, then the usual
warning message is issued, and the ioremap operation is aborted.
Otherwise, the ioremap operation continues.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Cliff Wickman <cpw@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:22 +02:00
Vivek Goyal
f8da964dfb kexec-bzimage64: fix sparse warnings
David Howells brought to my attention the mails generated by kbuild test
bot and following sparse warnings were present.  This patch fixes these
warnings.

  arch/x86/kernel/kexec-bzimage64.c:270:5: warning: symbol 'bzImage64_probe' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:328:6: warning: symbol 'bzImage64_load' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:517:5: warning: symbol 'bzImage64_cleanup' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:531:5: warning: symbol 'bzImage64_verify_sig' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:546:23: warning: symbol 'kexec_bzImage64_ops' was not declared. Should it be static?

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:21 +02:00
Baoquan He
a2d6aa8fa0 kexec: check if crashk_res_low exists when exclude it from crash mem ranges
Add a check if crashk_res_low exists just like GART region does.  If
crashk_res_low doesn't exist, calling exclude_mem_range is unnecessary.

Meanwhile, since crashk_res_low has been initialized at definition, it's
safe just use "if (crashk_low_res.end)" to check if it's exist.  And this
can make it consistent with other places of check.

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:21 +02:00
Baoquan He
887f4f8666 arch/x86/purgatory/Makefile: try to use automatic variable in kexec purgatory makefile
Make the Makefile of kexec purgatory be consistent with others in linux
src tree, and make it look generic and simple.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:21 +02:00