Commit Graph

20689 Commits

Author SHA1 Message Date
Jiang Liu
67dc5e701f x86, irq: Remove __init marker for functions will be used by IOAPIC hotplug
Remove __init marker for functions which will be used by IOAPIC hotplug
at runtime.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-12-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Yinghai Lu
5411dc4ccb x86, irq: Prefer assigned ID in APIC ID register for x86_64
Perfer the assigned ID in APIC ID register for x86_64 if it's still
available.

[ tglx: Added comments ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-11-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Yinghai Lu
7e8994196a x86, irq: Split out alloc_ioapic_save_registers()
Split out alloc_ioapic_save_registers() from early_irq_init(),
so it could be used for ioapic hotplug later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-10-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Thomas Gleixner
250a1ac685 x86, smpboot: Remove pointless preempt_disable() in native_smp_prepare_cpus()
There is no reason to keep preemption disabled in this function.

We only have two other threads live: kthreadd and idle. Neither of
them is going to preempt. But that preempt_disable forces all the code
inside to do GFP_ATOMIC allocations which is just insane.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141205084147.153643952@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Jan Beulich
2414e021ac x86: Avoid building unused IRQ entry stubs
When X86_LOCAL_APIC (i.e. unconditionally on x86-64),
first_system_vector will never end up being higher than
LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
0xef...0xff is pointlessly reducing code density. Deal with this at
build time already.

Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
simplify (and hence make easier to read and more consistent with the
change done here) some #if-s in arch/x86/kernel/irqinit.c.

While we could further improve the packing of the IRQ entry stubs (the
four ones now left in the last set could be fit into the four padding
bytes each of the final four sets have) this doesn't seem to provide
any real benefit: Both irq_entries_start and common_interrupt getting
cache line aligned, eliminating the 30th set would just produce 32
bytes of padding between the 29th and common_interrupt.

[ tglx: Folded lguest fix from Dan Carpenter ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: lguest@lists.ozlabs.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Jan Beulich
e106798259 x86: irq: Fix placement of mp_should_keep_irq()
While f3761db164 ("x86, irq: Fix build error caused by
9eabc99a635a77cbf09") addressed the original build problem,
declaration, inline stub, and definition still seem misplaced: It isn't
really IO-APIC related, and it's being used solely in arch/x86/pci/.
This also means stubbing it out when !CONFIG_X86_IO_APIC was at least
questionable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/545747BE020000780004436E@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Jiang Liu
9f50c6ea0d x86, irq, ACPI: Fix building warning of unused code
Fix building warning when CONFIG_X86_LOCAL_APIC is disabled.
arch/x86/kernel/acpi/boot.c:699:20: warning: ‘acpi_set_irq_model_ioapic’ defined but not used [-Wunused-function]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1414397531-28254-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Jiang Liu
e32c67e0cb x86, irq: Provide empty send_cleanup_vector() stub for UP builds
Define an empty send_cleanup_vector() for UP kernel to fix link error
of undefined reference, which is used by uv_irq and irq_remapping.

[ tglx: Made it an inline stub and moved it ahead of the file split
  	changes ]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1414397531-28254-21-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Linus Torvalds
988adfdffd Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Highlights:

   - AMD KFD driver merge

     This is the AMD HSA interface for exposing a lowlevel interface for
     GPGPU use.  They have an open source userspace built on top of this
     interface, and the code looks as good as it was going to get out of
     tree.

   - Initial atomic modesetting work

     The need for an atomic modesetting interface to allow userspace to
     try and send a complete set of modesetting state to the driver has
     arisen, and been suffering from neglect this past year.  No more,
     the start of the common code and changes for msm driver to use it
     are in this tree.  Ongoing work to get the userspace ioctl finished
     and the code clean will probably wait until next kernel.

   - DisplayID 1.3 and tiled monitor exposed to userspace.

     Tiled monitor property is now exposed for userspace to make use of.

   - Rockchip drm driver merged.

   - imx gpu driver moved out of staging

  Other stuff:

   - core:
        panel - MIPI DSI + new panels.
        expose suggested x/y properties for virtual GPUs

   - i915:
        Initial Skylake (SKL) support
        gen3/4 reset work
        start of dri1/ums removal
        infoframe tracking
        fixes for lots of things.

   - nouveau:
        tegra k1 voltage support
        GM204 modesetting support
        GT21x memory reclocking work

   - radeon:
        CI dpm fixes
        GPUVM improvements
        Initial DPM fan control

   - rcar-du:
        HDMI support added
        removed some support for old boards
        slave encoder driver for Analog Devices adv7511

   - exynos:
        Exynos4415 SoC support

   - msm:
        a4xx gpu support
        atomic helper conversion

   - tegra:
        iommu support
        universal plane support
        ganged-mode DSI support

   - sti:
        HDMI i2c improvements

   - vmwgfx:
        some late fixes.

   - qxl:
        use suggested x/y properties"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
  drm: sti: fix module compilation issue
  drm/i915: save/restore GMBUS freq across suspend/resume on gen4
  drm: sti: correctly cleanup CRTC and planes
  drm: sti: add HQVDP plane
  drm: sti: add cursor plane
  drm: sti: enable auxiliary CRTC
  drm: sti: fix delay in VTG programming
  drm: sti: prepare sti_tvout to support auxiliary crtc
  drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
  drm: sti: fix hdmi avi infoframe
  drm: sti: remove event lock while disabling vblank
  drm: sti: simplify gdp code
  drm: sti: clear all mixer control
  drm: sti: remove gpio for HDMI hot plug detection
  drm: sti: allow to change hdmi ddc i2c adapter
  drm/doc: Document drm_add_modes_noedid() usage
  drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
  drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
  drm: Zero out DRM object memory upon cleanup
  drm/i915/bdw: Fix the write setting up the WIZ hashing mode
  ...
2014-12-15 15:52:01 -08:00
Linus Torvalds
26178ec11e x86: mm: consolidate VM_FAULT_RETRY handling
The VM_FAULT_RETRY handling was confusing and incorrect for the case of
returning to kernel mode.  We need to handle the exception table fixup
if we return to kernel mode due to a fatal signal - it will basically
look to the kernel user mode access like the access failed due to the VM
going away from udner it.  Which is correct - the process is dying - and
avoids the whole "repeat endless kernel page faults" case.

Handling the VM_FAULT_RETRY early and in just one place also simplifies
the mmap_sem handling, since once we've taken care of VM_FAULT_RETRY we
know that we can just drop the lock.  The remaining accounting and
possible error handling is thread-local and does not need the mmap_sem.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-15 15:07:33 -08:00
Linus Torvalds
7fb08eca45 x86: mm: move mmap_sem unlock from mm_fault_error() to caller
This replaces four copies in various stages of mm_fault_error() handling
with just a single one.  It will also allow for more natural placement
of the unlocking after some further cleanup.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-15 14:46:06 -08:00
Chen, Gong
d91525eb8e ACPI, EINJ: Enhance error injection tolerance level
Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
specific errors. EINJ will report errors as below when hitting such
cases:

APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers

It is because on x86 platform ACPI based PCI MMCFG logic has
reserved all MMCFG spaces so that EINJ can't reserve it again.
We already trust the ACPI/APEI code when using the EINJ interface
so it is not a big leap to also trust it to access the right
MMCFG addresses. Skip address checking to allow the access.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-12-15 11:36:37 -08:00
Dave Hansen
72e9b5fe9b x86, mpx: Give MPX a real config option prompt
Give MPX a real config option. The CPUs that support it (referenced
here):

  https://software.intel.com/en-us/forums/topic/402393

are not available publicly yet. Right now only the software emulator
provides MPX for the general public.

[ tglx: Make it default off. There is no point in having it on right
        now as no hardware and no proper tooling support are available ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141212183836.2569D58D@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-15 15:58:57 +01:00
Paolo Bonzini
333bce5aac Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
 VGIC init flow.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUjsVhAAoJEEtpOizt6ddy5rIH/1V/YVwhprC55YqdHelU9Qu2
 Muzsx+7F71NxC7xgMGFqPD1YrPR+hxvoPhy+ADOBlvcqlolrkDnV9I+8e3geaYNc
 nZ/yEnoGTtbAggiS1smx7usBv34Z88Sd5txNjmj1cmHBy+VOWlyidWMkGBTsfBRe
 mVc61BDUfyC47udgRHXhwS80sbHLJHElmADisFOVmQNBYwwiHiTdx0hMBMnHcC3Y
 /3T0tKxHdeTISnmA+J+n7TcChtTIM4xqC6kwf3rw3b7XX8gdtTKylDHX2GLAg646
 RdebAG2twmGpIc6SxXZbo38f3oY9OFo1Le5xZGa6iUjD56VDw/e4wg4iA2juo0Y=
 =J2Ut
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
VGIC init flow.

Conflicts:
	arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
2014-12-15 13:06:40 +01:00
Linus Torvalds
e6b5be2be4 Driver core patches for 3.19-rc1
Here's the set of driver core patches for 3.19-rc1.
 
 They are dominated by the removal of the .owner field in platform
 drivers.  They touch a lot of files, but they are "simple" changes, just
 removing a line in a structure.
 
 Other than that, a few minor driver core and debugfs changes.  There are
 some ath9k patches coming in through this tree that have been acked by
 the wireless maintainers as they relied on the debugfs changes.
 
 Everything has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
 53kAoLeteByQ3iVwWurwwseRPiWa8+MI
 =OVRS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Linus Torvalds
536e89ee53 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes (mainly Andy's TLS fixes), plus a cleanup"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tls: Disallow unusual TLS segments
  x86/tls: Validate TLS entries to protect espfix
  MAINTAINERS: Add me as x86 VDSO submaintainer
  x86/asm: Unify segment selector defines
  x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/mm: Use min() instead of min_t() in the e820 printout code
  x86/mm: Fix zone ranges boot printout
  x86/doc: Update documentation after file shuffling
2014-12-14 11:51:50 -08:00
Andy Lutomirski
0e58af4e1d x86/tls: Disallow unusual TLS segments
Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.

For completeness, block attempts to set the L bit.  (Prior to
this patch, the L bit would have been silently dropped.)

This is an ABI break.  I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.

Note to stable maintainers: this is a hardening patch that fixes
no known bugs.  Given the possibility of ABI issues, this
probably shouldn't be backported quickly.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org # optional
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-14 08:50:31 +01:00
Andy Lutomirski
41bdc78544 x86/tls: Validate TLS entries to protect espfix
Installing a 16-bit RW data segment into the GDT defeats espfix.
AFAICT this will not affect glibc, Wine, or dosemu at all.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-14 08:50:31 +01:00
Linus Torvalds
52bb452558 Here's two fixes:
1) Discovered by Fengguang Wu's tests. I changed the parameters to
    the function graph x86 prepare_ftrace_return call but forgot
    to update the call from entry_32 (i386 version). This patch corrects
    that.
 
 2) I was tracing some code and found that the sched_switch tracepoint
    was showing tasks in the INTERRUPTIBLE state as RUNNING. This was
    due to the updates to convert preempt_count into a per_cpu variable.
    The tracepoint logic was made to use the tasks saved_preempt_count
    which could hold a stale "PREEMPT_ACTIVE", instead of using the
    current preempt_count() call.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEbBAABAgAGBQJUi75HAAoJEEjnJuOKh9ldYVoH+LZsoT0m5UVQT8vQmahXABnY
 A3l19KGUAL3qaWRf4Au50sK2NdBTfGjvt8KGNkskPsvVv5X3z0GoXIIA76SD/MtX
 ysyLUXGCayNCqb3akuzznGZxE8CNKcU5aj3Hy+hIvRI6tgg2sEDt67QBAYsukIOR
 MN3us7ezvh0r+8muyPdrpC2OtkwsC1QvX2My1km0UU67CcWxo8zIuzUeeMSe+1+4
 6eCjLsVWAznaTo5W9i2DTKpw85hZjfAFgaGn21yRrAHbC+REpaigB9mxZg3Bb1Qb
 SovdyGSSEspke4/0Pu7bVXW/lx7dlnEsNWqc0RHWY1nd5FINQY+tRfbmgSoPRA==
 =DMsV
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Here's two fixes:

  1) Discovered by Fengguang Wu's tests.  I changed the parameters to
     the function graph x86 prepare_ftrace_return call but forgot to
     update the call from entry_32 (i386 version).  This patch corrects
     that.

  2) I was tracing some code and found that the sched_switch tracepoint
     was showing tasks in the INTERRUPTIBLE state as RUNNING.  This was
     due to the updates to convert preempt_count into a per_cpu
     variable.  The tracepoint logic was made to use the tasks
     saved_preempt_count which could hold a stale "PREEMPT_ACTIVE",
     instead of using the current preempt_count() call"

* tag 'trace-fixes-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/sched: Check preempt_count() for current when reading task->state
  ftrace/x86: Update i386 call to prepare_ftrace_return()
2014-12-13 14:01:11 -08:00
Linus Torvalds
e3aa91a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - The crypto API is now documented :)
 - Disallow arbitrary module loading through crypto API.
 - Allow get request with empty driver name through crypto_user.
 - Allow speed testing of arbitrary hash functions.
 - Add caam support for ctr(aes), gcm(aes) and their derivatives.
 - nx now supports concurrent hashing properly.
 - Add sahara support for SHA1/256.
 - Add ARM64 version of CRC32.
 - Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: tcrypt - Allow speed testing of arbitrary hash functions
  crypto: af_alg - add user space interface for AEAD
  crypto: qat - fix problem with coalescing enable logic
  crypto: sahara - add support for SHA1/256
  crypto: sahara - replace tasklets with kthread
  crypto: sahara - add support for i.MX53
  crypto: sahara - fix spinlock initialization
  crypto: arm - replace memset by memzero_explicit
  crypto: powerpc - replace memset by memzero_explicit
  crypto: sha - replace memset by memzero_explicit
  crypto: sparc - replace memset by memzero_explicit
  crypto: algif_skcipher - initialize upon init request
  crypto: algif_skcipher - removed unneeded code
  crypto: algif_skcipher - Fixed blocking recvmsg
  crypto: drbg - use memzero_explicit() for clearing sensitive data
  crypto: drbg - use MODULE_ALIAS_CRYPTO
  crypto: include crypto- module prefix in template
  crypto: user - add MODULE_ALIAS
  crypto: sha-mb - remove a bogus NULL check
  crytpo: qat - Fix 64 bytes requests
  ...
2014-12-13 13:33:26 -08:00
Linus Torvalds
78a45c6f06 Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
2014-12-13 13:00:36 -08:00
Riku Voipio
957e3facd1 gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs
Following the suggestions from Andrew Morton and Stephen Rothwell,
Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
can enable.

set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
previously allowed + ARM64 which I tested.

Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00
David Drysdale
27d6ec7ad6 x86: hook up execveat system call
Hook up x86-64, i386 and x32 ABIs.

Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00
Joonsoo Kim
031bc5743f mm/debug-pagealloc: make debug-pagealloc boottime configurable
Now, we have prepared to avoid using debug-pagealloc in boottime.  So
introduce new kernel-parameter to disable debug-pagealloc in boottime, and
makes related functions to be disabled in this case.

Only non-intuitive part is change of guard page functions.  Because guard
page is effective only if debug-pagealloc is enabled, turning off
according to debug-pagealloc is reasonable thing to do.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Rafael J. Wysocki
16394fd031 x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c
After commit b2b49ccbdd (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.

Replace CONFIG_PM_RUNTIME with CONFIG_PM in x86/kernel/apic/io_apic.c.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13 02:07:36 +01:00
Linus Torvalds
f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Ingo Molnar
3459f0d78f Merge branch 'linus' into perf/urgent, to pick up the upstream merged bits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-12 09:09:03 +01:00
Steven Rostedt (Red Hat)
f823b37ba6 ftrace/x86: Update i386 call to prepare_ftrace_return()
The parameters for prepare_ftrace_return() used by the function graph
tracer were swapped to simplify the code on x86_64. But i386 function
graph trampoline also calls this function, and it did not have its
parameters swapped.

Link: http://lkml.kernel.org/r/20141210231732.GA24163@wfg-t540p.sh.intel.com

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 6a06bdbf7f "ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-11 23:28:54 -05:00
Linus Torvalds
9d050966e2 xen: features and fixes for 3.19-rc0
- Fully support non-coherent devices on ARM by introducing the
   mechanisms to request the hypervisor to perform the required cache
   maintainance operations.
 
 - A number of pciback bug fixes and cleanups.  Notably a deadlock fix
   if a PCI device was manually uunbound and a fix for incorrectly
   restoring state after a function reset.
 
 - In x86 PVHVM guests, use the APIC for interrupts if this has been
   virtualized by the hardware.  This reduces the number of interrupt-
   related VM exits on such hardware.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUiYb+AAoJEFxbo/MsZsTRwmEH+gNaJz5r8gIJlq8Q51+nOIs4
 Gw6HdjUB5MOT47vDV4treEOx0Bk8hYTfgWUWvAC81JMJ1sMWOVrUGuG/0lmzaomW
 zXvSk+o0n4LafwEhHb8LIccZMbaH7f9o3PNdNchrTkPrIl8Gf2nmBXCkDsT4mRye
 5ZFpc4ntgBrznh3baPYDS8PCAmlyZ0uVEnz1ofYI6S80dC13siEiPG0c9TrNEKzO
 glhvgCRmR0C4ZNLblM36HWBEqrdLuGCoNJSH+7okygyP2TLD3aO4R+9aD5JWYNdf
 fO2WmivX/zK+UGVAElrLx+rb8R2dv3ddeaE5piZhIBUieopIWJd32L3LhQORdtc=
 =N6DP
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen features and fixes from David Vrabel:

 - Fully support non-coherent devices on ARM by introducing the
   mechanisms to request the hypervisor to perform the required cache
   maintainance operations.

 - A number of pciback bug fixes and cleanups.  Notably a deadlock fix
   if a PCI device was manually uunbound and a fix for incorrectly
   restoring state after a function reset.

 - In x86 PVHVM guests, use the APIC for interrupts if this has been
   virtualized by the hardware.  This reduces the number of interrupt-
   related VM exits on such hardware.

* tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (26 commits)
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  xen/pci: Use APIC directly when APIC virtualization hardware is available
  xen/pci: Defer initialization of MSI ops on HVM guests
  xen-pciback: drop SR-IOV VFs when PF driver unloads
  xen/pciback: Restore configuration space when detaching from a guest.
  PCI: Expose pci_load_saved_state for public consumption.
  xen/pciback: Remove tons of dereferences
  xen/pciback: Print out the domain owning the device.
  xen/pciback: Include the domain id if removing the device whilst still in use
  driver core: Provide an wrapper around the mutex to do lockdep warnings
  xen/pciback: Don't deadlock when unbinding.
  swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
  swiotlb-xen: call xen_dma_sync_single_for_device when appropriate
  swiotlb-xen: remove BUG_ON in xen_bus_to_phys
  swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu
  xen/arm: introduce GNTTABOP_cache_flush
  xen/arm/arm64: introduce xen_arch_need_swiotlb
  xen/arm/arm64: merge xen/mm32.c into xen/mm.c
  xen/arm: use hypercall to flush caches in map_page
  xen: add a dma_addr_t dev_addr argument to xen_dma_map_page
  ...
2014-12-11 18:15:33 -08:00
Alexander Duyck
1077fa36f2 arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be.  For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb().  In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb().  For example on
ARM the rmb barriers break down as follows:

  Barrier   Call     Explanation
  --------- -------- ----------------------------------
  rmb()     dsb()    Data synchronization barrier - system
  dma_rmb() dmb(osh) data memory barrier - outer sharable
  smp_rmb() dmb(ish) data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no dma_mb().  Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb().  As such there isn't much to
be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:06 -05:00
Alexander Duyck
8a44971841 arch: Cleanup read_barrier_depends() and comments
This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.

With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header.  In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:05 -05:00
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds
bae41e45b7 sound updates for 3.19-rc1
This became a fairly large pull request.  In addition to the usual
 driver updates / fixes, there have been a high amount of cleanups in
 ASoC area, as well as control API helpers and kernel documentations
 fixes touching through the whole tree.
 
 In the driver side, the biggest changes are the support for new Intel
 SoC found on new x86 machines, and the updates of FireWire dice and
 oxfw drivers.
 
 Some remarkable items are below:
 
 * ALSA core
  - PCM mmap code cleanup, removal of arch-dependent codes
  - PCM xrun injection support
  - PCM hwptr tracepoint support
  - Refactoring of snd_pcm_action(), simplification of PCM locking
  - Robustified sequecner auto-load functionality
  - New control API helpers and lots of cleanups along with them
  - Lots of kerneldoc fixes and cleanups
 
 * USB-audio
  - The mixer resume code was largely rewritten, and the devices with
    quirks are resumed properly.
  - New hardware support: Focusrite Scarlett, Digidesign Mbox1,
    Denon/Marantz DACs, Zoom R16/24
 
 * FireWire
  - DICE driver updates with better duplex and sync support, including
    MIDI support
  - New OXFW driver for Oxford Semiconductor FW970/971 chipset,
    including the previous LaCie Speakers device.  Fullduplex and MIDI
    support included as well as DICE driver.
 
 * HD-audio
  - Refactoring the driver-caps quirk handling in snd-hda-intel
  - More consistent control names representing the topology better
  - Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic
    fix, ASUS Z99He laptop EAPD
 
 * ASoC
  - Conversion of AC'97 drivers to use regmap, bringing us closer to
    the removal of the ASoC level I/O code
  - Clean up a lot of old drivers that were open coding things that
    have subsequently been implemented in the core
  - Some DAPM performance improvements
  - Removal of the now seldom used CODEC mutex
  - Lots of updates for the newer Intel SoC support, including support
    for the DSP and some Cherrytrail and Braswell machine drivers
  - Support for Samsung boards using rt5631 as the CODEC
  - Removal of the obsolete AFEB9260 machine driver
  - Driver support for the TI TS3A227E headset driver used in some
    Chrombeooks
 
 * Others
  - ASIHPI driver update and cleanups
  - Lots of dev_*() printk conversions
  - Lots of trivial cleanups for the codes spotted by Coccinelle
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJUiYaqAAoJEGwxgFQ9KSmkeo0P/2aDx2w8iVi8n7Og/7VBubkm
 VZkk08IOpP3h1ojyQRsBQPI0H5AquqQTZN1TJUDcy+6PD9vckYYcag9JWhA+0RBr
 I+BfTMLB3E4umIkzOjxeoyOzheL7GoZ+eZYEm8DkAhaue+cFhjNJz+S6g8ENkxJ9
 lSjErXQxyiowc39I0v1WBZcuq6glX1psEsVup9U8m7KhNx6lexj28A2MkqicW4hs
 DZE6pYrk57W7y3+/NWxaBiglrItvScBAPpPqoyDm9zuDNTmAtGjf1uMRmRyHe30Z
 iunHXki8Fc2yBBapmfYrcLC2jyIyZykcxniF8Hd4nXUvddisFUEFFhNmB6v392d0
 4/NXSqTnsq48vm0Ezjia2LySWKZZVQtam8t9262BKHcosKYObxirekD6vijSoWO8
 ZWoXa+U1oWSFEoOAFDsu6GFqFHFRi5VhqBgIaPEIxrT2MQGHL3KU1bp8CJi/5CTU
 pNh0wC9SMtnSJJXBIP/nYH81WQxaik3c4eiHFPN4+0McBZQiIaIqMG6x+iiVNvPB
 MNLLVAzk0QiWeCmSo8OBdjOV0/T+pfQ7lrTCn2B1jdJi1CkAO8m2SwQrG4PpRx8k
 lUTBd4zTx5DYR+yPF69OyoCQg0XKjW9g62Qo5rmxrQreiidROZOBS1bljWzIPeft
 otupLmK5kz67n3eB2eto
 =sB6v
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "This became a fairly large pull request.  In addition to the usual
  driver updates / fixes, there have been a high amount of cleanups in
  ASoC area, as well as control API helpers and kernel documentations
  fixes touching through the whole tree.

  In the driver side, the biggest changes are the support for new Intel
  SoC found on new x86 machines, and the updates of FireWire dice and
  oxfw drivers.

  Some remarkable items are below:

  ALSA core:
   - PCM mmap code cleanup, removal of arch-dependent codes
   - PCM xrun injection support
   - PCM hwptr tracepoint support
   - Refactoring of snd_pcm_action(), simplification of PCM locking
   - Robustified sequecner auto-load functionality
   - New control API helpers and lots of cleanups along with them
   - Lots of kerneldoc fixes and cleanups

  USB-audio:
   - The mixer resume code was largely rewritten, and the devices with
     quirks are resumed properly.
   - New hardware support: Focusrite Scarlett, Digidesign Mbox1,
     Denon/Marantz DACs, Zoom R16/24

  FireWire:
   - DICE driver updates with better duplex and sync support, including
     MIDI support
   - New OXFW driver for Oxford Semiconductor FW970/971 chipset,
     including the previous LaCie Speakers device.  Fullduplex and MIDI
     support included as well as DICE driver.

  HD-audio:
   - Refactoring the driver-caps quirk handling in snd-hda-intel
   - More consistent control names representing the topology better
   - Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic
     fix, ASUS Z99He laptop EAPD

  ASoC:
   - Conversion of AC'97 drivers to use regmap, bringing us closer to
     the removal of the ASoC level I/O code
   - Clean up a lot of old drivers that were open coding things that
     have subsequently been implemented in the core
   - Some DAPM performance improvements
   - Removal of the now seldom used CODEC mutex
   - Lots of updates for the newer Intel SoC support, including support
     for the DSP and some Cherrytrail and Braswell machine drivers
   - Support for Samsung boards using rt5631 as the CODEC
   - Removal of the obsolete AFEB9260 machine driver
   - Driver support for the TI TS3A227E headset driver used in some
     Chrombeooks

  Others:
   - ASIHPI driver update and cleanups
   - Lots of dev_*() printk conversions
   - Lots of trivial cleanups for the codes spotted by Coccinelle"

* tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (594 commits)
  ALSA: pcxhr: NULL dereference on probe failure
  ALSA: lola: NULL dereference on probe failure
  ALSA: hda - Add "eapd" model string for AD1986A codec
  ALSA: hda - Add EAPD fixup for ASUS Z99He laptop
  ALSA: oxfw: Add hwdep interface
  ALSA: oxfw: Add support for capture/playback MIDI messages
  ALSA: oxfw: add support for capturing PCM samples
  ALSA: oxfw: Add support AMDTP in-stream
  ALSA: oxfw: Add support for Behringer/Mackie devices
  ALSA: oxfw: Change the way to start stream
  ALSA: oxfw: Add proc interface for debugging purpose
  ALSA: oxfw: Change the way to make PCM rules/constraints
  ALSA: oxfw: Add support for AV/C stream format command to get/set supported stream formation
  ALSA: oxfw: Change the way to name card
  ALSA: dice: Add support for MIDI capture/playback
  ALSA: dice: Add support for capturing PCM samples
  ALSA: dice: Support for non SYT-Match sampling clock source mode
  ALSA: dice: Add support for duplex streams with synchronization
  ALSA: dice: Change the way to start stream
  ALSA: jack: Add dummy snd_jack_set_key() definition
  ...
2014-12-11 13:20:50 -08:00
Juergen Gross
cdfa0badfc xen: switch to post-init routines in xen mmu.c earlier
With the virtual mapped linear p2m list the post-init mmu operations
must be used for setting up the p2m mappings, as in case of
CONFIG_FLATMEM the init routines may trigger BUGs.

paging_init() sets up all infrastructure needed to switch to the
post-init mmu ops done by xen_post_allocator_init(). With the virtual
mapped linear p2m list we need some mmu ops during setup of this list,
so we have to switch to the correct mmu ops as soon as possible.

The p2m list is usable from the beginning, just expansion requires to
have established the new linear mapping. So the call of
xen_remap_memory() had to be introduced, but this is not due to the
mmu ops requiring this.

Summing it up: calling xen_post_allocator_init() not directly after
paging_init() was conceptually wrong in the beginning, it just didn't
matter up to now as no functions used between the two calls needed
some critical mmu ops (e.g. alloc_pte). This has changed now, so I
corrected it.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-11 12:05:53 +00:00
Nadav Amit
ab646f54f4 KVM: x86: em_ret_far overrides cpl
commit d50eaa1803 ("KVM: x86: Perform limit checks when assigning EIP")
mistakenly used zero as cpl on em_ret_far. Use the actual one.

Fixes: d50eaa1803
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-11 12:27:32 +01:00
Bandan Das
78051e3b7e KVM: nVMX: Disable unrestricted mode if ept=0
If L0 has disabled EPT, don't advertise unrestricted
mode at all since it depends on EPT to run real mode code.

Fixes: 92fbc7b195
Cc: stable@vger.kernel.org
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-11 12:26:15 +01:00
Borislav Petkov
be9d1738b1 x86/asm: Unify segment selector defines
Those are identical on 32- and 64-bit, unify them. No functional
change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418127959-29902-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:45:03 +01:00
Borislav Petkov
5de2b61a63 x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly
Sometimes it is helpful to build a kernel compilation unit
directly, i.e.:

  make .../<filename>.i

in order to look at compiler output.

Since asm-offsets_{32,64}.c are included by asm-offsets.c and
building them directly doesn't evaluate the macros used (thus
making the preprocessor output not very useful), error out when
an attempt is made to build them. Issue a hint for the user to
build asm-offsets.c instead.

Suggested-by: Michael Matz <matz@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418139917-12722-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:43:56 +01:00
Andy Lutomirski
f647d7c155 x86_64, switch_to(): Load TLS descriptors before switching DS and ES
Otherwise, if buggy user code points DS or ES into the TLS
array, they would be corrupted after a context switch.

This also significantly improves the comments and documents some
gotchas in the code.

Before this patch, the both tests below failed.  With this
patch, the es test passes, although the gsbase test still fails.

 ----- begin es test -----

/*
 * Copyright (c) 2014 Andy Lutomirski
 * GPL v2
 */

static unsigned short GDT3(int idx)
{
	return (idx << 3) | 3;
}

static int create_tls(int idx, unsigned int base)
{
	struct user_desc desc = {
		.entry_number    = idx,
		.base_addr       = base,
		.limit           = 0xfffff,
		.seg_32bit       = 1,
		.contents        = 0, /* Data, grow-up */
		.read_exec_only  = 0,
		.limit_in_pages  = 1,
		.seg_not_present = 0,
		.useable         = 0,
	};

	if (syscall(SYS_set_thread_area, &desc) != 0)
		err(1, "set_thread_area");

	return desc.entry_number;
}

int main()
{
	int idx = create_tls(-1, 0);
	printf("Allocated GDT index %d\n", idx);

	unsigned short orig_es;
	asm volatile ("mov %%es,%0" : "=rm" (orig_es));

	int errors = 0;
	int total = 1000;
	for (int i = 0; i < total; i++) {
		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
		usleep(100);

		unsigned short es;
		asm volatile ("mov %%es,%0" : "=rm" (es));
		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
		if (es != GDT3(idx)) {
			if (errors == 0)
				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
				       GDT3(idx), es);
			errors++;
		}
	}

	if (errors) {
		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
		return 1;
	} else {
		printf("[OK]\tES was preserved\n");
		return 0;
	}
}

 ----- end es test -----

 ----- begin gsbase test -----

/*
 * gsbase.c, a gsbase test
 * Copyright (c) 2014 Andy Lutomirski
 * GPL v2
 */

static unsigned char *testptr, *testptr2;

static unsigned char read_gs_testvals(void)
{
	unsigned char ret;
	asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
	return ret;
}

int main()
{
	int errors = 0;

	testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
	if (testptr == MAP_FAILED)
		err(1, "mmap");

	testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
	if (testptr2 == MAP_FAILED)
		err(1, "mmap");

	*testptr = 0;
	*testptr2 = 1;

	if (syscall(SYS_arch_prctl, ARCH_SET_GS,
		    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
		err(1, "ARCH_SET_GS");

	usleep(100);

	if (read_gs_testvals() == 1) {
		printf("[OK]\tARCH_SET_GS worked\n");
	} else {
		printf("[FAIL]\tARCH_SET_GS failed\n");
		errors++;
	}

	asm volatile ("mov %0,%%gs" : : "r" (0));

	if (read_gs_testvals() == 0) {
		printf("[OK]\tWriting 0 to gs worked\n");
	} else {
		printf("[FAIL]\tWriting 0 to gs failed\n");
		errors++;
	}

	usleep(100);

	if (read_gs_testvals() == 0) {
		printf("[OK]\tgsbase is still zero\n");
	} else {
		printf("[FAIL]\tgsbase was corrupted\n");
		errors++;
	}

	return errors == 0 ? 0 : 1;
}

 ----- end gsbase test -----

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:40:08 +01:00
Xishi Qiu
29258cf49e x86/mm: Use min() instead of min_t() in the e820 printout code
The type of "MAX_DMA_PFN" and "xXx_pfn" are both unsigned long
now, so use min() instead of min_t().

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Linux MM <linux-mm@kvack.org>
Cc: <dave@sr71.net>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/5487AB3F.7050807@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:35:02 +01:00
Xishi Qiu
c072b90c8d x86/mm: Fix zone ranges boot printout
This is the usual physical memory layout boot printout:
	...
	[    0.000000] Zone ranges:
	[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
	[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
	[    0.000000]   Normal   [mem 0x100000000-0xc3fffffff]
	[    0.000000] Movable zone start for each node
	[    0.000000] Early memory node ranges
	[    0.000000]   node   0: [mem 0x00001000-0x00099fff]
	[    0.000000]   node   0: [mem 0x00100000-0xbf78ffff]
	[    0.000000]   node   0: [mem 0x100000000-0x63fffffff]
	[    0.000000]   node   1: [mem 0x640000000-0xc3fffffff]
	...

This is the log when we set "mem=2G" on the boot cmdline:
	...
	[    0.000000] Zone ranges:
	[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
	[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]  // should be 0x7fffffff, right?
	[    0.000000]   Normal   empty
	[    0.000000] Movable zone start for each node
	[    0.000000] Early memory node ranges
	[    0.000000]   node   0: [mem 0x00001000-0x00099fff]
	[    0.000000]   node   0: [mem 0x00100000-0x7fffffff]
	...

This patch fixes the printout, the following log shows the right
ranges:
	...
	[    0.000000] Zone ranges:
	[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
	[    0.000000]   DMA32    [mem 0x01000000-0x7fffffff]
	[    0.000000]   Normal   empty
	[    0.000000] Movable zone start for each node
	[    0.000000] Early memory node ranges
	[    0.000000]   node   0: [mem 0x00001000-0x00099fff]
	[    0.000000]   node   0: [mem 0x00100000-0x7fffffff]
	...

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Linux MM <linux-mm@kvack.org>
Cc: <dave@sr71.net>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/5487AB3D.6070306@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:35:02 +01:00
Jiri Olsa
af91568e76 perf/x86/intel/uncore: Make sure only uncore events are collected
The uncore_collect_events functions assumes that event group
might contain only uncore events which is wrong, because it
might contain any type of events.

This bug leads to uncore framework touching 'not' uncore events,
which could end up all sorts of bugs.

One was triggered by Vince's perf fuzzer, when the uncore code
touched breakpoint event private event space as if it was uncore
event and caused BUG:

   BUG: unable to handle kernel paging request at ffffffff82822068
   IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
   ...

The code in uncore_assign_events() function was looking for
event->hw.idx data while the event was initialized as a
breakpoint with different members in event->hw union.

This patch forces uncore_collect_events() to collect only uncore
events.

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:24:14 +01:00
Linus Torvalds
92a578b064 ACPI and power management updates for 3.19-rc1
This time we have some more new material than we used to have during
 the last couple of development cycles.
 
 The most important part of it to me is the introduction of a unified
 interface for accessing device properties provided by platform
 firmware.  It works with Device Trees and ACPI in a uniform way and
 drivers using it need not worry about where the properties come
 from as long as the platform firmware (either DT or ACPI) makes
 them available.  It covers both devices and "bare" device node
 objects without struct device representation as that turns out to
 be necessary in some cases.  This has been in the works for quite
 a few months (and development cycles) and has been approved by
 all of the relevant maintainers.
 
 On top of that, some drivers are switched over to the new interface
 (at25, leds-gpio, gpio_keys_polled) and some additional changes are
 made to the core GPIO subsystem to allow device drivers to manipulate
 GPIOs in the "canonical" way on platforms that provide GPIO information
 in their ACPI tables, but don't assign names to GPIO lines (in which
 case the driver needs to do that on the basis of what it knows about
 the device in question).  That also has been approved by the GPIO
 core maintainers and the rfkill driver is now going to use it.
 
 Second is support for hardware P-states in the intel_pstate driver.
 It uses CPUID to detect whether or not the feature is supported by
 the processor in which case it will be enabled by default.  However,
 it can be disabled entirely from the kernel command line if necessary.
 
 Next is support for a platform firmware interface based on ACPI
 operation regions used by the PMIC (Power Management Integrated
 Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
 That interface is used for manipulating power resources and for
 thermal management: sensor temperature reporting, trip point setting
 and so on.
 
 Also the ACPI core is now going to support the _DEP configuration
 information in a limited way.  Basically, _DEP it supposed to reflect
 off-the-hierarchy dependencies between devices which may be very
 indirect, like when AML for one device accesses locations in an
 operation region handled by another device's driver (usually, the
 device depended on this way is a serial bus or GPIO controller).
 The support added this time is sufficient to make the ACPI battery
 driver work on Asus T100A, but it is general enough to be able to
 cover some other use cases in the future.
 
 Finally, we have a new cpufreq driver for the Loongson1B processor.
 
 In addition to the above, there are fixes and cleanups all over the
 place as usual and a traditional ACPICA update to a recent upstream
 release.
 
 As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver
 for Intel platforms should be able to handle power management of
 the DMA engine correctly, the cpufreq-dt driver should interact
 with the thermal subsystem in a better way and the ACPI backlight
 driver should handle some more corner cases, among other things.
 
 On top of the ACPICA update there are fixes for race conditions
 in the ACPICA's interrupt handling code which might lead to some
 random and strange looking failures on some systems.
 
 In the cleanups department the most visible part is the series
 of commits targeted at getting rid of the CONFIG_PM_RUNTIME
 configuration option.  That was triggered by a discussion
 regarding the generic power domains code during which we realized
 that trying to support certain combinations of PM config options
 was painful and not really worth it, because nobody would use them
 in production anyway.  For this reason, we decided to make
 CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the
 conclusion that the latter became redundant and CONFIG_PM could
 be used instead of it.  The material here makes that replacement
 in a major part of the tree, but there will be at least one more
 batch of that in the second part of the merge window.
 
 Specifics:
 
  - Support for retrieving device properties information from ACPI
    _DSD device configuration objects and a unified device properties
    interface for device drivers (and subsystems) on top of that.
    As stated above, this works with Device Trees and ACPI and allows
    device drivers to be written in a platform firmware (DT or ACPI)
    agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers
    are now going to use this new interface and the GPIO subsystem
    is additionally modified to allow device drivers to assign names
    to GPIO resources returned by ACPI _CRS objects (in case _DSD is
    not present or does not provide the expected data).  The changes
    in this set are mostly from Mika Westerberg, Rafael J Wysocki,
    Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam,
    Geert Uytterhoeven).
 
  - Support for Hardware Managed Performance States (HWP) as described
    in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
    driver.  CPUID is used to detect whether or not the feature is
    supported by the processor.  If supported, it will be enabled
    automatically unless the intel_pstate=no_hwp switch is present in
    the kernel command line.  From Dirk Brandewie.
 
  - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
 
  - Support for firmware interface based on ACPI operation regions
    used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
    platforms for power resource control and thermal management
    (Aaron Lu).
 
  - Limited support for retrieving off-the-hierarchy dependencies
    between devices from ACPI _DEP device configuration objects
    and deferred probing support for the ACPI battery driver based
    on the _DEP information to make that driver work on Asus T100A
    (Lan Tianyu).
 
  - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
 
  - ACPICA update to upstream revision 20141107 which only affects
    tools (Bob Moore).
 
  - Fixes for race conditions in the ACPICA's interrupt handling
    code and in the ACPI code related to system suspend and resume
    (Lv Zheng and Rafael J Wysocki).
 
  - ACPI core fix for an RCU-related issue in the ioremap() regions
    management code that slowed down significantly after CPUs had
    been allowed to enter idle states even if they'd had RCU callbakcs
    queued and triggered some problems in certain proprietary graphics
    driver (and elsewhere).  The fix replaces synchronize_rcu() in
    that code with synchronize_rcu_expedited() which makes the issue
    go away.  From Konstantin Khlebnikov.
 
  - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
    management of the DMA engine included into the LPSS correctly.
    The problem is that the DMA engine doesn't have ACPI PM support
    of its own and it simply is turned off when the last LPSS device
    having ACPI PM support goes into D3cold.  To work around that,
    the PM domain used by the ACPI LPSS driver is redesigned so at
    least one device with ACPI PM support will be on as long as the
    DMA engine is in use.  From Andy Shevchenko.
 
  - ACPI backlight driver fix to avoid using it on "Win8-compatible"
    systems where it doesn't work and where it was used by default by
    mistake (Aaron Lu).
 
  - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
    Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and
    Ashwin Chaugule (mostly related to the upcoming ARM64 support).
 
  - Intel RAPL (Running Average Power Limit) power capping driver
    fixes and improvements including new processor IDs (Jacob Pan).
 
  - Generic power domains modification to power up domains after
    attaching devices to them to meet the expectations of device
    drivers and bus types assuming devices to be accessible at
    probe time (Ulf Hansson).
 
  - Preliminary support for controlling device clocks from the
    generic power domains core code and modifications of the
    ARM/shmobile platform to use that feature (Ulf Hansson).
 
  - Assorted minor fixes and cleanups of the generic power
    domains core code (Ulf Hansson, Geert Uytterhoeven).
 
  - Assorted minor fixes and cleanups of the device clocks control
    code in the PM core (Geert Uytterhoeven, Grygorii Strashko).
 
  - Consolidation of device power management Kconfig options by making
    CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
    which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
    is the first batch of the changes needed for this purpose.
 
  - Core device runtime power management support code cleanup related
    to the execution of callbacks (Andrzej Hajda).
 
  - cpuidle ARM support improvements (Lorenzo Pieralisi).
 
  - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and
    a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
    Bartlomiej Zolnierkiewicz).
 
  - New cpufreq driver callback (->ready) to be executed when the
    cpufreq core is ready to use a given policy object and cpufreq-dt
    driver modification to use that callback for cooling device
    registration (Viresh Kumar).
 
  - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu,
    James Geboski, Tomeu Vizoso).
 
  - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
    cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
    Stefan Wahren, Petr Cvek).
 
  - OPP (Operating Performance Points) framework modification to
    allow OPPs to be removed too and update of a few cpufreq drivers
    (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
    during initialization) on driver removal (Viresh Kumar).
 
  - Hibernation core fixes and cleanups (Tina Ruchandani and
    Markus Elfring).
 
  - PM Kconfig fix related to CPU power management (Pankaj Dubey).
 
  - cpupower tool fix (Prarit Bhargava).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v
 nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70
 uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD
 asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B
 Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c
 kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv
 qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh
 XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER
 4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s
 t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44
 WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+
 LBdhvVaS1VW4v+FWXEoZ
 =5dox
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "This time we have some more new material than we used to have during
  the last couple of development cycles.

  The most important part of it to me is the introduction of a unified
  interface for accessing device properties provided by platform
  firmware.  It works with Device Trees and ACPI in a uniform way and
  drivers using it need not worry about where the properties come from
  as long as the platform firmware (either DT or ACPI) makes them
  available.  It covers both devices and "bare" device node objects
  without struct device representation as that turns out to be necessary
  in some cases.  This has been in the works for quite a few months (and
  development cycles) and has been approved by all of the relevant
  maintainers.

  On top of that, some drivers are switched over to the new interface
  (at25, leds-gpio, gpio_keys_polled) and some additional changes are
  made to the core GPIO subsystem to allow device drivers to manipulate
  GPIOs in the "canonical" way on platforms that provide GPIO
  information in their ACPI tables, but don't assign names to GPIO lines
  (in which case the driver needs to do that on the basis of what it
  knows about the device in question).  That also has been approved by
  the GPIO core maintainers and the rfkill driver is now going to use
  it.

  Second is support for hardware P-states in the intel_pstate driver.
  It uses CPUID to detect whether or not the feature is supported by the
  processor in which case it will be enabled by default.  However, it
  can be disabled entirely from the kernel command line if necessary.

  Next is support for a platform firmware interface based on ACPI
  operation regions used by the PMIC (Power Management Integrated
  Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
  That interface is used for manipulating power resources and for
  thermal management: sensor temperature reporting, trip point setting
  and so on.

  Also the ACPI core is now going to support the _DEP configuration
  information in a limited way.  Basically, _DEP it supposed to reflect
  off-the-hierarchy dependencies between devices which may be very
  indirect, like when AML for one device accesses locations in an
  operation region handled by another device's driver (usually, the
  device depended on this way is a serial bus or GPIO controller).  The
  support added this time is sufficient to make the ACPI battery driver
  work on Asus T100A, but it is general enough to be able to cover some
  other use cases in the future.

  Finally, we have a new cpufreq driver for the Loongson1B processor.

  In addition to the above, there are fixes and cleanups all over the
  place as usual and a traditional ACPICA update to a recent upstream
  release.

  As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
  Intel platforms should be able to handle power management of the DMA
  engine correctly, the cpufreq-dt driver should interact with the
  thermal subsystem in a better way and the ACPI backlight driver should
  handle some more corner cases, among other things.

  On top of the ACPICA update there are fixes for race conditions in the
  ACPICA's interrupt handling code which might lead to some random and
  strange looking failures on some systems.

  In the cleanups department the most visible part is the series of
  commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
  option.  That was triggered by a discussion regarding the generic
  power domains code during which we realized that trying to support
  certain combinations of PM config options was painful and not really
  worth it, because nobody would use them in production anyway.  For
  this reason, we decided to make CONFIG_PM_SLEEP select
  CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
  became redundant and CONFIG_PM could be used instead of it.  The
  material here makes that replacement in a major part of the tree, but
  there will be at least one more batch of that in the second part of
  the merge window.

  Specifics:

   - Support for retrieving device properties information from ACPI _DSD
     device configuration objects and a unified device properties
     interface for device drivers (and subsystems) on top of that.  As
     stated above, this works with Device Trees and ACPI and allows
     device drivers to be written in a platform firmware (DT or ACPI)
     agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers are
     now going to use this new interface and the GPIO subsystem is
     additionally modified to allow device drivers to assign names to
     GPIO resources returned by ACPI _CRS objects (in case _DSD is not
     present or does not provide the expected data).  The changes in
     this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
     Lu, and Darren Hart with some fixes from others (Fabio Estevam,
     Geert Uytterhoeven).

   - Support for Hardware Managed Performance States (HWP) as described
     in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
     driver.  CPUID is used to detect whether or not the feature is
     supported by the processor.  If supported, it will be enabled
     automatically unless the intel_pstate=no_hwp switch is present in
     the kernel command line.  From Dirk Brandewie.

   - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).

   - Support for firmware interface based on ACPI operation regions used
     by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
     platforms for power resource control and thermal management (Aaron
     Lu).

   - Limited support for retrieving off-the-hierarchy dependencies
     between devices from ACPI _DEP device configuration objects and
     deferred probing support for the ACPI battery driver based on the
     _DEP information to make that driver work on Asus T100A (Lan
     Tianyu).

   - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).

   - ACPICA update to upstream revision 20141107 which only affects
     tools (Bob Moore).

   - Fixes for race conditions in the ACPICA's interrupt handling code
     and in the ACPI code related to system suspend and resume (Lv Zheng
     and Rafael J Wysocki).

   - ACPI core fix for an RCU-related issue in the ioremap() regions
     management code that slowed down significantly after CPUs had been
     allowed to enter idle states even if they'd had RCU callbakcs
     queued and triggered some problems in certain proprietary graphics
     driver (and elsewhere).  The fix replaces synchronize_rcu() in that
     code with synchronize_rcu_expedited() which makes the issue go
     away.  From Konstantin Khlebnikov.

   - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
     management of the DMA engine included into the LPSS correctly.  The
     problem is that the DMA engine doesn't have ACPI PM support of its
     own and it simply is turned off when the last LPSS device having
     ACPI PM support goes into D3cold.  To work around that, the PM
     domain used by the ACPI LPSS driver is redesigned so at least one
     device with ACPI PM support will be on as long as the DMA engine is
     in use.  From Andy Shevchenko.

   - ACPI backlight driver fix to avoid using it on "Win8-compatible"
     systems where it doesn't work and where it was used by default by
     mistake (Aaron Lu).

   - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
     Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
     Chaugule (mostly related to the upcoming ARM64 support).

   - Intel RAPL (Running Average Power Limit) power capping driver fixes
     and improvements including new processor IDs (Jacob Pan).

   - Generic power domains modification to power up domains after
     attaching devices to them to meet the expectations of device
     drivers and bus types assuming devices to be accessible at probe
     time (Ulf Hansson).

   - Preliminary support for controlling device clocks from the generic
     power domains core code and modifications of the ARM/shmobile
     platform to use that feature (Ulf Hansson).

   - Assorted minor fixes and cleanups of the generic power domains core
     code (Ulf Hansson, Geert Uytterhoeven).

   - Assorted minor fixes and cleanups of the device clocks control code
     in the PM core (Geert Uytterhoeven, Grygorii Strashko).

   - Consolidation of device power management Kconfig options by making
     CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
     which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
     is the first batch of the changes needed for this purpose.

   - Core device runtime power management support code cleanup related
     to the execution of callbacks (Andrzej Hajda).

   - cpuidle ARM support improvements (Lorenzo Pieralisi).

   - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
     new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
     Bartlomiej Zolnierkiewicz).

   - New cpufreq driver callback (->ready) to be executed when the
     cpufreq core is ready to use a given policy object and cpufreq-dt
     driver modification to use that callback for cooling device
     registration (Viresh Kumar).

   - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
     Geboski, Tomeu Vizoso).

   - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
     cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
     Stefan Wahren, Petr Cvek).

   - OPP (Operating Performance Points) framework modification to allow
     OPPs to be removed too and update of a few cpufreq drivers
     (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
     during initialization) on driver removal (Viresh Kumar).

   - Hibernation core fixes and cleanups (Tina Ruchandani and Markus
     Elfring).

   - PM Kconfig fix related to CPU power management (Pankaj Dubey).

   - cpupower tool fix (Prarit Bhargava)"

* tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
  i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
  dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  tools: cpupower: fix return checks for sysfs_get_idlestate_count()
  drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
  MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  leds: leds-gpio: Fix multiple instances registration without 'label' property
  iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
  block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
  PM: Merge the SET*_RUNTIME_PM_OPS() macros
  ...
2014-12-10 21:17:00 -08:00
Linus Torvalds
350e4f4985 This code is a fork from the trace-3.19 pull as it needed the trace_seq
clean ups from that branch.
 
 This code solves the issue of performing stack dumps from NMI context.
 The issue is that printk() is not safe from NMI context as if the NMI
 were to trigger when a printk() was being performed, the NMI could
 deadlock from the printk() internal locks. This has been seen in practice.
 
 With lots of review from Petr Mladek, this code went through several
 iterations, and we feel that it is now at a point of quality to be
 accepted into mainline.
 
 Here's what is contained in this patch set:
 
  o Creates a "seq_buf" generic buffer utility that allows a descriptor
    to be passed around where functions can write their own "printk()"
    formatted strings into it. The generic version was pulled out of
    the trace_seq() code that was made specifically for tracing.
 
  o The seq_buf code was change to model the seq_file code. I have
    a patch (not included for 3.19) that converts the seq_file.c code
    over to use seq_buf.c like the trace_seq.c code does. This was done
    to make sure that seq_buf.c is compatible with seq_file.c. I may
    try to get that patch in for 3.20.
 
  o The seq_buf.c file was moved to lib/ to remove it from being dependent
    on CONFIG_TRACING.
 
  o The printk() was updated to allow for a per_cpu "override" of
    the internal calls. That is, instead of writing to the console, a call
    to printk() may do something else. This made it easier to allow the
    NMI to change what printk() does in order to call dump_stack() without
    needing to update that code as well.
 
  o Finally, the dump_stack from all CPUs via NMI code was converted to
    use the seq_buf code. The caller to trigger the NMI code would wait
    till all the NMIs finished, and then it would print the seq_buf
    data to the console safely from a non NMI context.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhbrnAAoJEEjnJuOKh9ldsCoIAJ3sKIJ5B3jxJJTCHPAx/lZD
 GVbV1J1mu4kTAZuhJZOAxW8D6PZGZMyEjg0y6ScDEnBGcjAZ9gTiWCdakPktf9EX
 GfaPPqwiL9dZ18J9Qc6uR+7M1Ffpzzwbcc6lJrpoTcjRgkoH9wCiLS9ozFQyYzWb
 /7m5UbUM/PIk9WAjLYXPW6UUVtPTPT0RdEQKofMGTeah+vgqj4TXCOROdlxsXXWF
 77vqBvPd5TUPWFH9ftzJGDtZS8SroXVKCu3fZIqHgzAU0yqwVtH/JzDTy9u2UYhX
 GzDEPeAIdp6m6Uyc406VuIf1QW0gfBgmA0ir80vFoP27uFMM6j5HlF7azgQfx34=
 =YBgA
 -----END PGP SIGNATURE-----

Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull nmi-safe seq_buf printk update from Steven Rostedt:
 "This code is a fork from the trace-3.19 pull as it needed the
  trace_seq clean ups from that branch.

  This code solves the issue of performing stack dumps from NMI context.
  The issue is that printk() is not safe from NMI context as if the NMI
  were to trigger when a printk() was being performed, the NMI could
  deadlock from the printk() internal locks.  This has been seen in
  practice.

  With lots of review from Petr Mladek, this code went through several
  iterations, and we feel that it is now at a point of quality to be
  accepted into mainline.

  Here's what is contained in this patch set:

   - Creates a "seq_buf" generic buffer utility that allows a descriptor
     to be passed around where functions can write their own "printk()"
     formatted strings into it.  The generic version was pulled out of
     the trace_seq() code that was made specifically for tracing.

   - The seq_buf code was change to model the seq_file code.  I have a
     patch (not included for 3.19) that converts the seq_file.c code
     over to use seq_buf.c like the trace_seq.c code does.  This was
     done to make sure that seq_buf.c is compatible with seq_file.c.  I
     may try to get that patch in for 3.20.

   - The seq_buf.c file was moved to lib/ to remove it from being
     dependent on CONFIG_TRACING.

   - The printk() was updated to allow for a per_cpu "override" of the
     internal calls.  That is, instead of writing to the console, a call
     to printk() may do something else.  This made it easier to allow
     the NMI to change what printk() does in order to call dump_stack()
     without needing to update that code as well.

   - Finally, the dump_stack from all CPUs via NMI code was converted to
     use the seq_buf code.  The caller to trigger the NMI code would
     wait till all the NMIs finished, and then it would print the
     seq_buf data to the console safely from a non NMI context

  One added bonus is that this code also makes the NMI dump stack work
  on PREEMPT_RT kernels.  As printk() includes sleeping locks on
  PREEMPT_RT, printk() only writes to console if the console does not
  use any rt_mutex converted spin locks.  Which a lot do"

* tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  x86/nmi: Fix use of unallocated cpumask_var_t
  printk/percpu: Define printk_func when printk is not defined
  x86/nmi: Perform a safe NMI stack trace on all CPUs
  printk: Add per_cpu printk func to allow printk to be diverted
  seq_buf: Move the seq_buf code to lib/
  seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
  tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
  tracing: Have seq_buf use full buffer
  seq_buf: Add seq_buf_can_fit() helper function
  tracing: Add paranoid size check in trace_printk_seq()
  tracing: Use trace_seq_used() and seq_buf_used() instead of len
  tracing: Clean up tracing_fill_pipe_page()
  seq_buf: Create seq_buf_used() to find out how much was written
  tracing: Add a seq_buf_clear() helper and clear len and readpos in init
  tracing: Convert seq_buf fields to be like seq_file fields
  tracing: Convert seq_buf_path() to be like seq_path()
  tracing: Create seq_buf layer in trace_seq
2014-12-10 20:35:41 -08:00
Linus Torvalds
1dd7dcb6ea There was a lot of clean ups and minor fixes. One of those clean ups was
to the trace_seq code. It also removed the return values to the
 trace_seq_*() functions and use trace_seq_has_overflowed() to see if
 the buffer filled up or not. This is similar to work being done to the
 seq_file code as well in another tree.
 
 Some of the other goodies include:
 
  o Added some "!" (NOT) logic to the tracing filter.
 
  o Fixed the frame pointer logic to the x86_64 mcount trampolines
 
  o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
    That is, the ftrace trampoline can be dynamically allocated
    and be called directly by functions that only have a single hook
    to them.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG
 Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk
 xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL
 WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv
 mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R
 Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY=
 =Z7Cm
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "There was a lot of clean ups and minor fixes.  One of those clean ups
  was to the trace_seq code.  It also removed the return values to the
  trace_seq_*() functions and use trace_seq_has_overflowed() to see if
  the buffer filled up or not.  This is similar to work being done to
  the seq_file code as well in another tree.

  Some of the other goodies include:

   - Added some "!" (NOT) logic to the tracing filter.

   - Fixed the frame pointer logic to the x86_64 mcount trampolines

   - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
     That is, the ftrace trampoline can be dynamically allocated and be
     called directly by functions that only have a single hook to them"

* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
  tracing: Truncated output is better than nothing
  tracing: Add additional marks to signal very large time deltas
  Documentation: describe trace_buf_size parameter more accurately
  tracing: Allow NOT to filter AND and OR clauses
  tracing: Add NOT to filtering logic
  ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
  ftrace/x86: Get rid of ftrace_caller_setup
  ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
  ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
  ftrace/x86: Simplify save_mcount_regs on getting RIP
  ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
  ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
  ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
  ftrace/x86: Have static tracing also use ftrace_caller_setup
  ftrace/x86: Have static function tracing always test for function graph
  kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
  ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
  kprobes/ftrace: Recover original IP if pre_handler doesn't change it
  tracing/trivial: Fix typos and make an int into a bool
  tracing: Deletion of an unnecessary check before iput()
  ...
2014-12-10 19:58:13 -08:00
Linus Torvalds
b6da0076ba Merge branch 'akpm' (patchbomb from Andrew)
Merge first patchbomb from Andrew Morton:
 - a few minor cifs fixes
 - dma-debug upadtes
 - ocfs2
 - slab
 - about half of MM
 - procfs
 - kernel/exit.c
 - panic.c tweaks
 - printk upates
 - lib/ updates
 - checkpatch updates
 - fs/binfmt updates
 - the drivers/rtc tree
 - nilfs
 - kmod fixes
 - more kernel/exit.c
 - various other misc tweaks and fixes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  exit: pidns: fix/update the comments in zap_pid_ns_processes()
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  exit: exit_notify: re-use "dead" list to autoreap current
  exit: reparent: call forget_original_parent() under tasklist_lock
  exit: reparent: avoid find_new_reaper() if no children
  exit: reparent: introduce find_alive_thread()
  exit: reparent: introduce find_child_reaper()
  exit: reparent: document the ->has_child_subreaper checks
  exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
  exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
  exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
  exit: proc: don't try to flush /proc/tgid/task/tgid
  exit: release_task: fix the comment about group leader accounting
  exit: wait: drop tasklist_lock before psig->c* accounting
  exit: wait: don't use zombie->real_parent
  exit: wait: cleanup the ptrace_reparented() checks
  usermodehelper: kill the kmod_thread_locker logic
  usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
  fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  ...
2014-12-10 18:34:42 -08:00
Kirill A. Shutemov
c164e038ee mm: fix huge zero page accounting in smaps report
As a small zero page, huge zero page should not be accounted in smaps
report as normal page.

For small pages we rely on vm_normal_page() to filter out zero page, but
vm_normal_page() is not designed to handle pmds.  We only get here due
hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not
necessary compatible on each and every architecture.

Let's add separate codepath to handle pmds.  follow_trans_huge_pmd() will
detect huge zero page for us.

We would need pmd_dirty() helper to do this properly.  The patch adds it
to THP-enabled architectures which don't yet have one.

[akpm@linux-foundation.org: use do_div to fix 32-bit build]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengwei Yin <yfw.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:08 -08:00
Linus Torvalds
cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
Linus Torvalds
3a5dc1fafb Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Reload microcode when resuming and the case when only the early
     loader has been utilized.  (Borislav Petkov)

   - Also, do not load the driver on paravirt guests.  (Boris
     Ostrovsky)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Fish out the stashed microcode for the BSP
  x86, microcode: Reload microcode on resume
  x86, microcode: Don't initialize microcode code on paravirt
  x86, microcode, intel: Drop unused parameter
  x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
2014-12-10 15:01:43 -08:00
Linus Torvalds
3100e448e7 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Ingo Molnar:
 "Various vDSO updates from Andy Lutomirski, mostly cleanups and
  reorganization to improve maintainability, but also some
  micro-optimizations and robustization changes"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/vsyscall: Restore orig_ax after vsyscall seccomp
  x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
  x86_64,vsyscall: Make vsyscall emulation configurable
  x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code
  x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none
  x86,vdso: Use LSL unconditionally for vgetcpu
  x86: vdso: Fix build with older gcc
  x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls
  x86_64/vdso: Remove jiffies from the vvar page
  x86/vdso: Make the PER_CPU segment 32 bits
  x86/vdso: Make the PER_CPU segment start out accessed
  x86/vdso: Change the PER_CPU segment to use struct desc_struct
  x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c
  x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c
2014-12-10 14:24:20 -08:00
Linus Torvalds
c9f861c772 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The biggest change in this cycle is better support for UCNA
  (UnCorrected No Action) events:

    "Handle all uncorrected error reports in the same way (soft
     offline the page). We used to only do that for SRAO
     (software recoverable action optional) machine checks, but
     it makes sense to also do it for UCNA (UnCorrected No
     Action) logs found by CMCI or polling."

  plus various x86 MCE handling updates and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Spell "panicked" correctly
  x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
  x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
  x86, MCE, AMD: Assign interrupt handler only when bank supports it
  x86, MCE, AMD: Drop software-defined bank in error thresholding
  x86, MCE, AMD: Move invariant code out from loop body
  x86, MCE, AMD: Correct thresholding error logging
  x86, MCE, AMD: Use macros to compute bank MSRs
  RAS, HWPOISON: Fix wrong error recovery status
  GHES: Make ghes_estatus_caches static
  APEI, GHES: Cleanup unnecessary function for lockless list
2014-12-10 14:20:10 -08:00
Linus Torvalds
a023748d53 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm tree changes from Ingo Molnar:
 "The biggest change is full PAT support from Jürgen Gross:

     The x86 architecture offers via the PAT (Page Attribute Table) a
     way to specify different caching modes in page table entries.  The
     PAT MSR contains 8 entries each specifying one of 6 possible cache
     modes.  A pte references one of those entries via 3 bits:
     _PAGE_PAT, _PAGE_PWT and _PAGE_PCD.

     The Linux kernel currently supports only 4 different cache modes.
     The PAT MSR is set up in a way that the setting of _PAGE_PAT in a
     pte doesn't matter: the top 4 entries in the PAT MSR are the same
     as the 4 lower entries.

     This results in the kernel not supporting e.g. write-through mode.
     Especially this cache mode would speed up drivers of video cards
     which now have to use uncached accesses.

     OTOH some old processors (Pentium) don't support PAT correctly and
     the Xen hypervisor has been using a different PAT MSR configuration
     for some time now and can't change that as this setting is part of
     the ABI.

     This patch set abstracts the cache mode from the pte and introduces
     tables to translate between cache mode and pte bits (the default
     cache mode "write back" is hard-wired to PAT entry 0).  The tables
     are statically initialized with values being compatible to old
     processors and current usage.  As soon as the PAT MSR is changed
     (or - in case of Xen - is read at boot time) the tables are changed
     accordingly.  Requests of mappings with special cache modes are
     always possible now, in case they are not supported there will be a
     fallback to a compatible but slower mode.

     Summing it up, this patch set adds the following features:

      - capability to support WT and WP cache modes on processors with
        full PAT support

      - processors with no or uncorrect PAT support are still working as
        today, even if WT or WP cache mode are selected by drivers for
        some pages

      - reduction of Xen special handling regarding cache mode

  Another change is a boot speedup on ridiculously large RAM systems,
  plus other smaller fixes"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86: mm: Move PAT only functions to mm/pat.c
  xen: Support Xen pv-domains using PAT
  x86: Enable PAT to use cache mode translation tables
  x86: Respect PAT bit when copying pte values between large and normal pages
  x86: Support PAT bit in pagetable dump for lower levels
  x86: Clean up pgtable_types.h
  x86: Use new cache mode type in memtype related functions
  x86: Use new cache mode type in mm/ioremap.c
  x86: Use new cache mode type in setting page attributes
  x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
  x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
  x86: Use new cache mode type in mm/iomap_32.c
  x86: Use new cache mode type in asm/pgtable.h
  x86: Use new cache mode type in arch/x86/mm/init_64.c
  x86: Use new cache mode type in arch/x86/pci
  x86: Use new cache mode type in drivers/video/fbdev/vermilion
  x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
  x86: Use new cache mode type in include/asm/fb.h
  x86: Make page cache mode a real type
  x86: mm: Use 2GB memory block size on large-memory x86-64 systems
  ...
2014-12-10 13:59:34 -08:00
Linus Torvalds
773fed910d Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
 "A handful of numachip APIC driver updates/fixes, and two small SGI/UV
  fixes"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: numachip: APIC driver cleanups
  x86: numachip: Elide self-IPI ICR polling
  x86: numachip: Fix 16-bit APIC ID truncation

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: UV BAU: Increase maximum CPUs per socket/hub
  x86: UV BAU: Avoid NULL pointer reference in ptc_seq_show
2014-12-10 13:40:11 -08:00
David S. Miller
22f10923dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/amd/xgbe/xgbe-desc.c
	drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:48:20 -05:00
Linus Torvalds
8139548136 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "Changes in this cycle are:

   - support module unload for efivarfs (Mathias Krause)

   - another attempt at moving x86 to libstub taking advantage of the
     __pure attribute (Ard Biesheuvel)

   - add EFI runtime services section to ptdump (Mathias Krause)"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, ptdump: Add section for EFI runtime services
  efi/x86: Move x86 back to libstub
  efivarfs: Allow unloading when build as module
2014-12-10 12:42:16 -08:00
Linus Torvalds
206f18f2ca Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build, cleanup and defconfig updates from Ingo Molnar:
 "A single minor build change to suppress a repetitive build messages,
  misc cleanups and a defconfig update"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory, build: Suppress kexec-purgatory.c is up to date message

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, CPU, AMD: Move K8 TLB flush filter workaround to K8 code
  x86, espfix: Remove stale ptemask
  x86, msr: Use seek definitions instead of hard-coded values
  x86, msr: Convert printk to pr_foo()
  x86, msr: Use PTR_ERR_OR_ZERO
  x86/simplefb: Use PTR_ERR_OR_ZERO
  x86/sysfb: Use PTR_ERR_OR_ZERO
  x86, cpuid: Use PTR_ERR_OR_ZERO

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kconfig/defconfig: Enable CONFIG_FHANDLE=y
2014-12-10 12:35:46 -08:00
Daniel Borkmann
0cb6c969ed net, lib: kill arch_fast_hash library bits
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:46 -05:00
Linus Torvalds
b6444bd0a1 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot and percpu updates from Ingo Molnar:
 "This tree contains a bootable images documentation update plus three
  slightly misplaced x86/asm percpu changes/optimizations"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64: Use RIP-relative addressing for most per-CPU accesses
  x86-64: Handle PC-relative relocations on per-CPU data
  x86: Convert a few more per-CPU items to read-mostly ones
  x86, boot: Document intermediates more clearly
2014-12-10 12:10:24 -08:00
Linus Torvalds
9d0cf6f564 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Misc changes:

   - context switch micro-optimization
   - debug printout micro-optimization
   - comment enhancements and typo fix"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Replace seq_printf() with seq_puts()
  x86/asm: Fix typo in arch/x86/kernel/asm_offset_64.c
  sched/x86: Add a comment clarifying LDT context switching
  sched/x86_64: Don't save flags on context switch
2014-12-10 12:09:26 -08:00
Linus Torvalds
3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Linus Torvalds
9e66645d72 Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq domain updates from Thomas Gleixner:
 "The real interesting irq updates:

   - Support for hierarchical irq domains:

     For complex interrupt routing scenarios where more than one
     interrupt related chip is involved we had no proper representation
     in the generic interrupt infrastructure so far.  That made people
     implement rather ugly constructs in their nested irq chip
     implementations.  The main offenders are x86 and arm/gic.

     To distangle that mess we have now hierarchical irqdomains which
     seperate the various interrupt chips and connect them via the
     hierarchical domains.  That keeps the domain specific details
     internal to the particular hierarchy level and removes the
     criss/cross referencing of chip internals.  The resulting hierarchy
     for a complex x86 system will look like this:

        vector          mapped: 74
          msi-0         mapped: 2
          dmar-ir-1     mapped: 69
            ioapic-1    mapped: 4
            ioapic-0    mapped: 20
            pci-msi-2   mapped: 45
          dmar-ir-0     mapped: 3
            ioapic-2    mapped: 1
            pci-msi-1   mapped: 2
          htirq         mapped: 0

     Neither ioapic nor pci-msi know about the dmar interrupt remapping
     between themself and the vector domain.  If interrupt remapping is
     disabled ioapic and pci-msi become direct childs of the vector
     domain.

     In hindsight we should have done that years ago, but in hindsight
     we always know better :)

   - Support for generic MSI interrupt domain handling

     We have more and more non PCI related MSI interrupts, so providing
     a generic infrastructure for this is better than having all
     affected architectures implementing their own private hacks.

   - Support for PCI-MSI interrupt domain handling, based on the generic
     MSI support.

     This part carries the pci/msi branch from Bjorn Helgaas pci tree to
     avoid a massive conflict.  The PCI/MSI parts are acked by Bjorn.

  I have two more branches on top of this.  The full conversion of x86
  to hierarchical domains and a partial conversion of arm/gic"

* 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  genirq: Move irq_chip_write_msi_msg() helper to core
  PCI/MSI: Allow an msi_controller to be associated to an irq domain
  PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
  PCI/MSI: Enhance core to support hierarchy irqdomain
  PCI/MSI: Move cached entry functions to irq core
  genirq: Provide default callbacks for msi_domain_ops
  genirq: Introduce msi_domain_alloc/free_irqs()
  asm-generic: Add msi.h
  genirq: Add generic msi irq domain support
  genirq: Introduce callback irq_chip.irq_write_msi_msg
  genirq: Work around __irq_set_handler vs stacked domains ordering issues
  irqdomain: Introduce helper function irq_domain_add_hierarchy()
  irqdomain: Implement a method to automatically call parent domains alloc/free
  genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
  genirq: Split out flow handler typedefs into seperate header file
  genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
  genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
  genirq: Add more helper functions to support stacked irq_chip
  genirq: Introduce helper functions to support stacked irq_chip
  irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
  ...
2014-12-10 09:01:01 -08:00
Nadav Amit
64a38292ed KVM: x86: Emulate should check #UD before #GP
Intel SDM table 6-2 ("Priority Among Simultaneous Exceptions and Interrupts")
shows that faults from decoding the next instruction got higher priority than
general protection.  Moving the protected-mode check before the CPL check to
avoid wrong exception on vm86 mode.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:53:37 +01:00
Nadav Amit
bc397a6c91 KVM: x86: Do not push eflags.vm on pushf
The pushf instruction does not push eflags.VM, so emulation should not do so as
well.  Although eflags.RF should not be pushed as well, it is already cleared
by the time pushf is executed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:51:28 +01:00
Nadav Amit
53bb4f789a KVM: x86: Remove prefix flag when GP macro is used
The macro GP already sets the flag Prefix. Remove the redundant flag for
0f_38_f0 and 0f_38_f1 opcodes.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:51:13 +01:00
Andy Lutomirski
29fa682546 x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
paravirt_enabled has the following effects:

 - Disables the F00F bug workaround warning.  There is no F00F bug
   workaround any more because Linux's standard IDT handling already
   works around the F00F bug, but the warning still exists.  This
   is only cosmetic, and, in any event, there is no such thing as
   KVM on a CPU with the F00F bug.

 - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
   there should be no APM BIOS anyway.

 - Disables tboot.  I think that the tboot code should check the
   CPUID hypervisor bit directly if it matters.

 - paravirt_enabled disables espfix32.  espfix32 should *not* be
   disabled under KVM paravirt.

The last point is the purpose of this patch.  It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests.  Fixes CVE-2014-8134.

Cc: stable@vger.kernel.org
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-10 12:49:39 +01:00
Borislav Petkov
25cdb9c868 x86/microcode/intel: Fish out the stashed microcode for the BSP
I'm such a moron! The simple solution of saving the BSP patch
for use on resume was too simple (and wrong!), hint:
sizeof(struct microcode_intel).

What needs to be done instead is to fish out the microcode patch
we have stashed previously and apply that on the BSP in case the
late loader hasn't been utilized.

So do that instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141208110820.GB20057@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-10 11:36:28 +01:00
Linus Torvalds
86c6a2fddf Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

     This instruments might_sleep() checks to catch places that nest
     blocking primitives - such as mutex usage in a wait loop.  Such
     bugs can result in hard to debug races/hangs.

     Another category of invalid nesting that this facility will detect
     is the calling of blocking functions from within schedule() ->
     sched_submit_work() -> blk_schedule_flush_plug().

     There's some potential for false positives (if secondary blocking
     primitives themselves are not ready yet for this facility), but the
     kernel will warn once about such bugs per bootup, so the warning
     isn't much of a nuisance.

     This feature comes with a number of fixes, for problems uncovered
     with it, so no messages are expected normally.

   - Another round of sched/numa optimizations and refinements, for
     CONFIG_NUMA_BALANCING=y.

   - Another round of sched/dl fixes and refinements.

  Plus various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched: Add missing rcu protection to wake_up_all_idle_cpus
  sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
  sched/numa: Init numa balancing fields of init_task
  sched/deadline: Remove unnecessary definitions in cpudeadline.h
  sched/cpupri: Remove unnecessary definitions in cpupri.h
  sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
  sched/fair: Fix stale overloaded status in the busiest group finding logic
  sched: Move p->nr_cpus_allowed check to select_task_rq()
  sched/completion: Document when to use wait_for_completion_io_*()
  sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
  sched/fair: Kill task_struct::numa_entry and numa_group::task_list
  sched: Refactor task_struct to use numa_faults instead of numa_* pointers
  sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
  sched/deadline: Reschedule from switched_from_dl() after a successful pull
  sched/deadline: Push task away if the deadline is equal to curr during wakeup
  sched/deadline: Add deadline rq status print
  sched/deadline: Fix artificial overrun introduced by yield_task_dl()
  sched/rt: Clean up check_preempt_equal_prio()
  sched/core: Use dl_bw_of() under rcu_read_lock_sched()
  sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
  ...
2014-12-09 21:21:34 -08:00
Linus Torvalds
bee2782f30 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull leftover perf fixes from Ingo Molnar:
 "Two perf fixes left over from the previous cycle"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf session: Do not fail on processing out of order event
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
2014-12-09 21:18:06 -08:00
Linus Torvalds
5706ffd045 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events update from Ingo Molnar:
 "On the kernel side there's few changes, the one that stands out is
  PEBS machine state sampling support on x86, by Stephane Eranian.

  On the tooling side:

  User visible tooling changes:

   - Don't open the DWARF info multiple times, keeping instead a dwfl
     handle in struct dso, greatly speeding up 'perf report' on powerpc.
     (Sukadev Bhattiprolu)

   - Introduce PARSE_OPT_DISABLED option flag and use it to avoid
     showing undersired options in tools that provides frontends to
     'perf record', like sched, kvm, etc (Namhyung Kim)

   - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
     Carvalho de Melo)

   - Fix annotation with kcore (Adrian Hunter)

   - Support source line numbers in annotate using a hotkey (Andi Kleen)

   - Callchain improvements including:
     * Enable printing the srcline in the history
     * Make get_srcline fall back to sym+offset (Andi Kleen)

   - TUI hist_entry browser fixes, including showing missing overhead
     value for first level callchain.  Detected comparing the output of
     --stdio/--gui (that matched) with --tui, that had this problem.
     (Namhyung Kim)

   - Support handling complete branch stacks as histograms (Andi Kleen)

  Tooling infrastructure changes:

   - Prep work for supporting per-pkg and snapshot counters in 'perf
     stat' (Jiri Olsa)

   - 'perf stat' refactorings, moving stuff from it to evsel.c to use in
     per-pkg/snapshot format changes (Jiri Olsa)

   - Add per-pkg format file parsing (Matt Fleming)

   - Clean up libelf feature support code (Namhyung Kim)

   - Add gzip decompression support for kernel modules (Namhyung Kim)

   - More prep patches for Intel PT, including a a thread stack and more
     stuff made available via the database export mechanism (Adrian
     Hunter)

   - More Intel PT work, including a facility to export sample data
     (comms, threads, symbol names, etc) in a database friendly way,
     with an script to use this to create a postgresql database.
     (Adrian Hunter)

   - Make sure that thread->mg->machine points to the machine where the
     thread exists (it was being set only for the kmaps kernel modules
     case, do it as well for the mmaps) and use it to shorten function
     signatures (Arnaldo Carvalho de Melo)

  ... and lots of other fixes and smaller improvements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
  perf report: In branch stack mode use address history sorting
  perf report: Add --branch-history option
  perf callchain: Support handling complete branch stacks as histograms
  perf stat: Add support for snapshot counters
  perf stat: Add support for per-pkg counters
  perf tools: Remove perf_evsel__read interface
  perf stat: Use read_counter in read_counter_aggr
  perf stat: Make read_counter work over the thread dimension
  perf stat: Use perf_evsel__read_cb in read_counter
  perf tools: Add snapshot format file parsing
  perf tools: Add per-pkg format file parsing
  perf evsel: Introduce perf_evsel__read_cb function
  perf evsel: Introduce perf_counts_values__scale function
  perf evsel: Introduce perf_evsel__compute_deltas function
  perf tools: Allow to force redirect pr_debug to stderr.
  perf tools: Fix segfault due to invalid kernel dso access
  perf callchain: Make get_srcline fall back to sym+offset
  perf symbols: Move bfd_demangle stubbing to its only user
  perf callchain: Enable printing the srcline in the history
  perf tools: Collapse first level callchain entry if it has sibling
  ...
2014-12-09 20:55:37 -08:00
Linus Torvalds
9c37f95936 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking tree changes from Ingo Molnar:
 "Two changes: a documentation update and a ticket locks live lock fix"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ticketlock: Fix spin_unlock_wait() livelock
  locking/lglocks: Add documentation of current lglocks implementation
2014-12-09 19:59:22 -08:00
Linus Torvalds
a0e4467726 asm-generic: asm/io.h rewrite
While there normally is no reason to have a pull request for asm-generic
 but have all changes get merged through whichever tree needs them, I do
 have a series for 3.19. There are two sets of patches that change
 significant portions of asm/io.h, and this branch contains both in order
 to resolve the conflicts:
 
 - Will Deacon has done a set of patches to ensure that all architectures
   define {read,write}{b,w,l,q}_relaxed() functions or get them by
   including asm-generic/io.h. These functions are commonly used on ARM
   specific drivers to avoid expensive L2 cache synchronization implied by
   the normal {read,write}{b,w,l,q}, but we need to define them on all
   architectures in order to share the drivers across architectures and
   to enable CONFIG_COMPILE_TEST configurations for them
 
 - Thierry Reding has done an unrelated set of patches that extends
   the asm-generic/io.h file to the degree necessary to make it useful
   on ARM64 and potentially other architectures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj
 xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI
 ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX
 ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98
 uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb
 71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8
 +l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr
 erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2
 6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR
 HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ
 vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA
 6tROM77bwIQ=
 =xUv6
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
 "While there normally is no reason to have a pull request for
  asm-generic but have all changes get merged through whichever tree
  needs them, I do have a series for 3.19.

  There are two sets of patches that change significant portions of
  asm/io.h, and this branch contains both in order to resolve the
  conflicts:

   - Will Deacon has done a set of patches to ensure that all
     architectures define {read,write}{b,w,l,q}_relaxed() functions or
     get them by including asm-generic/io.h.

     These functions are commonly used on ARM specific drivers to avoid
     expensive L2 cache synchronization implied by the normal
     {read,write}{b,w,l,q}, but we need to define them on all
     architectures in order to share the drivers across architectures
     and to enable CONFIG_COMPILE_TEST configurations for them

   - Thierry Reding has done an unrelated set of patches that extends
     the asm-generic/io.h file to the degree necessary to make it useful
     on ARM64 and potentially other architectures"

* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
  ARM64: use GENERIC_PCI_IOMAP
  sparc: io: remove duplicate relaxed accessors on sparc32
  ARM: sa11x0: Use void __iomem * in MMIO accessors
  arm64: Use include/asm-generic/io.h
  ARM: Use include/asm-generic/io.h
  asm-generic/io.h: Implement generic {read,write}s*()
  asm-generic/io.h: Reconcile I/O accessor overrides
  /dev/mem: Use more consistent data types
  Change xlate_dev_{kmem,mem}_ptr() prototypes
  ARM: ixp4xx: Properly override I/O accessors
  ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
  ARM: ebsa110: Properly override I/O accessors
  ARC: Remove redundant PCI_IOBASE declaration
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  ...
2014-12-09 17:25:00 -08:00
Joe Perches
5cccc702fd x86: bpf_jit_comp: Remove inline from static function definitions
Let the compiler decide instead.

No change in object size x86-64 -O2 no profiling

Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 14:56:41 -05:00
Joe Perches
d148134be5 x86: bpf_jit_comp: Reduce is_ereg() code size
Use the (1 << reg) & mask trick to reduce code size.

x86-64 size difference -O2 without profiling for various
gcc versions:

$ size arch/x86/net/bpf_jit_comp.o*
   text    data     bss     dec     hex filename
   9266       4       0    9270    2436 arch/x86/net/bpf_jit_comp.o.4.4.new
  10042       4       0   10046    273e arch/x86/net/bpf_jit_comp.o.4.4.old
   9109       4       0    9113    2399 arch/x86/net/bpf_jit_comp.o.4.6.new
   9717       4       0    9721    25f9 arch/x86/net/bpf_jit_comp.o.4.6.old
   8789       4       0    8793    2259 arch/x86/net/bpf_jit_comp.o.4.7.new
  10245       4       0   10249    2809 arch/x86/net/bpf_jit_comp.o.4.7.old
   9671       4       0    9675    25cb arch/x86/net/bpf_jit_comp.o.4.9.new
  10679       4       0   10683    29bb arch/x86/net/bpf_jit_comp.o.4.9.old

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 14:56:41 -05:00
Linus Torvalds
0160928e79 EDAC updates all over the place:
* Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
 support. Out of tree stuff is
 
  arch/x86/kernel/amd_nb.c       |   2 +
  include/linux/pci_ids.h        |   2 +
 
 adding the required PCI IDs. From Aravind Gopalakrishnan.
 
 * Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala.
 
 * Convert the AMD MCE injection module to debugfs, where it belongs.
 
 * Misc EDAC cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUhZbcAAoJEBLB8Bhh3lVKd5kQAK77qsB4ebpke1rEBerQl9jQ
 YCVrByCKu7QTRt/xvlqU9Vyp7EvcnpNxFbRCCqzIpBcJjre9v17QVRA2/zFS0q81
 QRDTOWf9uhMWbssI2Zu1hbjDNMWYiEb9+aeZHjScVtkzPDmsgYuKGdWfTSLw9dkS
 AG/UUZ3ojwyc2dK3i0W3pCjakKYLUsCmijyTZBfb37+u4rRGuAgrQ9G8fBn2lL+l
 huptgV2BsTCQqdL554zTs64Yt912PnidsJWYCPCjMubgEPSeNcWRzTTBYDUf9NIn
 RxFXYHnOQxetPSQqfLVXlX3V/cGNQg0yEXFZ9S0tCt5uLNbbN/D8Uumtst0rq9x3
 XkJ/EGHXBFP+KwHdV/i9j6OYM5rq4z+4ql4OqbWzsvPrEDbh/4p5gRbhqd1Hhy9U
 zgHJjVPpD/l2t82Tpz0jIOscQruZ6VqGMDSYo3LiLnNt724pcrmr3DiN9mc6ljzJ
 rsNsemMH0IoH8KbBHKGtMLnBVO6HbnrtC6iKFfocNBisvo1PKKzn9s2O1pdjmsCs
 jHwz5njoM7Ki/ygkJhbKiSDMXPs67eggwoGIGzpNMoY4RWxrcQzYE9yKfKzRNxET
 Qb3xUwDWDyL8ErwHtL3xMxGwyfkhb+SZdMd5aKYA5Rdbf+TN8P6iAv05nrnfpkyk
 lPTv5o9EQYvr8/Tc9FZW
 =ULmb
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "EDAC updates all over the place:

   - Enablement for AMD F15h models 0x60 CPUs.  Most notably DDR4 RAM
     support.  Out of tree stuff is adding the required PCI IDs.  From
     Aravind Gopalakrishnan.

   - Enable amd64_edac for 32-bit due to popular demand.  From Tomasz
     Pala.

   - Convert the AMD MCE injection module to debugfs, where it belongs.

   - Misc EDAC cleanups"

* tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE, AMD: Correct formatting of decoded text
  EDAC, mce_amd_inj: Add an injector function
  EDAC, mce_amd_inj: Add hw-injection attributes
  EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
  EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
  EDAC: Delete unnecessary check before calling pci_dev_put()
  EDAC, pci_sysfs: remove unneccessary ifdef around entire file
  ghes_edac: Use snprintf() to silence a static checker warning
  amd64_edac: Build module on x86-32
  EDAC, MCE, AMD: Add decoding table for MC6 xec
  amd64_edac: Add F15h M60h support
  {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
  EDAC: Sync memory types and names
  EDAC: Add DDR3 LRDIMM entries to edac_mem_types
  x86, amd_nb: Add device IDs to NB tables for F15h M60h
  pci_ids: Add PCI device IDs for F15h M60h
2014-12-08 20:17:49 -08:00
Al Viro
ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
Rafael J. Wysocki
cfc75ed68b Merge branch 'pm-cpufreq'
* pm-cpufreq: (21 commits)
  intel_pstate: skip this driver if Sun server has _PPC method
  cpufreq: arm_big_little: free OPP table created during ->init()
  imx6q: free OPP table created during ->init()
  exynos5440: free OPP table created during ->init()
  cpufreq-dt: free OPP table created during ->init()
  cpufreq-dt: register cooling device from ->ready() callback
  cpufreq: Introduce ->ready() callback for cpufreq drivers
  cpufreq-dt: pass 'policy->related_cpus' to of_cpufreq_cooling_register()
  cpufreq: Fix formatting issues in 'struct cpufreq_driver'
  cpufreq: pxa2xx: Add Kconfig entry
  cpufreq: Ref the policy object sooner
  cpufreq: Kconfig: Remove architecture specific menu entries
  cpufreq: pcc: Enable autoload of pcc-cpufreq for ACPI processors
  intel_pstate: Add CPUID for BDW-H CPU
  intel_pstate: Add support for HWP
  x86: Add support for Intel HWP feature detection.
  cpufreq: respect the min/max settings from user space
  cpufreq: cpufreq-dt: Handle regulator_get_voltage() failure
  cpufreq: cpufreq-dt: Improve debug about matching OPP
  cpufreq: Loongson1: Add cpufreq driver for Loongson1B
  ...
2014-12-08 20:00:15 +01:00
Rafael J. Wysocki
648fcab2b0 Merge branch 'pm-cpuidle'
* pm-cpuidle:
  cpuidle: add MAINTAINERS entry for ARM Exynos cpuidle driver
  drivers: cpuidle: Remove cpuidle-arm64 duplicate error messages
  drivers: cpuidle: Add idle-state-name description to ARM idle states
  drivers: cpuidle: Add status property to ARM idle states
  cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic
2014-12-08 20:00:09 +01:00
Mark Brown
addaeea9ee Merge remote-tracking branches 'asoc/topic/hdmi', 'asoc/topic/intel', 'asoc/topic/jack', 'asoc/topic/jz4740' and 'asoc/topic/lm49453' into asoc-next 2014-12-08 13:12:00 +00:00
Dan Carpenter
e10abb2f77 x86_64/traps: Fix always true condition
We should be checking IS_ERR() here.  PTR_ERR() is always true.

Fixes: fe3d197f84 ('x86, mpx: On-demand kernel allocation of
bounds tables')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20141125172114.GA24535@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 12:06:59 +01:00
Richard Weinberger
15bae280e4 x86/kconfig/defconfig: Enable CONFIG_FHANDLE=y
systemd has a hard dependency on CONFIG_FHANDLE.
If you run systemd with CONFIG_FHANDLE=n it will somehow
boot but fail to spawn a getty or other basic services.
As systemd is now used by most x86 distributions it
makes sense to enabled this by default and save kernel
hackers a lot of value debugging time.

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: gregkh@linuxfoundation.org
Cc: rafael.j.wysocki@intel.com
Cc: pebolle@tiscali.nl
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1416958612-7448-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 12:04:17 +01:00
Juergen Gross
76f0a486fa xen: annotate xen_set_identity_and_remap_chunk() with __init
Commit 5b8e7d8054 removed the __init
annotation from xen_set_identity_and_remap_chunk(). Add it again.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-08 10:55:30 +00:00
Juergen Gross
90fff3ea15 xen: introduce helper functions to do safe read and write accesses
Introduce two helper functions to safely read and write unsigned long
values from or to memory when the access may fault because the mapping
is non-present or read-only.

These helpers can be used instead of open coded uses of __get_user()
and __put_user() avoiding the need to do casts to fix sparse warnings.

Use the helpers in page.h and p2m.c. This will fix the sparse
warnings when doing "make C=1".

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-08 10:53:59 +00:00
Ingo Molnar
2a2662bf88 Merge branch 'perf/core-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into perf/hw_breakpoints
Pull AMD range breakpoints support from Frederic Weisbecker:

" - Extend breakpoint tools and core to support address range through perf
    event with initial backend support for AMD extended breakpoints.

    Syntax is:

           perf record -e mem:addr/len:type

    For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)

           perf record -e mem:0x1000/512:w

 - Clean up a bit breakpoint code validation

 It has been acked by Jiri and Oleg. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:50:24 +01:00
Rasmus Villemoes
3736708f03 x86: Replace seq_printf() with seq_puts()
seq_puts is a lot cheaper than seq_printf, so use that to print
literal strings.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: http://lkml.kernel.org/r/1417208622-12264-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:48:15 +01:00
Oleg Nesterov
78bff1c868 x86/ticketlock: Fix spin_unlock_wait() livelock
arch_spin_unlock_wait() looks very suboptimal, to the point I
think this is just wrong and can lead to livelock: if the lock
is heavily contended we can never see head == tail.

But we do not need to wait for arch_spin_is_locked() == F. If it
is locked we only need to wait until the current owner drops
this lock. So we could simply spin until old_head !=
lock->tickets.head in this case, but .head can overflow and thus
we can't check "unlocked" only once before the main loop.

Also, the "unlocked" check can ignore TICKET_SLOWPATH_FLAG bit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/20141201213417.GA5842@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:36:44 +01:00
Borislav Petkov
c7c9b3929b x86/mce: Spell "panicked" correctly
We need the additional "k" to make it a hard-c:

  https://en.wiktionary.org/wiki/panicked

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417642605-15730-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:12:46 +01:00
Dave Airlie
8c86394470 Linux 3.18
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhNLZAAoJEHm+PkMAQRiGAEcH/iclYDW7k2GKemMqboy+Ohmh
 +ELbQothNhlGZlS1wWdD69LBiiXkkQ+ufVYFh/hC0oy0gUdfPMt5t+bOHy6cjn6w
 9zOcACtpDKnqbOwRqXZjZgNmIabk7lRjbn7GK4GQqpIaW4oO0FWcT91FFhtGSPDa
 tjtmGRqDmbNsqfzr18h0WPEpUZmT6MxIdv17AYDliPB1MaaRuAv1Kss05TJrXdfL
 Oucv+C0uwnybD9UWAz6pLJ3H/HR9VJFdkaJ4Y0pbCHAuxdd1+swoTpicluHlsJA1
 EkK5iWQRMpcmGwKvB0unCAQljNpaJiq4/Tlmmv8JlYpMlmIiVLT0D8BZx5q05QQ=
 =oGNw
 -----END PGP SIGNATURE-----

Merge tag 'v3.18' into drm-next

Linux 3.18

Backmerge Linus tree into -next as we had conflicts in i915/radeon/nouveau,
and everyone was solving them individually.

* tag 'v3.18': (57 commits)
  Linux 3.18
  watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7
  uapi: fix to export linux/vm_sockets.h
  i2c: cadence: Set the hardware time-out register to maximum value
  i2c: davinci: generate STP always when NACK is received
  ahci: disable MSI on SAMSUNG 0xa800 SSD
  context_tracking: Restore previous state in schedule_user
  slab: fix nodeid bounds check for non-contiguous node IDs
  lib/genalloc.c: export devm_gen_pool_create() for modules
  mm: fix anon_vma_clone() error treatment
  mm: fix swapoff hang after page migration and fork
  fat: fix oops on corrupted vfat fs
  ipc/sem.c: fully initialize sem_array before making it visible
  drivers/input/evdev.c: don't kfree() a vmalloc address
  cxgb4: Fill in supported link mode for SFP modules
  xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
  mm/vmpressure.c: fix race in vmpressure_work_fn()
  mm: frontswap: invalidate expired data on a dup-store failure
  mm: do not overwrite reserved pages counter at show_mem()
  drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/nouveau/nouveau_drm.c
	drivers/gpu/drm/radeon/radeon_cs.c
2014-12-08 10:33:52 +10:00
Borislav Petkov
fbae4ba8c4 x86, microcode: Reload microcode on resume
Normally, we do reapply microcode on resume. However, in the cases where
that microcode comes from the early loader and the late loader hasn't
been utilized yet, there's no easy way for us to go and apply the patch
applied during boot by the early loader.

Thus, reuse the patch stashed by the early loader for the BSP.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 13:03:03 +01:00
Boris Ostrovsky
a18a0f6850 x86, microcode: Don't initialize microcode code on paravirt
Paravirtual guests are not expected to load microcode into processors
and therefore it is not necessary to initialize microcode loading
logic.

In fact, under certain circumstances initializing this logic may cause
the guest to crash. Specifically, 32-bit kernels use __pa_nodebug()
macro which does not work in Xen (the code path that leads to this macro
happens during resume when we call mc_bp_resume()->load_ucode_ap()
->check_loader_disabled_ap())

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1417469264-31470-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 12:59:03 +01:00
Borislav Petkov
47768626c6 x86, microcode, intel: Drop unused parameter
apply_microcode_early() doesn't use mc_saved_data, kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 12:58:56 +01:00
Alexei Starovoitov
769e0de647 bpf: x86: fix epilogue generation for eBPF programs
classic BPF has a restriction that last insn is always BPF_RET.
eBPF doesn't have BPF_RET instruction and this restriction.
It has BPF_EXIT insn which can appear anywhere in the program
one or more times and it doesn't have to be last insn.
Fix eBPF JIT to emit epilogue when first BPF_EXIT is seen
and all other BPF_EXIT instructions will be emitted as jump.

Since jump offset to epilogue is computed as:
jmp_offset = ctx->cleanup_addr - addrs[i]
we need to change type of cleanup_addr to signed to compute the offset as:
(long long) ((int)20 - (int)30)
instead of:
(long long) ((unsigned int)20 - (int)30)

Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:23:54 -08:00
Linus Torvalds
beb5af4033 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Two final fixlets for 3.18:
   - Prevent microcode reload wreckage on 32bit
   - Unbreak cross compilation"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Limit the microcode reloading to 64-bit for now
  x86: Use $(OBJDUMP) instead of plain objdump
2014-12-05 10:47:19 -08:00
Radim Krčmář
e08e833616 KVM: cpuid: recompute CPUID 0xD.0:EBX,ECX
We reused host EBX and ECX, but KVM might not support all features;
emulated XSAVE size should be smaller.

EBX depends on unknown XCR0, so we default to ECX.

SDM CPUID (EAX = 0DH, ECX = 0):
 EBX Bits 31-00: Maximum size (bytes, from the beginning of the
     XSAVE/XRSTOR save area) required by enabled features in XCR0. May
     be different than ECX if some features at the end of the XSAVE save
     area are not enabled.

 ECX Bit 31-00: Maximum size (bytes, from the beginning of the
     XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required by
     all supported features in the processor, i.e all the valid bit
     fields in XCR0.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:49 +01:00
Wanpeng Li
81dc01f749 kvm: vmx: add nested virtualization support for xsaves
Add nested virtualization support for xsaves.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:44 +01:00
Wanpeng Li
203000993d kvm: vmx: add MSR logic for XSAVES
Add logic to get/set the XSS model-specific register.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:39 +01:00
Wanpeng Li
f53cd63c2d kvm: x86: handle XSAVES vmcs and vmexit
Initialize the XSS exit bitmap.  It is zero so there should be no XSAVES
or XRSTORS exits.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:33 +01:00
Paolo Bonzini
404e0a19e1 KVM: cpuid: mask more bits in leaf 0xd and subleaves
- EAX=0Dh, ECX=1: output registers EBX/ECX/EDX are reserved.

- EAX=0Dh, ECX>1: output register ECX bit 0 is clear for all the CPUID
leaves we support, because variable "supported" comes from XCR0 and not
XSS.  Bits above 0 are reserved, so ECX is overall zero.  Output register
EDX is reserved.

Source: Intel Architecture Instruction Set Extensions Programming
Reference, ref. number 319433-022

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:17 +01:00
Paolo Bonzini
412a3c411e KVM: cpuid: set CPUID(EAX=0xd,ECX=1).EBX correctly
This is the size of the XSAVES area.  This starts providing guest support
for XSAVES (with no support yet for supervisor states, i.e. XSS == 0
always in guests for now).

Wanpeng Li suggested testing XSAVEC as well as XSAVES, since in practice
no real processor exists that only has one of them, and there is no
other way for userspace programs to compute the area of the XSAVEC
save area.  CPUID(EAX=0xd,ECX=1).EBX provides an upper bound.

Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:17 +01:00
Wanpeng Li
55412b2eda kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest
Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:16 +01:00
Paolo Bonzini
5c404cabd1 KVM: x86: use F() macro throughout cpuid.c
For code that deals with cpuid, this makes things a bit more readable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:15 +01:00
Paolo Bonzini
df1daba7d1 KVM: x86: support XSAVES usage in the host
Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format.  Convert
in order to preserve userspace ABI.

Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.

Fixes: f31a9f7c71
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:05 +01:00
Paolo Bonzini
ba7b39203a x86: export get_xsave_addr
get_xsave_addr is the API to access XSAVE states, and KVM would
like to use it.  Export it.

Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:55:44 +01:00
Radim Krčmář
45c3094a64 KVM: x86: allow 256 logical x2APICs again
While fixing an x2apic bug,
 17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
we've made only one cluster available.  This means that the amount of
logically addressible x2APICs was reduced to 16 and VCPUs kept
overwriting themselves in that region, so even the first cluster wasn't
set up correctly.

This patch extends x2APIC support back to the logical_map's limit, and
keeps the CVE fixed as messages for non-present APICs are dropped.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:08 +01:00
Radim Krčmář
25995e5b4a KVM: x86: check bounds of APIC maps
They can't be violated now, but play it safe for the future.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:08 +01:00
Radim Krčmář
fa834e9197 KVM: x86: fix APIC physical destination wrapping
x2apic allows destinations > 0xff and we don't want them delivered to
lower APICs.  They are correctly handled by doing nothing.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:07 +01:00
Radim Krčmář
085563fb04 KVM: x86: deliver phys lowest-prio
Physical mode can't address more than one APIC, but lowest-prio is
allowed, so we just reuse our paths.

SDM 10.6.2.1 Physical Destination:
  Also, for any non-broadcast IPI or I/O subsystem initiated interrupt
  with lowest priority delivery mode, software must ensure that APICs
  defined in the interrupt address are present and enabled to receive
  interrupts.

We could warn on top of that.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:06 +01:00
Radim Krčmář
698f9755d9 KVM: x86: don't retry hopeless APIC delivery
False from kvm_irq_delivery_to_apic_fast() means that we don't handle it
in the fast path, but we still return false in cases that were perfectly
handled, fix that.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:06 +01:00
Radim Krčmář
decdc28382 KVM: x86: use MSR_ICR instead of a number
0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:05 +01:00
Nadav Amit
c69d3d9bc1 KVM: x86: Fix reserved x2apic registers
x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC
Register Address Space"). KVM needs to cause #GP on such accesses.

Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:05 +01:00
Nadav Amit
39f062ff51 KVM: x86: Generate #UD when memory operand is required
Certain x86 instructions that use modrm operands only allow memory operand
(i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact.
Currently, the instructions that are such and are emulated by KVM are MOVBE,
MOVNTPS, MOVNTPD and MOVNTI.  MOVBE is the most blunt example, since it may be
emulated by the host regardless of MMIO.

The fix introduces a new group for handling such instructions, marking mod3 as
illegal instruction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04 15:29:04 +01:00
Juergen Gross
2e917175e1 xen: Speed up set_phys_to_machine() by using read-only mappings
Instead of checking at each call of set_phys_to_machine() whether a
new p2m page has to be allocated due to writing an entry in a large
invalid or identity area, just map those areas read only and react
to a page fault on write by allocating the new page.

This change will make the common path with no allocation much
faster as it only requires a single write of the new mfn instead
of walking the address translation tables and checking for the
special cases.

Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:20 +00:00
Juergen Gross
054954eb05 xen: switch to linear virtual mapped sparse p2m list
At start of the day the Xen hypervisor presents a contiguous mfn list
to a pv-domain. In order to support sparse memory this mfn list is
accessed via a three level p2m tree built early in the boot process.
Whenever the system needs the mfn associated with a pfn this tree is
used to find the mfn.

Instead of using a software walked tree for accessing a specific mfn
list entry this patch is creating a virtual address area for the
entire possible mfn list including memory holes. The holes are
covered by mapping a pre-defined  page consisting only of "invalid
mfn" entries. Access to a mfn entry is possible by just using the
virtual base address of the mfn list and the pfn as index into that
list. This speeds up the (hot) path of determining the mfn of a
pfn.

Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
showed following improvements:

Elapsed time: 32:50 ->  32:35
System:       18:07 ->  17:47
User:        104:00 -> 103:30

Tested with following configurations:
- 64 bit dom0, 8GB RAM
- 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
- 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                    the patch)
- 32 bit domU, ballooning up and down
- 32 bit domU, save and restore
- 32 bit domU with PCI passthrough
- 64 bit domU, 8 GB, 2049 MB, 5000 MB
- 64 bit domU, ballooning up and down
- 64 bit domU, save and restore
- 64 bit domU with PCI passthrough

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:15 +00:00
Juergen Gross
0aad568983 xen: Hide get_phys_to_machine() to be able to tune common path
Today get_phys_to_machine() is always called when the mfn for a pfn
is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
to be able to avoid calling get_phys_to_machine() when possible as
soon as the switch to a linear mapped p2m list has been done.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:09 +00:00
Juergen Gross
792230c3a6 x86: Introduce function to get pmd entry pointer
Introduces lookup_pmd_address() to get the address of the pmd entry
related to a virtual address in the current address space. This
function is needed for support of a virtual mapped sparse p2m list
in xen pv domains, as we need the address of the pmd entry, not the
one of the pte in that case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:04 +00:00
Juergen Gross
5b8e7d8054 xen: Delay invalidating extra memory
When the physical memory configuration is initialized the p2m entries
for not pouplated memory pages are set to "invalid". As those pages
are beyond the hypervisor built p2m list the p2m tree has to be
extended.

This patch delays processing the extra memory related p2m entries
during the boot process until some more basic memory management
functions are callable. This removes the need to create new p2m
entries until virtual memory management is available.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:59 +00:00
Juergen Gross
97f4533a60 xen: Delay m2p_override initialization
The m2p overrides are used to be able to find the local pfn for a
foreign mfn mapped into the domain. They are used by driver backends
having to access frontend data.

As this functionality isn't used in early boot it makes no sense to
initialize the m2p override functions very early. It can be done
later without doing any harm, removing the need for allocating memory
via extend_brk().

While at it make some m2p override functions static as they are only
used internally.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:53 +00:00
Juergen Gross
1f3ac86b4c xen: Delay remapping memory of pv-domain
Early in the boot process the memory layout of a pv-domain is changed
to match the E820 map (either the host one for Dom0 or the Xen one)
regarding placement of RAM and PCI holes. This requires removing memory
pages initially located at positions not suitable for RAM and adding
them later at higher addresses where no restrictions apply.

To be able to operate on the hypervisor supported p2m list until a
virtual mapped linear p2m list can be constructed, remapping must
be delayed until virtual memory management is initialized, as the
initial p2m list can't be extended unlimited at physical memory
initialization time due to it's fixed structure.

A further advantage is the reduction in complexity and code volume as
we don't have to be careful regarding memory restrictions during p2m
updates.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:48 +00:00
Juergen Gross
7108c9ce8f xen: use common page allocation function in p2m.c
In arch/x86/xen/p2m.c three different allocation functions for
obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
or __get_free_page().  Which of those functions is used depends on the
progress of the boot process of the system.

Introduce a common allocation routine selecting the to be called
allocation routine dynamically based on the boot progress. This allows
moving initialization steps without having to care about changing
allocation calls.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:42 +00:00
Juergen Gross
820c4db2be xen: Make functions static
Some functions in arch/x86/xen/p2m.c are used locally only. Make them
static. Rearrange the functions in p2m.c to avoid forward declarations.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:37 +00:00
Juergen Gross
6f58d89e6c xen: fix some style issues in p2m.c
The source arch/x86/xen/p2m.c has some coding style issues. Fix them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:29 +00:00
Boris Ostrovsky
14520c92cb xen/pci: Use APIC directly when APIC virtualization hardware is available
When hardware supports APIC/x2APIC virtualization we don't need to use
pirqs for MSI handling and instead use APIC since most APIC accesses
(MMIO or MSR) will now be processed without VMEXITs.

As an example, netperf on the original code produces this profile
(collected wih 'xentrace -e 0x0008ffff -T 5'):

    342 cpu_change
    260 CPUID
  34638 HLT
  64067 INJ_VIRQ
  28374 INTR
  82733 INTR_WINDOW
     10 NPF
  24337 TRAP
 370610 vlapic_accept_pic_intr
 307528 VMENTRY
 307527 VMEXIT
 140998 VMMCALL
    127 wrap_buffer

After applying this patch the same test shows

    230 cpu_change
    260 CPUID
  36542 HLT
    174 INJ_VIRQ
  27250 INTR
    222 INTR_WINDOW
     20 NPF
  24999 TRAP
 381812 vlapic_accept_pic_intr
 166480 VMENTRY
 166479 VMEXIT
  77208 VMMCALL
     81 wrap_buffer

ApacheBench results (ab -n 10000 -c 200) improve by about 10%

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 13:02:43 +00:00
Boris Ostrovsky
066d79e4e2 xen/pci: Defer initialization of MSI ops on HVM guests
If the hardware supports APIC virtualization we may decide not to use
pirqs and instead use APIC/x2APIC directly, meaning that we don't want
to set x86_msi.setup_msi_irqs and x86_msi.teardown_msi_irq to
Xen-specific routines.  However, x2APIC is not set up by the time
pci_xen_hvm_init() is called so we need to postpone setting these ops
until later, when we know which APIC mode is used.

(Note that currently x2APIC is never initialized on HVM guests. This
may change in the future)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 13:00:51 +00:00
Stefano Stabellini
a4dba13089 xen/arm/arm64: introduce xen_arch_need_swiotlb
Introduce an arch specific function to find out whether a particular dma
mapping operation needs to bounce on the swiotlb buffer.

On ARM and ARM64, if the page involved is a foreign page and the device
is not coherent, we need to bounce because at unmap time we cannot
execute any required cache maintenance operations (we don't know how to
find the pfn from the mfn).

No change of behaviour for x86.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-12-04 12:41:54 +00:00
Stefano Stabellini
a0f2dee0cd xen: add a dma_addr_t dev_addr argument to xen_dma_map_page
dev_addr is the machine address of the page.

The new parameter can be used by the ARM and ARM64 implementations of
xen_dma_map_page to find out if the page is a local page (pfn == mfn) or
a foreign page (pfn != mfn).

dev_addr could be retrieved again from the physical address, using
pfn_to_mfn, but it requires accessing an rbtree. Since we already have
the dev_addr in our hands at the call site there is no need to get the
mfn twice.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-12-04 12:41:51 +00:00
Jacob Shin
36748b9518 perf/x86: Remove get_hbp_len and replace with bp_len
Clean up the logic for determining the breakpoint length

Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-12-03 15:14:30 +01:00
Jacob Shin
d6d55f0b9d perf/x86/amd: AMD support for bp_len > HW_BREAKPOINT_LEN_8
Implement hardware breakpoint address mask for AMD Family 16h and
above processors. CPUID feature bit indicates hardware support for
DRn_ADDR_MASK MSRs. These masks further qualify DRn/DR7 hardware
breakpoint addresses to allow matching of larger addresses ranges.

Valuable advice and pseudo code from Oleg Nesterov <oleg@redhat.com>

Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-12-03 15:14:26 +01:00
Julia Lawall
a6326ba025 crypto: sha - replace memset by memzero_explicit
Memset on a local variable may be removed when it is called just before the
variable goes out of scope.  Using memzero_explicit defeats this
optimization.  A simplified version of the semantic patch that makes this
change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
type T;
@@

{
... when any
T x[...];
... when any
    when exists
- memset
+ memzero_explicit
  (x,
-0,
  ...)
... when != x
    when strict
}
// </smpl>

This change was suggested by Daniel Borkmann <dborkman@redhat.com>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-12-02 22:55:49 +08:00
Dave Airlie
e8115e79aa Linux 3.18-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUe7l9AAoJEHm+PkMAQRiGkGcIAIryQ7NKn4IaxUtS807Lx4Ih
 obEnx7nNKZTXCZpD/7XQGHMMJyozMJR50PHZESJoHu4Luhv9h7EFRnyJ6MdqMlwn
 zla3zY0yRsHwPoJKcHbSE0CPHZz0WPQHj7IEbM+XJz2tMNJfbgTrezElmcCM4DRp
 c9ae+ggwZ2cyNYM0r2RSwSJ525WMh69f9dzSUE27fpvkllQgwqNs/jHYz8HNOEht
 FWcv5UhvzKjwJS3awULfOB3zH2QdFvVTrwAzd+kbV2Q6T6CaUoFRlhXeKUO6W2Jv
 pJM6oj8tMZUkdXEv7EQXT1kwEqC4DULTTTHs4tSF79O1ESmNfePiOwwBcwoM2nM=
 =kG1Y
 -----END PGP SIGNATURE-----

Merge tag 'v3.18-rc7' into drm-next

This fixes a bunch of conflicts prior to merging i915 tree.

Linux 3.18-rc7

Conflicts:
	drivers/gpu/drm/exynos/exynos_drm_drv.c
	drivers/gpu/drm/i915/i915_drv.c
	drivers/gpu/drm/i915/intel_pm.c
	drivers/gpu/drm/tegra/dc.c
2014-12-02 10:58:33 +10:00
Steven Rostedt (Red Hat)
6a06bdbf7f ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
The function graph helper function prepare_ftrace_return() which does the work
to hijack the parent pointer has that parent pointer as its first parameter.
Instead, if we make it the second parameter and have ip as the first parameter
(self_addr), then it can use the %rdi from save_mcount_regs that loads it
already.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:08:58 -05:00
Steven Rostedt (Red Hat)
f1ab00af81 ftrace/x86: Get rid of ftrace_caller_setup
Move all the work from ftrace_caller_setup into save_mcount_regs. This
simplifies the code and makes it easier to understand.

Link: http://lkml.kernel.org/r/CA+55aFxUTUbdxpjVMW8X9c=o8sui7OB_MYPfcbJuDyfUWtNrNg@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:08:43 -05:00
Steven Rostedt (Red Hat)
0687c36e45 ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
The save_mcount_regs macro saves and restores the required mcount regs that
need to be saved before calling C code. It is done for all the function hook
utilities (static tracing, dynamic tracing, regs, function graph).

When frame pointers are enabled, the ftrace trampolines need to set up
frames and pointers such that a back trace (dump stack) can continue passed
them. Currently, a separate macro is used (create_frame) to do this, but
it's only done for the ftrace_caller and ftrace_reg_caller functions. It
is not done for the static tracer or function graph tracing.

Instead of having a separate macro doing the recording of the frames,
have the save_mcount_regs perform this task. This also has all tracers
saving the frame pointers when needed.

Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:08:27 -05:00
Steven Rostedt (Red Hat)
85f6f0290c ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
The macro save_mcount_regs saves regs onto the stack. But to uncouple the
amount of stack used in that macro from the users of the macro, we need
to have a define that tells all the users how much stack is used by that
macro. This way we can change the amount of stack the macro uses without
breaking its users.

Also remove some dead code that was left over from commit fdc841b58c
"ftrace: x86: Remove check of obsolete variable function_trace_stop".

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:08:15 -05:00
Steven Rostedt (Red Hat)
527aa75b33 ftrace/x86: Simplify save_mcount_regs on getting RIP
Currently save_mcount_regs is passed a "skip" parameter to know how much
stack updated the pt_regs, as it tries to keep the saved pt_regs in the
same location for all users. This is rather stupid, especially since the
part stored on the pt_regs has nothing to do with what is suppose to be
in that location.

Instead of doing that, just pass in an "added" parameter that lets that
macro know how much stack was added before it was called so that it
can get to the RIP.  But the difference is that it will now offset the
pt_regs by that "added" count. The caller now needs to take care of
the offset of the pt_regs.

This will make it easier to simplify the code later.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:07:50 -05:00
Steven Rostedt (Red Hat)
094dfc5451 ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
Instead of having save_mcount_regs store the RIP in %rdx as a temp register
to place it in the proper location of the pt_regs on the stack. Use the
%rdi register as the temp register. This lets us remove the extra store
in the ftrace_caller_setup macro.

Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:07:39 -05:00
Steven Rostedt (Red Hat)
05df710ec3 ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
The name MCOUNT_SAVE_FRAME is rather confusing as it really isn't a
function frame that is saved, but just the required mcount registers
that are needed to be saved before C code may be called. The word
"frame" confuses it as being a function frame which it is not.

Rename MCOUNT_SAVE_FRAME and MCOUNT_RESTORE_FRAME to save_mcount_regs
and restore_mcount_regs respectively. Noticed the lower case, which
keeps it from screaming at the reviewers.

Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:07:27 -05:00
Steven Rostedt (Red Hat)
4bcdf1522f ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
Linus pointed out that MCOUNT_SAVE_FRAME is used in only a single file
and that there's no reason that it should be in a header file.
Move the macro to the code that uses it.

Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:07:16 -05:00
Steven Rostedt (Red Hat)
76c2f13c55 ftrace/x86: Have static tracing also use ftrace_caller_setup
Linus pointed out that there were locations that did the hard coded
update of the parent and rip parameters. One of them was the static tracer
which could also use the ftrace_caller_setup to do that work. In fact,
because it did not use it, it is prone to bugs, and since the static
tracer is hardly ever used (who wants function tracing code always being
called?) it doesn't get tested very often. I only run a few "does it still
work" tests on it. But I do not run stress tests on that code. Although,
since it is never turned off, just having it on should be stressful enough.
(especially for the performance folks)

There's no reason that the static tracer can't also use ftrace_caller_setup.
Have it do so.

Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:06:52 -05:00
Borislav Petkov
2ef84b3bb9 x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
Hand down the cpu number instead, otherwise lockdep screams when doing

echo 1 > /sys/devices/system/cpu/microcode/reload.

BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-01 11:51:05 +01:00
Borislav Petkov
02ecc41abc x86, microcode: Limit the microcode reloading to 64-bit for now
First, there was this: https://bugzilla.kernel.org/show_bug.cgi?id=88001

The problem there was that microcode patches are not being reapplied
after suspend-to-ram. It was important to reapply them, though, because
of for example Haswell's TSX erratum which disabled TSX instructions
with a microcode patch.

A simple fix was fb86b97300 ("x86, microcode: Update BSPs microcode
on resume") but, as it is often the case, simple fixes are too
simple. This one causes 32-bit resume to fail:

https://bugzilla.kernel.org/show_bug.cgi?id=88391

Properly fixing this would require more involved changes for which it
is too late now, right before the merge window. Thus, limit this to
64-bit only temporarily.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417353999-32236-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-01 10:55:08 +01:00
Rafael J. Wysocki
8a497cfdc0 Merge back earlier cpufreq material for 3.19-rc1. 2014-12-01 02:46:24 +01:00
Ard Biesheuvel
d3fccc7ef8 kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.

However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-26 14:40:45 +01:00
Kees Cook
4943ba16bb crypto: include crypto- module prefix in template
This adds the module loading prefix "crypto-" to the template lookup
as well.

For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
includes the "crypto-" prefix at every level, correctly rejecting "vfat":

	net-pf-38
	algif-hash
	crypto-vfat(blowfish)
	crypto-vfat(blowfish)-all
	crypto-vfat

Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-26 20:06:30 +08:00
Sasha Levin
db08655437 x86/nmi: Fix use of unallocated cpumask_var_t
Commit "x86/nmi: Perform a safe NMI stack trace on all CPUs" has introduced
a cpumask_var_t variable:

	+static cpumask_var_t printtrace_mask;

But never allocated it before using it, which caused a NULL ptr deref when
trying to print the stack trace:

[ 1110.296154] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1110.296169] IP: __memcpy (arch/x86/lib/memcpy_64.S:151)
[ 1110.296178] PGD 4c34b3067 PUD 4c351b067 PMD 0
[ 1110.296186] Oops: 0002 [#1] PREEMPT SMP KASAN
[ 1110.296234] Dumping ftrace buffer:
[ 1110.296330]    (ftrace buffer empty)
[ 1110.296339] Modules linked in:
[ 1110.296345] CPU: 1 PID: 10538 Comm: trinity-c99 Not tainted 3.18.0-rc5-next-20141124-sasha-00058-ge2a8c09-dirty #1499
[ 1110.296348] task: ffff880152650000 ti: ffff8804c3560000 task.ti: ffff8804c3560000
[ 1110.296357] RIP: __memcpy (arch/x86/lib/memcpy_64.S:151)
[ 1110.296360] RSP: 0000:ffff8804c3563870  EFLAGS: 00010246
[ 1110.296363] RAX: 0000000000000000 RBX: ffffe8fff3c4a809 RCX: 0000000000000000
[ 1110.296366] RDX: 0000000000000008 RSI: ffffffff9e254040 RDI: 0000000000000000
[ 1110.296369] RBP: ffff8804c3563908 R08: 0000000000ffffff R09: 0000000000ffffff
[ 1110.296371] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000000
[ 1110.296375] R13: 0000000000000000 R14: ffffffff9e254040 R15: ffffe8fff3c4a809
[ 1110.296379] FS:  00007f9e43b0b700(0000) GS:ffff880107e00000(0000) knlGS:0000000000000000
[ 1110.296382] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1110.296385] CR2: 0000000000000000 CR3: 00000004e4334000 CR4: 00000000000006a0
[ 1110.296400] Stack:
[ 1110.296406]  ffffffff81b1e46c 0000000000000000 ffff880107e03fb8 000000000000000b
[ 1110.296413]  ffff880107dfffc0 ffff880107e03fc0 0000000000000008 ffffffff93f2e9c8
[ 1110.296419]  0000000000000000 ffffda0020fc07f7 0000000000000008 ffff8804c3563901
[ 1110.296420] Call Trace:
[ 1110.296429] ? memcpy (mm/kasan/kasan.c:275)
[ 1110.296437] ? arch_trigger_all_cpu_backtrace (include/linux/bitmap.h:215 include/linux/cpumask.h:506 arch/x86/kernel/apic/hw_nmi.c:76)
[ 1110.296444] arch_trigger_all_cpu_backtrace (include/linux/bitmap.h:215 include/linux/cpumask.h:506 arch/x86/kernel/apic/hw_nmi.c:76)
[ 1110.296451] ? dump_stack (./arch/x86/include/asm/preempt.h:95 lib/dump_stack.c:55)
[ 1110.296458] do_raw_spin_lock (./arch/x86/include/asm/spinlock.h:86 kernel/locking/spinlock_debug.c:130 kernel/locking/spinlock_debug.c:137)
[ 1110.296468] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 1110.296474] ? __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:630)
[ 1110.296481] __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:630)
[ 1110.296487] ? preempt_count_sub (kernel/sched/core.c:2615)
[ 1110.296493] try_to_unmap_one (include/linux/rmap.h:202 mm/rmap.c:1146)
[ 1110.296504] ? anon_vma_interval_tree_iter_next (mm/interval_tree.c:72 mm/interval_tree.c:103)
[ 1110.296514] rmap_walk (mm/rmap.c:1653 mm/rmap.c:1725)
[ 1110.296521] ? page_get_anon_vma (include/linux/rcupdate.h:423 include/linux/rcupdate.h:935 mm/rmap.c:435)
[ 1110.296530] try_to_unmap (mm/rmap.c:1545)
[ 1110.296536] ? page_get_anon_vma (mm/rmap.c:437)
[ 1110.296545] ? try_to_unmap_nonlinear (mm/rmap.c:1138)
[ 1110.296551] ? SyS_msync (mm/rmap.c:1501)
[ 1110.296558] ? page_remove_rmap (mm/rmap.c:1409)
[ 1110.296565] ? page_get_anon_vma (mm/rmap.c:448)
[ 1110.296571] ? anon_vma_ctor (mm/rmap.c:1496)
[ 1110.296579] migrate_pages (mm/migrate.c:913 mm/migrate.c:956 mm/migrate.c:1136)
[ 1110.296586] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:169 kernel/locking/spinlock.c:199)
[ 1110.296593] ? buffer_migrate_lock_buffers (mm/migrate.c:1584)
[ 1110.296601] ? handle_mm_fault (mm/memory.c:3163 mm/memory.c:3223 mm/memory.c:3336 mm/memory.c:3365)
[ 1110.296607] migrate_misplaced_page (mm/migrate.c:1738)
[ 1110.296614] handle_mm_fault (mm/memory.c:3170 mm/memory.c:3223 mm/memory.c:3336 mm/memory.c:3365)
[ 1110.296623] __do_page_fault (arch/x86/mm/fault.c:1246)
[ 1110.296630] ? vtime_account_user (kernel/sched/cputime.c:701)
[ 1110.296638] ? get_parent_ip (kernel/sched/core.c:2559)
[ 1110.296646] ? context_tracking_user_exit (kernel/context_tracking.c:144)
[ 1110.296656] trace_do_page_fault (arch/x86/mm/fault.c:1329 include/linux/jump_label.h:114 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1330)
[ 1110.296664] do_async_page_fault (arch/x86/kernel/kvm.c:280)
[ 1110.296670] async_page_fault (arch/x86/kernel/entry_64.S:1285)
[ 1110.296755] Code: 08 4c 8b 54 16 f0 4c 8b 5c 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 90 83 fa 08 72 1b 4c 8b 06 4c 8b 4c 16 f8 <4c> 89 07 4c 89 4c 17 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 83 fa
All code
========
   0:   08 4c 8b 54             or     %cl,0x54(%rbx,%rcx,4)
   4:   16                      (bad)
   5:   f0 4c 8b 5c 16 f8       lock mov -0x8(%rsi,%rdx,1),%r11
   b:   4c 89 07                mov    %r8,(%rdi)
   e:   4c 89 4f 08             mov    %r9,0x8(%rdi)
  12:   4c 89 54 17 f0          mov    %r10,-0x10(%rdi,%rdx,1)
  17:   4c 89 5c 17 f8          mov    %r11,-0x8(%rdi,%rdx,1)
  1c:   c3                      retq
  1d:   90                      nop
  1e:   83 fa 08                cmp    $0x8,%edx
  21:   72 1b                   jb     0x3e
  23:   4c 8b 06                mov    (%rsi),%r8
  26:   4c 8b 4c 16 f8          mov    -0x8(%rsi,%rdx,1),%r9
  2b:*  4c 89 07                mov    %r8,(%rdi)               <-- trapping instruction
  2e:   4c 89 4c 17 f8          mov    %r9,-0x8(%rdi,%rdx,1)
  33:   c3                      retq
  34:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  3b:   00 00 00
  3e:   83 fa 00                cmp    $0x0,%edx

Code starting with the faulting instruction
===========================================
   0:   4c 89 07                mov    %r8,(%rdi)
   3:   4c 89 4c 17 f8          mov    %r9,-0x8(%rdi,%rdx,1)
   8:   c3                      retq
   9:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  10:   00 00 00
  13:   83 fa 00                cmp    $0x0,%edx
[ 1110.296760] RIP __memcpy (arch/x86/lib/memcpy_64.S:151)
[ 1110.296763]  RSP <ffff8804c3563870>
[ 1110.296765] CR2: 0000000000000000

Link: http://lkml.kernel.org/r/1416931560-10603-1-git-send-email-sasha.levin@oracle.com

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-25 14:15:30 -05:00
Dan Carpenter
5d1b3c98ec crypto: sha-mb - remove a bogus NULL check
This can't be NULL and we dereferenced it earlier.  Smatch used to
ignore these things where the pointer was obviously non-NULL but I've
found that sometimes the intention was to check something else so we
were maybe missing bugs.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-25 22:50:43 +08:00
Ard Biesheuvel
bf4bea8e9a kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.

However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-11-25 13:57:26 +00:00
Andy Lutomirski
7ddc6a2199 x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
These functions can be executed on the int3 stack, so kprobes
are dangerous. Tracing is probably a bad idea, too.

Fixes: b645af2d59 ("x86_64, traps: Rework bad_iret")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org> # Backport as far back as it would apply
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-25 07:26:55 +01:00
Steven Rostedt (Red Hat)
62a207d748 ftrace/x86: Have static function tracing always test for function graph
New updates to the ftrace generic code had ftrace_stub not always being
called when ftrace is off. This causes the static tracer to always save
and restore functions. But it also showed that when function tracing is
running, the function graph tracer can not. We should always check to see
if function graph tracing is running even if the function tracer is running
too. The function tracer code is not the only one that uses the hook to
function mcount.

Cc: Markos Chandras <Markos.Chandras@imgtec.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-24 15:02:25 -05:00
Paolo Bonzini
2b4a273b42 kvm: x86: avoid warning about potential shift wrapping bug
cs.base is declared as a __u64 variable and vector is a u32 so this
causes a static checker warning.  The user indeed can set "sipi_vector"
to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the
value should really have 8-bit precision only.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-24 16:53:50 +01:00
Paolo Bonzini
c9eab58f64 KVM: x86: move device assignment out of kvm_host.h
Create a new header, and hide the device assignment functions there.
Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying
arch/x86/kvm/iommu.c to take a PCI device struct.

Based on a patch by Radim Krcmar <rkrcmark@redhat.com>.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-24 16:53:50 +01:00
Kees Cook
5d26a105b5 crypto: prefix module autoloading with "crypto-"
This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:

https://lkml.org/lkml/2013/3/4/70

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-24 22:43:57 +08:00
Andy Lutomirski
82975bc6a6 uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23 14:25:28 -08:00
Linus Torvalds
00c89b2f11 Merge branch 'x86-traps' (trap handling from Andy Lutomirski)
Merge x86-64 iret fixes from Andy Lutomirski:
 "This addresses the following issues:

   - an unrecoverable double-fault triggerable with modify_ldt.
   - invalid stack usage in espfix64 failed IRET recovery from IST
     context.
   - invalid stack usage in non-espfix64 failed IRET recovery from IST
     context.

  It also makes a good but IMO scary change: non-espfix64 failed IRET
  will now report the correct error.  Hopefully nothing depended on the
  old incorrect behavior, but maybe Wine will get confused in some
  obscure corner case"

* emailed patches from Andy Lutomirski <luto@amacapital.net>:
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
2014-11-23 13:56:55 -08:00
Andy Lutomirski
b645af2d59 x86_64, traps: Rework bad_iret
It's possible for iretq to userspace to fail.  This can happen because
of a bad CS, SS, or RIP.

Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace.  To make this work, there's
an extra fixup to fudge the gs base into a usable state.

This is suboptimal because it loses the original exception.  It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with.  For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.

This patch throws out bad_iret entirely.  As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C.  It's should be clearer and more correct.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23 13:56:19 -08:00
Andy Lutomirski
6f442be2fb x86_64, traps: Stop using IST for #SS
On a 32-bit kernel, this has no effect, since there are no IST stacks.

On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code.  The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.

This fixes a bug in which the espfix64 code mishandles a stack segment
violation.

This saves 4k of memory per CPU and a tiny bit of code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23 13:56:19 -08:00
Andy Lutomirski
af726f21ed x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly.  Move it to C.

This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.

Fixes: 3891a04aaf
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23 13:56:18 -08:00
Chris Clayton
e2e68ae688 x86: Use $(OBJDUMP) instead of plain objdump
commit e6023367d7 'x86, kaslr: Prevent .bss from overlaping initrd'
broke the cross compile of x86. It added a objdump invocation, which
invokes the host native objdump and ignores an active cross tool
chain.

Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
account.

[ tglx: Massage changelog and use $(OBJDUMP) ]

Fixes: e6023367d7 'x86, kaslr: Prevent .bss from overlaping initrd'
Signed-off-by: Chris Clayton <chris2553@googlemail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/54705C8E.1080400@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-23 21:21:53 +01:00
Paolo Bonzini
b65d6e17fe kvm: x86: mask out XSAVES
This feature is not supported inside KVM guests yet, because we do not emulate
MSR_IA32_XSS.  Mask it out.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-23 18:33:37 +01:00
Radim Krčmář
c274e03af7 kvm: x86: move assigned-dev.c and iommu.c to arch/x86/
Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code, remaining
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-23 18:33:36 +01:00
Thomas Gleixner
280510f106 PCI/MSI: Rename mask/unmask_msi_irq treewide
The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage
sites. The conversion helper functions are kept around to avoid
conflicts in next and will be removed after merging into mainline.

Coccinelle assisted conversion. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: x86@kernel.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Mohit Kumar <mohit.kumar@st.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yijing Wang <wangyijing@huawei.com>
2014-11-23 13:01:45 +01:00
Jiang Liu
83a18912b0 PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg()
Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
specific.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-23 13:01:45 +01:00
Jiang Liu
891d4a48f7 PCI/MSI: Rename __read_msi_msg() to __pci_read_msi_msg()
Rename __read_msi_msg() to __pci_read_msi_msg() and kill unused
read_msi_msg(). It's a preparation to separate generic MSI code from
PCI core.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-23 13:01:45 +01:00
Linus Torvalds
c6c9161d06 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Misc fixes:
   - gold linker build fix
   - noxsave command line parsing fix
   - bugfix for NX setup
   - microcode resume path bug fix
   - _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious
     lockup thread"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  x86, kaslr: Handle Gold linker for finding bss/brk
  x86, mm: Set NX across entire PMD at boot
  x86, microcode: Update BSPs microcode on resume
  x86: Require exact match for 'noxsave' command line option
2014-11-21 15:46:17 -08:00
Linus Torvalds
13f5004c94 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
  build dependencies fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
  perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
  perf: Fix corruption of sibling list with hotplug
  perf/x86: Fix embarrasing typo
2014-11-21 15:44:07 -08:00
Radim Krcmar
3bf58e9ae8 kvm: remove CONFIG_X86 #ifdefs from files formerly shared with ia64
Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21 18:07:26 +01:00
Paolo Bonzini
6ef768fac9 kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
ia64 does not need them anymore.  Ack notifiers become x86-specific
too.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21 18:02:37 +01:00
Rafael J. Wysocki
0a924200ae Merge back earlier cpuidle material for 3.19-rc1.
Conflicts:
	drivers/cpuidle/dt_idle_states.c
2014-11-21 16:31:42 +01:00
Andy Lutomirski
b5e212a305 x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
_TIF_SINGLESTEP), not (1<<19).

This code is involved in Dave's trinity lockup, but I don't see why
it would cause any of the problems he's seeing, except inadvertently
by causing a different path through entry_64.S's syscall handling.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-20 23:01:53 +01:00
Masami Hiramatsu
a017784f1b kprobes/ftrace: Recover original IP if pre_handler doesn't change it
Recover original IP register if the pre_handler doesn't change it.
Since current kprobes doesn't expect that another ftrace handler
may change regs->ip, it sets kprobe.addr + MCOUNT_INSN_SIZE to
regs->ip and returns to ftrace.
This seems wrong behavior since kprobes can recover regs->ip
and safely pass it to another handler.

This adds code which recovers original regs->ip passed from
ftrace right before returning to ftrace, so that another ftrace
user can change regs->ip.

Link: http://lkml.kernel.org/r/20141009130106.4698.26362.stgit@kbuild-f20.novalocal

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-20 11:42:48 -05:00
Steven Rostedt (Red Hat)
a9edc88093 x86/nmi: Perform a safe NMI stack trace on all CPUs
When trigger_all_cpu_backtrace() is called on x86, it will trigger an
NMI on each CPU and call show_regs(). But this can lead to a hard lock
up if the NMI comes in on another printk().

In order to avoid this, when the NMI triggers, it switches the printk
routine for that CPU to call a NMI safe printk function that records the
printk in a per_cpu seq_buf descriptor. After all NMIs have finished
recording its data, the seq_bufs are printed in a safe context.

Link: http://lkml.kernel.org/p/20140619213952.360076309@goodmis.org
Link: http://lkml.kernel.org/r/20141115050605.055232587@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:21 -05:00
Steven Rostedt (Red Hat)
467aa1f276 x86/kvm/tracing: Use helper function trace_seq_buffer_ptr()
To allow for the restructiong of the trace_seq code, we need users
of it to use the helper functions instead of accessing the internals
of the trace_seq structure itself.

Link: http://lkml.kernel.org/r/20141104160221.585025609@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:36 -05:00
Steven Rostedt (Red Hat)
aec0be2d6e ftrace/x86/extable: Add is_ftrace_trampoline() function
Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.

But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.

Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.

Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:26 -05:00
Steven Rostedt (Red Hat)
9960efeb80 ftrace/x86: Add frames pointers to trampoline as necessary
When CONFIG_FRAME_POINTERS are enabled, it is required that the
ftrace_caller and ftrace_regs_caller trampolines set up frame pointers
otherwise a stack trace from a function call wont print the functions
that called the trampoline. This is due to a check in
__save_stack_address():

 #ifdef CONFIG_FRAME_POINTER
	if (!reliable)
		return;
 #endif

The "reliable" variable is only set if the function address is equal to
contents of the address before the address the frame pointer register
points to. If the frame pointer is not set up for the ftrace caller
then this will fail the reliable test. It will miss the function that
called the trampoline. Worse yet, if fentry is used (gcc 4.6 and
beyond), it will also miss the parent, as the fentry is called before
the stack frame is set up. That means the bp frame pointer points
to the stack of just before the parent function was called.

Link: http://lkml.kernel.org/r/20141119034829.355440340@goodmis.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:24:31 -05:00
Chen Yucong
fa92c58694 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
Uncorrected no action required (UCNA) - is a uncorrected recoverable
machine check error that is not signaled via a machine check exception
and, instead, is reported to system software as a corrected machine
check error. UCNA errors indicate that some data in the system is
corrupted, but the data has not been consumed and the processor state
is valid and you may continue execution on this processor. UCNA errors
require no action from system software to continue execution. Note that
UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
(MCG_SER_P) is set.
                                               -- Intel SDM Volume 3B

Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
                                                -- AMD64 APM Volume 2

Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:56:51 -08:00
Chen Yucong
e3480271f5 x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:55:43 -08:00
Al Viro
a455589f18 assorted conversions to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:20 -05:00
Nicholas Krause
86619e7ba3 KVM: x86: Remove FIXMEs in emulate.c
Remove FIXME comments about needing fault addresses to be returned.  These
are propaagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:54:43 +01:00
Paolo Bonzini
997b04128d KVM: emulator: remove duplicated limit check
The check on the higher limit of the segment, and the check on the
maximum accessible size, is the same for both expand-up and
expand-down segments.  Only the computation of "lim" varies.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:40:24 +01:00
Paolo Bonzini
01485a2230 KVM: emulator: remove code duplication in register_address{,_increment}
register_address has been a duplicate of address_mask ever since the
ancestor of __linearize was born in 90de84f50b (KVM: x86 emulator:
preserve an operand's segment identity, 2010-11-17).

However, we can put it to a better use by including the call to reg_read
in register_address.  Similarly, the call to reg_rmw can be moved to
register_address_increment.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:27:27 +01:00
Nadav Amit
31ff64881b KVM: x86: Move __linearize masking of la into switch
In __linearize there is check of the condition whether to check if masking of
the linear address is needed.  It occurs immediately after switch that
evaluates the same condition.  Merge them.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:20:15 +01:00
Nadav Amit
abc7d8a4c9 KVM: x86: Non-canonical access using SS should cause #SS
When SS is used using a non-canonical address, an #SS exception is generated on
real hardware.  KVM emulator causes a #GP instead. Fix it to behave as real x86
CPU.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:19:57 +01:00
Nadav Amit
d50eaa1803 KVM: x86: Perform limit checks when assigning EIP
If branch (e.g., jmp, ret) causes limit violations, since the target IP >
limit, the #GP exception occurs before the branch.  In other words, the RIP
pushed on the stack should be that of the branch and not that of the target.

To do so, we can call __linearize, with new EIP, which also saves us the code
which performs the canonical address checks. On the case of assigning an EIP >=
2^32 (when switching cs.l), we also safe, as __linearize will check the new EIP
does not exceed the limit and would trigger #GP(0) otherwise.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:19:22 +01:00
Nadav Amit
a7315d2f3c KVM: x86: Emulator performs privilege checks on __linearize
When segment is accessed, real hardware does not perform any privilege level
checks.  In contrast, KVM emulator does. This causes some discrepencies from
real hardware. For instance, reading from readable code segment may fail due to
incorrect segment checks. In addition, it introduces unnecassary overhead.

To reference Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked
when the segment selector of a segment descriptor is loaded into a segment
register." The SDM never mentions privilege level checks during memory access,
except for loading far pointers in section 5.10 ("Pointer Validation"). Those
are actually segment selector loads and are emulated in the similarily (i.e.,
regardless to __linearize checks).

This behavior was also checked using sysexit. A data-segment whose DPL=0 was
loaded, and after sysexit (CPL=3) it is still accessible.

Therefore, all the privilege level checks in __linearize are removed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:17:58 +01:00
Nadav Amit
1c1c35ae4b KVM: x86: Stack size is overridden by __linearize
When performing segmented-read/write in the emulator for stack operations, it
ignores the stack size, and uses the ad_bytes as indication for the pointer
size. As a result, a wrong address may be accessed.

To fix this behavior, we can remove the masking of address in __linearize and
perform it beforehand.  It is already done for the operands (so currently it is
inefficiently done twice). It is missing in two cases:
1. When using rip_relative
2. On fetch_bit_operand that changes the address.

This patch masks the address on these two occassions, and removes the masking
from __linearize.

Note that it does not mask EIP during fetch. In protected/legacy mode code
fetch when RIP >= 2^32 should result in #GP and not wrap-around. Since we make
limit checks within __linearize, this is the expected behavior.

Partial revert of commit 518547b32a (KVM: x86: Emulator does not
calculate address correctly, 2014-09-30).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:17:10 +01:00
Nadav Amit
7d882ffa81 KVM: x86: Revert NoBigReal patch in the emulator
Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support
16-bit addresses in real mode") introduced NoBigReal for instructions such as
MONITOR. Apparetnly, the Intel SDM description that led to this patch is
misleading.  Since no instruction is using NoBigReal, it is safe to remove it,
we fully understand what the SDM means.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19 18:13:27 +01:00
Dave Hansen
a1ea1c032b x86: Cleanly separate use of asm-generic/mm_hooks.h
asm-generic/mm_hooks.h provides some generic fillers for the 90%
of architectures that do not need to hook some mmap-manipulation
functions.  A comment inside says:

> Define generic no-op hooks for arch_dup_mmap and
> arch_exit_mmap, to be included in asm-FOO/mmu_context.h
> for any arch FOO which doesn't need to hook these.

So, does x86 need to hook these?  It depends on CONFIG_PARAVIRT.
We *conditionally* include this generic header if we have
CONFIG_PARAVIRT=n.  That's madness.

With this patch, x86 stops using asm-generic/mmu_hooks.h entirely.
We use our own copies of the functions.  The paravirt code
provides some stubs if it is disabled, and we always call those
stubs in our x86-private versions of arch_exit_mmap() and
arch_dup_mmap().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-19 11:54:13 +01:00
Dave Hansen
68c009c413 x86 mpx: Change return type of get_reg_offset()
get_reg_offset() used to return the register contents themselves
instead of the register offset.  When it did that, it was an
unsigned long.  I changed it to return an integer _offset_
instead of the register.  But, I neglected to change the return
type of the function or the variables in which we store the
result of the call.

This fixes up the code to clear up the warnings from the smatch
bot:

New smatch warnings:
arch/x86/mm/mpx.c:178 mpx_get_addr_ref() warn: unsigned 'addr_offset' is never less than zero.
arch/x86/mm/mpx.c:184 mpx_get_addr_ref() warn: unsigned 'base_offset' is never less than zero.
arch/x86/mm/mpx.c:188 mpx_get_addr_ref() warn: unsigned 'indx_offset' is never less than zero.
arch/x86/mm/mpx.c:196 mpx_get_addr_ref() warn: unsigned 'addr_offset' is never less than zero.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182343.C3E0C629@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-19 11:54:12 +01:00
Kees Cook
70b61e3621 x86, kaslr: Handle Gold linker for finding bss/brk
When building with the Gold linker, the .bss and .brk areas of vmlinux
are shown as consecutive instead of having the same file offset. Allow
for either state, as long as things add up correctly.

Fixes: e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Link: http://lkml.kernel.org/r/20141118001604.GA25045@www.outflux.net
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 18:32:24 +01:00
Kees Cook
45e2a9d470 x86, mm: Set NX across entire PMD at boot
When setting up permissions on kernel memory at boot, the end of the
PMD that was split from bss remained executable. It should be NX like
the rest. This performs a PMD alignment instead of a PAGE alignment to
get the correct span of memory.

Before:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
0xffffffff82df5000-0xffffffff82e00000    44K     RW       GLB x  pte
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

After:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82e00000    12M     RW   PSE GLB NX pmd
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

[ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment.
        We really should unmap the reminder along with the holes
        caused by init,initdata etc. but thats a different issue ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 18:32:24 +01:00
Borislav Petkov
fb86b97300 x86, microcode: Update BSPs microcode on resume
In the situation when we apply early microcode but do *not* apply late
microcode, we fail to update the BSP's microcode on resume because we
haven't initialized the uci->mc microcode pointer. So, in order to
alleviate that, we go and dig out the stashed microcode patch during
early boot. It is basically the same thing that is done on the APs early
during boot so do that too here.

Tested-by: alex.schnaidt@gmail.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88001
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141118094657.GA6635@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 18:32:24 +01:00
Tiejun Chen
842bb26a40 kvm: x86: vmx: remove MMIO_MAX_GEN
MMIO_MAX_GEN is the same as MMIO_GEN_MASK.  Use only one.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-18 11:12:18 +01:00
Tiejun Chen
81ed33e4aa kvm: x86: vmx: cleanup handle_ept_violation
Instead, just use PFERR_{FETCH, PRESENT, WRITE}_MASK
inside handle_ept_violation() for slightly better code.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-18 11:07:53 +01:00
Rafael J. Wysocki
bd2a0f6754 Merge back cpufreq material for 3.19-rc1. 2014-11-18 01:22:29 +01:00
Dave Hansen
1de4fa14ee x86, mpx: Cleanup unused bound tables
The previous patch allocates bounds tables on-demand.  As noted in
an earlier description, these can add up to *HUGE* amounts of
memory.  This has caused OOMs in practice when running tests.

This patch adds support for freeing bounds tables when they are no
longer in use.

There are two types of mappings in play when unmapping tables:
 1. The mapping with the actual data, which userspace is
    munmap()ing or brk()ing away, etc...
 2. The mapping for the bounds table *backing* the data
    (is tagged with VM_MPX, see the patch "add MPX specific
    mmap interface").

If userspace use the prctl() indroduced earlier in this patchset
to enable the management of bounds tables in kernel, when it
unmaps the first type of mapping with the actual data, the kernel
needs to free the mapping for the bounds table backing the data.
This patch hooks in at the very end of do_unmap() to do so.
We look at the addresses being unmapped and find the bounds
directory entries and tables which cover those addresses.  If
an entire table is unused, we clear associated directory entry
and free the table.

Once we unmap the bounds table, we would have a bounds directory
entry pointing at empty address space. That address space might
now be allocated for some other (random) use, and the MPX
hardware might now try to walk it as if it were a bounds table.
That would be bad.  So any unmapping of an enture bounds table
has to be accompanied by a corresponding write to the bounds
directory entry to invalidate it.  That write to the bounds
directory can fault, which causes the following problem:

Since we are doing the freeing from munmap() (and other paths
like it), we hold mmap_sem for write. If we fault, the page
fault handler will attempt to acquire mmap_sem for read and
we will deadlock.  To avoid the deadlock, we pagefault_disable()
when touching the bounds directory entry and use a
get_user_pages() to resolve the fault.

The unmapping of bounds tables happends under vm_munmap().  We
also (indirectly) call vm_munmap() to _do_ the unmapping of the
bounds tables.  We avoid unbounded recursion by disallowing
freeing of bounds tables *for* bounds tables.  This would not
occur normally, so should not have any practical impact.  Being
strict about it here helps ensure that we do not have an
exploitable stack overflow.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:54 +01:00
Dave Hansen
fe3d197f84 x86, mpx: On-demand kernel allocation of bounds tables
This is really the meat of the MPX patch set.  If there is one patch to
review in the entire series, this is the one.  There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner.  (small FAQ below).

Long Description:

This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers.  The prctl() is an
explicit signal from userspace.

PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.

PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.

PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in  the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address.  Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.

Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive.  Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.

==== Why does the hardware even have these Bounds Tables? ====

MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".

They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.

The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory. If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time. Also, a
   single process's pre-popualated bounds directory consumes 2GB
   of virtual *AND* physical memory. IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall. This can be done for small,
   constrained applications. But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app might allocate memory (think libraries). The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
   handler functions and even if mmap() would work it still
   requires locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
fcc7ffd679 x86, mpx: Decode MPX instruction to get bound violation information
This patch sets bound violation fields of siginfo struct in #BR
exception handler by decoding the user instruction and constructing
the faulting pointer.

We have to be very careful when decoding these instructions.  They
are completely controlled by userspace and may be changed at any
time up to and including the point where we try to copy them in to
the kernel.  They may or may not be MPX instructions and could be
completely invalid for all we know.

Note: This code is based on Qiaowei Ren's specialized MPX
decoder, but uses the generic decoder whenever possible.  It was
tested for robustness by generating a completely random data
stream and trying to decode that stream.  I also unmapped random
pages inside the stream to test the "partial instruction" short
read code.

We kzalloc() the siginfo instead of stack allocating it because
we need to memset() it anyway, and doing this makes it much more
clear when it got initialized by the MPX instruction decoder.

Changes from the old decoder:
 * Use the generic decoder instead of custom functions.  Saved
   ~70 lines of code overall.
 * Remove insn->addr_bytes code (never used??)
 * Make sure never to possibly overflow the regoff[] array, plus
   check the register range correctly in 32 and 64-bit modes.
 * Allow get_reg() to return an error and have mpx_get_addr_ref()
   handle when it sees errors.
 * Only call insn_get_*() near where we actually use the values
   instead if trying to call them all at once.
 * Handle short reads from copy_from_user() and check the actual
   number of read bytes against what we expect from
   insn_get_length().  If a read stops in the middle of an
   instruction, we error out.
 * Actually check the opcodes intead of ignoring them.
 * Dynamically kzalloc() siginfo_t so we don't leak any stack
   data.
 * Detect and handle decoder failures instead of ignoring them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Qiaowei Ren
57319d80e1 x86, mpx: Add MPX-specific mmap interface
We have chosen to perform the allocation of bounds tables in
kernel (See the patch "on-demand kernel allocation of bounds
tables") and to mark these VMAs with VM_MPX.

However, there is currently no suitable interface to actually do
this.  Existing interfaces, like do_mmap_pgoff(), have no way to
set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem
long enough to let a caller do it.

This patch wraps mmap_region() and hold mmap_sem long enough to
make the modifications to the VMA which we need.

Also note the 32/64-bit #ifdef in the header.  We actually need
to do this at runtime eventually.  But, for now, we don't support
running 32-bit binaries on 64-bit kernels.  Support for this will
come in later patches.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
95290cf13e x86, mpx: Add MPX to disabled features
This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as
both a runtime and compile-time check.

When CONFIG_X86_INTEL_MPX is disabled,
cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at
compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid
flag will be checked at runtime.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
62e7759b1b x86, mpx: Rename cfg_reg_u and status_reg
According to Intel SDM extension, MPX configuration and status registers
should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and
status_reg to bndcfgu and bndstatus.

[ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
c04e051ccc x86: mpx: Give bndX registers actual names
Consider the bndX MPX registers.  There 4 registers each
containing a 64-bit lower and a 64-bit upper bound.  That's 8*64
bits and we declare it thusly:

	struct bndregs_struct {
		u64 bndregs[8];
	}
    
Let's say you want to read the upper bound from the MPX register
bnd2 out of the xsave buf.  You do:

	bndregno = 2;
	upper_bound = xsave_buf->bndregs.bndregs[2*bndregno+1];

That kinda sucks.  Every time you access it, you need to know:
1. Each bndX register is two entries wide in "bndregs"
2. The lower comes first followed by upper.  We do the +1 to get
   upper vs. lower.

This replaces the old definition.  You can now access them
indexed by the register number directly, and with a meaningful
name for the lower and upper bound:

	bndregno = 2;
	xsave_buf->bndreg[bndregno].upper_bound;

It's now *VERY* clear that there are 4 registers.  The programmer
now doesn't have to care what order the lower and upper bounds
are in, and it's harder to get it wrong.

[ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct
bndreg_struct to struct bndreg ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:52 +01:00
Dave Hansen
6ba48ff46f x86: Remove arbitrary instruction size limit in instruction decoder
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).

The MPX code is now going to be doing some decoding of userspace
instructions.  We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace.  In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace.  This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.

The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.

This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.

The kprobes code probably needs to be looked at here a bit more
carefully.  This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:52 +01:00
Nadav Amit
f210f7572b KVM: x86: Fix lost interrupt on irr_pending race
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is
set.  If this assumption is broken and apicv is disabled, the injection of
interrupts may be deferred until another interrupt is delivered to the guest.
Ultimately, if no other interrupt should be injected to that vCPU, the pending
interrupt may be lost.

commit 56cc2406d6 ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv
is in use") changed the behavior of apic_clear_irr so irr_pending is cleared
after setting APIC_IRR vector. After this commit, if apic_set_irr and
apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR
vector set, and irr_pending cleared. In the following example, assume a single
vector is set in IRR prior to calling apic_clear_irr:

apic_set_irr				apic_clear_irr
------------				--------------
apic->irr_pending = true;
					apic_clear_vector(...);
					vec = apic_search_irr(apic);
					// => vec == -1
apic_set_vector(...);
					apic->irr_pending = (vec != -1);
					// => apic->irr_pending == false

Nonetheless, it appears the race might even occur prior to this commit:

apic_set_irr				apic_clear_irr
------------				--------------
apic->irr_pending = true;
					apic->irr_pending = false;
					apic_clear_vector(...);
					if (apic_search_irr(apic) != -1)
						apic->irr_pending = true;
					// => apic->irr_pending == false
apic_set_vector(...);

Fixing this issue by:
1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call
   apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending.
2. On apic_set_irr: first call apic_set_vector, then set irr_pending.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17 12:16:20 +01:00
Paolo Bonzini
a3e339e1ce KVM: compute correct map even if all APICs are software disabled
Logical destination mode can be used to send NMI IPIs even when all
APICs are software disabled, so if all APICs are software disabled we
should still look at the DFRs.

So the DFRs should all be the same, even if some or all APICs are
software disabled.  However, the SDM does not say this, so tweak
the logic as follows:

- if one APIC is enabled and has LDR != 0, use that one to build the map.
This picks the right DFR in case an OS is only setting it for the
software-enabled APICs, or in case an OS is using logical addressing
on some APICs while leaving the rest in reset state (using LDR was
suggested by Radim).

- if all APICs are disabled, pick a random one to build the map.
We use the last one with LDR != 0 for simplicity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17 12:16:19 +01:00
Nadav Amit
173beedc16 KVM: x86: Software disabled APIC should still deliver NMIs
Currently, the APIC logical map does not consider VCPUs whose local-apic is
software-disabled.  However, NMIs, INIT, etc. should still be delivered to such
VCPUs. Therefore, the APIC mode should first be determined, and then the map,
considering all VCPUs should be constructed.

To address this issue, first find the APIC mode, and only then construct the
logical map.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17 12:16:19 +01:00
Linus Torvalds
de55bbbff2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Microcode fixes, a Xen fix and a KASLR boot loading fix with certain
  memory layouts"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix ucode patch stashing on 32-bit
  x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU down
  x86, microcode: Fix accessing dis_ucode_ldr on 32-bit
  x86, kaslr: Prevent .bss from overlaping initrd
  x86, microcode, AMD: Fix early ucode loading on 32-bit
2014-11-16 11:19:25 -08:00
Linus Torvalds
3b91270a0a x86-64: make csum_partial_copy_from_user() error handling consistent
Al Viro pointed out that the x86-64 csum_partial_copy_from_user() is
somewhat confused about what it should do on errors, notably it mostly
clears the uncopied end result buffer, but misses that for the initial
alignment case.

All users should check for errors, so it's dubious whether the clearing
is even necessary, and Al also points out that we should probably clean
up the calling conventions, but regardless of any future changes to this
function, the fact that it is inconsistent is just annoying.

So make the __get_user() failure path use the same error exit as all the
other errors do.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-16 11:00:42 -08:00
Thomas Gleixner
0dbcae8847 x86: mm: Move PAT only functions to mm/pat.c
Commit e00c8cc93c "x86: Use new cache mode type in memtype related
functions" broke the ARCH=um build.

 arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type
 static inline enum page_cache_mode get_page_memtype(struct page *pg)

The reason is simple. get_page_memtype() and set_page_memtype()
require enum page_cache_mode now, which is defined in
asm/pgtable_types.h. UM does not include that file for obvious reasons.

The simple solution is to move that functions to arch/x86/mm/pat.c
where the only callsites of this are located. They should have been
there in the first place.

Fixes: e00c8cc93c "x86: Use new cache mode type in memtype related functions"
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Weinberger <richard@nod.at>
2014-11-16 18:59:19 +01:00
Dave Hansen
2cd3949f70 x86: Require exact match for 'noxsave' command line option
We have some very similarly named command-line options:

arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);

__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:

	__setup("foo", x86_foo_func...);

The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar".  If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.

This makes the "noxsave" handler only return success when it finds
an *exact* match.

[ tglx: We really need to make __setup() more robust. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 12:13:16 +01:00
Stephane Eranian
aea48559ac perf/x86: Add support for sampling PEBS machine state registers
PEBS can capture machine state regs at retiremnt of the sampled
instructions. When precise sampling is enabled on an event, PEBS
is used, so substitute the interrupted state with the PEBS state.
Note that not all registers are captured by PEBS. Those missing
are replaced by the interrupt state counter-parts.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411559322-16548-3-git-send-email-eranian@google.com
Cc: cebbert.lkml@gmail.com
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:58 +01:00
Andi Kleen
af4bdcf675 perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events
Disallow setting inv/cmask/etc. flags for all PEBS events
on these CPUs, except for the UOPS_RETIRED.* events on Nehalem/Westmere,
which are needed for cycles:p. This avoids an undefined situation
strongly discouraged by the Intle SDM. The PLD_* events were already
covered. This follows the earlier changes for Sandy Bridge and alter.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:56 +01:00
Andi Kleen
0dbc94796d perf/x86/intel: Use INTEL_FLAGS_UEVENT_CONSTRAINT for PRECDIST
My earlier commit:

  86a04461a9 ("perf/x86: Revamp PEBS event selection")

made nearly all PEBS on Sandy/IvyBridge/Haswell to reject non zero flags.

However this wasn't done for the INST_RETIRED.PREC_DIST event
because no suitable macro existed. Now that we have
INTEL_FLAGS_UEVENT_CONSTRAINT enforce zero flags for
INST_RETIRED.PREC_DIST too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:55 +01:00
Andi Kleen
7550ddffe4 perf/x86: Add INTEL_FLAGS_UEVENT_CONSTRAINT
Add a FLAGS_UEVENT_CONSTRAINT macro that allows us to
match on event+umask, and in additional all flags.

This is needed to ensure the INV and CMASK fields
are zero for specific events, as this can cause undefined
behavior.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1411569288-5627-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:54 +01:00
Andi Kleen
c0737ce453 perf/x86/intel/uncore: Add scaling units to the EP iMC events
Add scaling to MB/s to the memory controller read/write
events for Sandy/IvyBridge/Haswell-EP similar to how the client
does. This makes the events easier to use from the
standard perf tool.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:52 +01:00
Ingo Molnar
f108c898dd Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:31:17 +01:00
Juergen Gross
47591df505 xen: Support Xen pv-domains using PAT
With the dynamical mapping between cache modes and pgprot values it is
now possible to use all cache modes via the Xen hypervisor PAT settings
in a pv domain.

All to be done is to read the PAT configuration MSR and set up the
translation tables accordingly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
bd809af16e x86: Enable PAT to use cache mode translation tables
Update the translation tables from cache mode to pgprot values
according to the PAT settings. This enables changing the cache
attributes of a PAT index in just one place without having to change
at the users side.

With this change it is possible to use the same kernel with different
PAT configurations, e.g. supporting Xen.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-18-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
f5b2831d65 x86: Respect PAT bit when copying pte values between large and normal pages
The PAT bit in the ptes is not moved to the correct position when
copying page protection attributes between entries of different sized
pages. Translate the ptes according to their page size.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-17-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
f439c429c3 x86: Support PAT bit in pagetable dump for lower levels
Dumping page table protection bits is not correct for entries on levels
2 and 3 regarding the PAT bit, which is at a different position as on
level 4.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-16-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
87ad0b713b x86: Clean up pgtable_types.h
Remove no longer used defines from pgtable_types.h as they are not
used any longer.

Switch __PAGE_KERNEL_NOCACHE to use cache mode type instead of pte
bits.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-15-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
e00c8cc93c x86: Use new cache mode type in memtype related functions
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-14-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
b14097bd91 x86: Use new cache mode type in mm/ioremap.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-13-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
c06814d841 x86: Use new cache mode type in setting page attributes
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type in the functions for modifying page
attributes.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-12-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
102e19e195 x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
When modifying page attributes via change_page_attr_set_clr() don't
test for setting _PAGE_PAT_LARGE, as this is
- never done
- PAT support for large pages is not included in the kernel up to now

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-11-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
2a3746984c x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. As those are the main callers of
lookup_memtype(), change this as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-10-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
49a3b3cbdf x86: Use new cache mode type in mm/iomap_32.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. This requires to change
io_reserve_memtype() as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-9-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
d85f33342a x86: Use new cache mode type in asm/pgtable.h
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. This requires changing some callers of
is_new_memtype_allowed() to be changed as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-8-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
2df58b6d35 x86: Use new cache mode type in arch/x86/mm/init_64.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-7-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
1c64216be1 x86: Use new cache mode type in arch/x86/pci
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-6-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
c27ce0af89 x86: Use new cache mode type in include/asm/fb.h
Instead of directly using cache mode bits in the pte switch to usage of
the new cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-3-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:24 +01:00
Juergen Gross
281d4078be x86: Make page cache mode a real type
At the moment there are a lot of places that handle setting or getting
the page cache mode by treating the pgprot bits equal to the cache mode.
This is only true because there are a lot of assumptions about the setup
of the PAT MSR. Otherwise the cache type needs to get translated into
pgprot bits and vice versa.

This patch tries to prepare for that by introducing a separate type
for the cache mode and adding functions to translate between those and
pgprot values.

To avoid too much performance penalty the translation between cache mode
and pgprot values is done via tables which contain the relevant
information.  Write-back cache mode is hard-wired to be 0, all other
modes are configurable via those tables. For large pages there are
translation functions as the PAT bit is located at different positions
in the ptes of 4k and large pages.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-2-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:24 +01:00
Ingo Molnar
e9ac5f0fa8 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:50:25 +01:00
Ingo Molnar
595247f61f * Support module unload for efivarfs - Mathias Krause
* Another attempt at moving x86 to libstub taking advantage of the
    __pure attribute - Ard Biesheuvel
 
  * Add EFI runtime services section to ptdump - Mathias Krause
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUZnQGAAoJEC84WcCNIz1VfyUP/1MCVt4vepl7+0JzdUP/eVPs
 CwPM6gBOvgx1PviWrtvSU8UyjtYDqZx7jnCvyvbmlgixAqqIoFm80x5sd9DfyJBj
 vSrmavXaJgQomJN3N+fvaIpGJXp8NQmeNT87++UMb6VE5nYvx7suDcwfTqOaxcYt
 yDwKatTXTvQxDLlGgtymp2UhgVKBICs9WVo8weevB5LPmpt4TFCi1GDSimJkfg+0
 JTvkKF+QxPmVqgwY7bgdFFcfhsYCux5VgtbD4DJKS3LgfLJLAMKPOt83DbeOSmIa
 8zqtlF3eMwHgecKrMLSfIZH3XIl2gsIdPdvT6iBQkwwGZuSzG93JgwJ90HYiKoDm
 yNlffnhmdgI2RXO97UZJpzqGor+eNc0auuS4485PcE8NtZ1tbo20A/OCpfIzK8j8
 Vk7sfZxaaHKF5PUtRe6vo3myRlUCofMIuSEWSF8d709R6AEEuia6RZ3Y45EJPROn
 fKOiLsf7Og1Mk43Iy2lb7kFT766OsUnZZHU/xiIZj/v94HPWFWoFPtxPC0IURvPx
 24oiJxCnyXWGtoyn+SSprl+NAPuPsxVFYriTwaq2RBuoY0NAdy7NIXKe2HTp/WI0
 oSTtYkCRcwiHv0aSrg+yQHmwH7y7m39S3yIS4t5LXenn2G4ObUUjhcAQdF2Ft0Pr
 MT/l+stTt390cpfpcQbE
 =he2O
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi

Pull EFI updates for v3.19 from Matt Fleming:

 - Support module unload for efivarfs - Mathias Krause

 - Another attempt at moving x86 to libstub taking advantage of the
   __pure attribute - Ard Biesheuvel

 - Add EFI runtime services section to ptdump - Mathias Krause

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:48:53 +01:00
Andi Kleen
68055915c1 perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.

Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.

This replaces an earlier patch that disabled the SBOX.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 09:53:36 +01:00
Andi Kleen
41a134a583 perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
The counter register offsets for the IRP box PMU for Haswell-EP
were incorrect. The offsets actually changed over IvyBridge EP.

Fix them to the correct values. For this we need to fork the read
function from the IVB and use an own counter array.

Tested-by: patrick.lu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 09:45:47 +01:00
Igor Mammedov
1d4e7e3c0b kvm: x86: increase user memory slots to 509
With the 3 private slots, this gives us 512 slots total.
Motivation for this is in addition to assigned devices
support more memory hotplug slots, where 1 slot is
used by a hotplugged memory stick.
It will allow to support upto 256 hotplug memory
slots and leave 253 slots for assigned devices and
other devices that use them.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14 10:02:40 +01:00
Chris J Arges
d913b90435 kvm: svm: move WARN_ON in svm_adjust_tsc_offset
When running the tsc_adjust kvm-unit-test on an AMD processor with the
IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be
triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc
is called; however it may trigger unnecessary warnings.

This patch moves the WARN_ON to trigger only if __scale_tsc will actually be
called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common
s64 since this can have signed values.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-13 11:56:11 +01:00
Daniel Lezcano
b82b6cca48 cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.

Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 21:17:27 +01:00
Andy Lutomirski
54b98bff8e x86, kvm, vmx: Don't set LOAD_IA32_EFER when host and guest match
There's nothing to switch if the host and guest values are the same.
I am unable to find evidence that this makes any difference
whatsoever.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[I could see a difference on Nehalem.  From 5 runs:

 userspace exit, guest!=host   12200 11772 12130 12164 12327
 userspace exit, guest=host    11983 11780 11920 11919 12040
 lightweight exit, guest!=host  3214  3220  3238  3218  3337
 lightweight exit, guest=host   3178  3193  3193  3187  3220

 This passes the t-test with 99% confidence for userspace exit,
 98.5% confidence for lightweight exit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-12 16:27:21 +01:00
Aravind Gopalakrishnan
904cb3677f perf/x86/amd/ibs: Update IBS MSRs and feature definitions
New Fam15h models carry extra feature bits and extend
the MSR register space for IBS ops. Adding them here.

While at it, add functionality to read IbsBrTarget and
OpData4 depending on their availability if user wants a
PERF_SAMPLE_RAW.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <paulus@samba.org>
Cc: <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:12:32 +01:00
Herbert Xu
4c7912e919 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merging 3.18-rc4 in order to pick up the memzero_explicit helper.
2014-11-12 22:11:15 +08:00
Ingo Molnar
29cc373037 Two minor cleanups for the next merge window.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUYlU4AAoJEBLB8Bhh3lVKAc4P/2ulOnLcU+P7YjRfEuOHoxZU
 HJUfs51urqiyQH3R2SaQ9KVr+E1N2j6RxHAVPq6MSWtHro4qnXRDnZy1Z2Mha6rE
 pv0QNE9UmQj/23fbLjr4dGTdYDFOEhRLULV2PBIqS7C55fo4xD34U+iIUtj7nKIY
 j1xGuuWquuNKt9XnBSJlvjmkUsGPpXgOkVVzpB9wirXQPjbhuMGJ5j9wsrNRJs08
 leCp4I4sMyi+iy+UUK40GoXESb5qgLKRQQkwC08tzArr8RzLfMla2qZFApS3h7Zd
 uwL+YbwRxY1MBY4ghpq7zUMqhx/OZz39czbt8PHWcwgEmfLXqKJPhrSGzC+ncEx8
 eaEcqoNDjFpafTLxRraXVPo3karEbbAKCUPjlZnCrbF5L7v1QSUr2/fPCuavRZRA
 R6VFaRqfAnj8N9TDVEPksDUrDLHIdi3FNYqXC9iQUJmCGGOIpk9MZs6/3wjUvd6v
 4QGeUj4IExzrmKaQRLucSOOE1ekeSEVMDqvZsYqgAYsJXwkIBWw/TVtj8sOpA+zV
 3FF1Lj4KqPg08ws3GopCfj06dMQNJWORtA2N5Kv7wYVN0uQK9nTK6TpX/8a+L5NY
 mZ52zrVSMIcc9pRlC/4clYaynZKAPqzBxQJ+0bVZqX4Ra6T8LV8wC9UjVtgsthZi
 uJLVvbDtIsyd6CJ5yqnN
 =bBWV
 -----END PGP SIGNATURE-----

Merge tag 'x86_queue_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/cleanups

Pull two minor cleanups from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:10:12 +01:00
Ingo Molnar
890ca861f8 Linux 3.18-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUX/DqAAoJEHm+PkMAQRiGLtQH/iAt3fRHlYDXjaJian/KG1Cb
 wVP0I+HWZmvVmmd0PzyaxCZLgRNwdmmYHEH4QLy2JwZ3jZfFHlxhy+hDWCgz+67t
 bIzkLs0Pf1T4kJ2+r8qW2kBEz9PWJHGTQw7NTqZ++Ts3rPptBA6Fg4mEJ6fQigXy
 qRIY68DpipUkXV9BWBWijnTmrvP5tt7JtPzBr4DC8frMjvWct8+XwYhc2k2tEv2j
 LwLYb1OW6PUpPv2BQBfWjqqH77vYNQVhJwuwGcDe2YZdI0UFkDheL24+RbbPcZ4f
 OnrLjJSSgzv6lBWkAaXZK7/WJ/JZbXxEqHzWZQ3xXoQov97bm7lEYJqqi5gDasQ=
 =6Qpa
 -----END PGP SIGNATURE-----

Merge tag 'v3.18-rc4' into x86/cleanups, to refresh the tree before pulling new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:09:01 +01:00
Andy Lutomirski
f6577a5fa1 x86, kvm, vmx: Always use LOAD_IA32_EFER if available
At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
faster than switching it manually.

I benchmarked this using the vmexit kvm-unit-test (single run, but
GOAL multiplied by 5 to do more iterations):

Test                                  Before      After    Change
cpuid                                   2000       1932    -3.40%
vmcall                                  1914       1817    -5.07%
mov_from_cr8                              13         13     0.00%
mov_to_cr8                                19         19     0.00%
inl_from_pmtimer                       19164      10619   -44.59%
inl_from_qemu                          15662      10302   -34.22%
inl_from_kernel                         3916       3802    -2.91%
outl_to_kernel                          2230       2194    -1.61%
mov_dr                                   172        176     2.33%
ipi                                (skipped)  (skipped)
ipi+halt                           (skipped)  (skipped)
ple-round-robin                           13         13     0.00%
wr_tsc_adjust_msr                       1920       1845    -3.91%
rd_tsc_adjust_msr                       1892       1814    -4.12%
mmio-no-eventfd:pci-mem                16394      11165   -31.90%
mmio-wildcard-eventfd:pci-mem           4607       4645     0.82%
mmio-datamatch-eventfd:pci-mem          4601       4610     0.20%
portio-no-eventfd:pci-io               11507       7942   -30.98%
portio-wildcard-eventfd:pci-io          2239       2225    -0.63%
portio-datamatch-eventfd:pci-io         2250       2234    -0.71%

I haven't explicitly computed the significance of these numbers,
but this isn't subtle.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[The results were reproducible on all of Nehalem, Sandy Bridge and
 Ivy Bridge.  The slowness of manual switching is because writing
 to EFER with WRMSR triggers a TLB flush, even if the only bit you're
 touching is SCE (so the page table format is not affected).  Doing
 the write as part of vmentry/vmexit, instead, does not flush the TLB,
 probably because all processors that have EPT also have VPID. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-12 12:35:02 +01:00
Dave Airlie
51b44eb17b Linux 3.18-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUX/DqAAoJEHm+PkMAQRiGLtQH/iAt3fRHlYDXjaJian/KG1Cb
 wVP0I+HWZmvVmmd0PzyaxCZLgRNwdmmYHEH4QLy2JwZ3jZfFHlxhy+hDWCgz+67t
 bIzkLs0Pf1T4kJ2+r8qW2kBEz9PWJHGTQw7NTqZ++Ts3rPptBA6Fg4mEJ6fQigXy
 qRIY68DpipUkXV9BWBWijnTmrvP5tt7JtPzBr4DC8frMjvWct8+XwYhc2k2tEv2j
 LwLYb1OW6PUpPv2BQBfWjqqH77vYNQVhJwuwGcDe2YZdI0UFkDheL24+RbbPcZ4f
 OnrLjJSSgzv6lBWkAaXZK7/WJ/JZbXxEqHzWZQ3xXoQov97bm7lEYJqqi5gDasQ=
 =6Qpa
 -----END PGP SIGNATURE-----

Merge tag 'v3.18-rc4' into drm-next

backmerge to get vmwgfx locking changes into next as the
conflict with per-plane locking.
2014-11-12 17:53:30 +10:00
Dirk Brandewie
2f86dc4cdd intel_pstate: Add support for HWP
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

With HWP enbaled intel_pstate will no longer be responsible for selecting P
states for the processor. intel_pstate will continue to register to
the cpufreq core as the scaling driver for CPUs implementing
HWP. In HWP mode intel_pstate provides three functions reporting
frequency to the cpufreq core, support for the set_policy() interface
from the core and maintaining the intel_pstate sysfs interface in
/sys/devices/system/cpu/intel_pstate.  User preferences expressed via
the set_policy() interface or the sysfs interface are forwared to the
CPU via the HWP MSR interface.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:38 +01:00
Dirk Brandewie
7787388772 x86: Add support for Intel HWP feature detection.
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:37 +01:00
Mathias Krause
8266e31ed0 x86, ptdump: Add section for EFI runtime services
In commit 3891a04aaf ("x86-64, espfix: Don't leak bits 31:16 of %esp
returning..") the "ESPFix Area" was added to the page table dump special
sections. That area, though, has a limited amount of entries printed.

The EFI runtime services are, unfortunately, located in-between the
espfix area and the high kernel memory mapping. Due to the enforced
limitation for the espfix area, the EFI mappings won't be printed in the
page table dump.

To make the ESP runtime service mappings visible again, provide them a
dedicated entry.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:28:57 +00:00
Ard Biesheuvel
243b6754cd efi/x86: Move x86 back to libstub
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.

The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)

Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.

Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:23:11 +00:00
Yijing Wang
03f56e42d0 Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()"
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").

The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.

default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.

[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
2014-11-11 15:14:30 -07:00
Arnd Bergmann
1c8d29696f Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic
* 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  m68k: io: implement dummy relaxed accessor macros for writes
  m32r: io: implement dummy relaxed accessor macros for writes
  ia64: io: implement dummy relaxed accessor macros for writes
  cris: io: implement dummy relaxed accessor macros for writes
  frv: io: implement dummy relaxed accessor macros for writes
  xtensa: io: remove dummy relaxed accessor macros for reads
  s390: io: remove dummy relaxed accessor macros for reads
  microblaze: io: remove dummy relaxed accessor macros
  asm-generic: io: implement relaxed accessor macros as conditional wrappers

Conflicts:
	include/asm-generic/io.h

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-11-11 19:55:45 +01:00
Steven Rostedt (Red Hat)
4fd3279b48 ftrace: Add more information to ftrace_bug() output
With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.

Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 12:42:13 -05:00