Commit Graph

4567 Commits

Author SHA1 Message Date
Dave Hansen
d17d8f9ded x86/mm: Add tracepoints for TLB flushes
We don't have any good way to figure out what kinds of flushes
are being attempted.  Right now, we can try to use the vm
counters, but those only tell us what we actually did with the
hardware (one-by-one vs full) and don't tell us what was actually
_requested_.

This allows us to select out "interesting" TLB flushes that we
might want to optimize (like the ranged ones) and ignore the ones
that we have very little control over (the ones at context
switch).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:51 -07:00
Dave Hansen
e9f4e0a9fe x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
I think the flush_tlb_mm_range() code that tries to tune the
flush sizes based on the CPU needs to get ripped out for
several reasons:

1. It is obviously buggy.  It uses mm->total_vm to judge the
   task's footprint in the TLB.  It should certainly be using
   some measure of RSS, *NOT* ->total_vm since only resident
   memory can populate the TLB.
2. Haswell, and several other CPUs are missing from the
   intel_tlb_flushall_shift_set() function.  Thus, it has been
   demonstrated to bitrot quickly in practice.
3. It is plain wrong in my vm:
	[    0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] tlb_flushall_shift: 6
   Which leads to it to never use invlpg.
4. The assumptions about TLB refill costs are wrong:
	http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com
    (more on this in later patches)
5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59
   I believe the sample times were too short.  Running the
   benchmark in a loop yields times that vary quite a bit.

Note that this leaves us with a static ceiling of 1 page.  This
is a conservative, dumb setting, and will be revised in a later
patch.

This also removes the code which attempts to predict whether we
are flushing data or instructions.  We expect instruction flushes
to be relatively rare and not worth tuning for explicitly.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:50 -07:00
Oleg Nesterov
36fac0a214 signals: kill sigfindinword()
It has no users and it doesn't look useful.  I do not know why/when it was
introduced, I can't even find any user in the git history.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Linus Torvalds
046f153343 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 EFI updates from Peter Anvin:
 "A collection of EFI changes.  The perhaps most important one is to
  fully save and restore the FPU state around each invocation of EFI
  runtime, and to not choke on non-ASCII characters in the boot stub"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Add compatibility code for compat tasks
  efivars: Refactor sanity checking code into separate function
  efivars: Stop passing a struct argument to efivar_validate()
  efivars: Check size of user object
  efivars: Use local variables instead of a pointer dereference
  x86/efi: Save and restore FPU context around efi_calls (i386)
  x86/efi: Save and restore FPU context around efi_calls (x86_64)
  x86/efi: Implement a __efi_call_virt macro
  x86, fpu: Extend the use of static_cpu_has_safe
  x86/efi: Delete most of the efi_call* macros
  efi: x86: Handle arbitrary Unicode characters
  efi: Add get_dram_base() helper function
  efi: Add shared printk wrapper for consistent prefixing
  efi: create memory map iteration helper
  efi: efi-stub-helper cleanup
2014-06-05 08:16:29 -07:00
Linus Torvalds
a0abcf2e8f Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 cdso updates from Peter Anvin:
 "Vdso cleanups and improvements largely from Andy Lutomirski.  This
  makes the vdso a lot less ''special''"

* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso, build: Make LE access macros clearer, host-safe
  x86/vdso, build: Fix cross-compilation from big-endian architectures
  x86/vdso, build: When vdso2c fails, unlink the output
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
  x86, mm: Improve _install_special_mapping and fix x86 vdso naming
  mm, fs: Add vm_ops->name as an alternative to arch_vma_name
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
  x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
  x86, vdso: Move the 32-bit vdso special pages after the text
  x86, vdso: Reimplement vdso.so preparation in build-time C
  x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
  x86, vdso: Clean up 32-bit vs 64-bit vdso params
  x86, mm: Ensure correct alignment of the fixmap
2014-06-05 08:05:29 -07:00
Linus Torvalds
2071b3e34f Merge branch 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86-64 espfix changes from Peter Anvin:
 "This is the espfix64 code, which fixes the IRET information leak as
  well as the associated functionality problem.  With this code applied,
  16-bit stack segments finally work as intended even on a 64-bit
  kernel.

  Consequently, this patchset also removes the runtime option that we
  added as an interim measure.

  To help the people working on Linux kernels for very small systems,
  this patchset also makes these compile-time configurable features"

* 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-32, espfix: Remove filter for espfix32 due to race
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
2014-06-05 07:46:15 -07:00
Linus Torvalds
00170fdd08 Merge branch 'akpm' (patchbomb from Andrew) into next
Merge misc updates from Andrew Morton:

 - a few fixes for 3.16.  Cc'ed to stable so they'll get there somehow.

 - various misc fixes and cleanups

 - most of the ocfs2 queue.  Review is slow...

 - most of MM.  The MM queue is pretty huge this time, but not much in
   the way of feature work.

 - some tweaks under kernel/

 - printk maintenance work

 - updates to lib/

 - checkpatch updates

 - tweaks to init/

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (276 commits)
  fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
  fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
  init/main.c: remove an ifdef
  kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
  init/main.c: add initcall_blacklist kernel parameter
  init/main.c: don't use pr_debug()
  fs/binfmt_flat.c: make old_reloc() static
  fs/binfmt_elf.c: fix bool assignements
  fs/efs: convert printk(KERN_DEBUG to pr_debug
  fs/efs: add pr_fmt / use __func__
  fs/efs: convert printk to pr_foo()
  scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
  checkpatch: check stable email address
  checkpatch: warn on unnecessary void function return statements
  checkpatch: prefer kstrto<foo> to sscanf(buf, "%<lhuidx>", &bar);
  checkpatch: add warning for kmalloc/kzalloc with multiply
  checkpatch: warn on #defines ending in semicolon
  checkpatch: make --strict a default for files in drivers/net and net/
  checkpatch: always warn on missing blank line after variable declaration block
  checkpatch: fix wildcard DT compatible string checking
  ...
2014-06-04 16:55:13 -07:00
Fabian Frederick
f6187769da sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL
sys_sgetmask and sys_ssetmask are obsolete system calls no longer
supported in libc.

This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
mode configuration.That option is enabled by default for those
architectures.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:14 -07:00
Chen Yucong
65eb71823b hwpoison: remove unused global variable in do_machine_check()
Remove an unused global variable mce_entry and relative operations in
do_machine_check().

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:11 -07:00
Cyrill Gorcunov
2bf01f9f0c mm: x86 pgtable: require X86_64 for soft-dirty tracker
Tracking dirty status on 2 level pages requires very ugly macros and
taking into account how old the machines who can operate without PAE
mode only are, lets drop soft dirty tracker from them for code
simplicity (note I can't drop all the macros from 2 level pages by now
since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without
tracker).

Linus proposed to completely rip off softdirty support on x86-32 (even
with PAE) and since for CRIU we're not planning to support native x86-32
mode, lets do that.

(Softdirty tracker is relatively new feature which is mostly used by
CRIU so I don't expect if such API change would cause problems for
userspace).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:05 -07:00
Cyrill Gorcunov
2373eaecff mm: x86 pgtable: drop unneeded preprocessor ifdef
_PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so
drop redundant #ifdef.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:05 -07:00
Akinobu Mita
9c5a362142 x86: enable DMA CMA with swiotlb
The DMA Contiguous Memory Allocator support on x86 is disabled when
swiotlb config option is enabled.  So DMA CMA is always disabled on
x86_64 because swiotlb is always enabled.  This attempts to support for
DMA CMA with enabling swiotlb config option.

The contiguous memory allocator on x86 is integrated in the function
dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops
for dma_alloc_coherent().

x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops
tries to allocate with dma_generic_alloc_coherent() firstly and then
swiotlb_alloc_coherent() is called as a fallback.

The main part of supporting DMA CMA with swiotlb is that changing
x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops
for dma_free_coherent() so that it can distinguish memory allocated by
dma_generic_alloc_coherent() from one allocated by
swiotlb_alloc_coherent() and release it with dma_generic_free_coherent()
which can handle contiguous memory.  This change requires making
is_swiotlb_buffer() global function.

This also needs to change .free callback in the dma_map_ops for amd_gart
and sta2x11, because these dma_ops are also using
dma_generic_alloc_coherent().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:57 -07:00
Mel Gorman
c46a7c817e x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86.  Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults.  This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.

Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:55 -07:00
Linus Torvalds
d09cc3659d Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core irq updates from Thomas Gleixner:
 "The irq department delivers:

   - Another tree wide update to get rid of the horrible create_irq
     interface along with its even more horrible variants.  That also
     gets rid of the last leftovers of the initial sparse irq hackery.
     arch/driver specific changes have been either acked or ignored.

   - A fix for the spurious interrupt detection logic with threaded
     interrupts.

   - A new ARM SoC interrupt controller

   - The usual pile of fixes and improvements all over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding
  irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller
  genirq: Improve documentation to match current implementation
  ARM: iop13xx: fix msi support with sparse IRQ
  genirq: Provide !SMP stub for irq_set_affinity_notifier()
  irqchip: armada-370-xp: Move the devicetree binding documentation
  irqchip: gic: Use mask field in GICC_IAR
  genirq: Remove dynamic_irq mess
  ia64: Use irq_init_desc
  genirq: Replace dynamic_irq_init/cleanup
  genirq: Remove irq_reserve_irq[s]
  genirq: Replace reserve_irqs in core code
  s390: Avoid call to irq_reserve_irqs()
  s390: Remove pointless arch_show_interrupts()
  s390: pci: Check return value of alloc_irq_desc() proper
  sh: intc: Remove pointless irq_reserve_irqs() invocation
  x86, irq: Remove pointless irq_reserve_irqs() call
  genirq: Make create/destroy_irq() ia64 private
  tile: Use SPARSE_IRQ
  tile: pci: Use irq_alloc/free_hwirq()
  ...
2014-06-04 15:59:13 -07:00
Linus Torvalds
4dc4226f99 ACPI and power management updates for 3.16-rc1
- ACPICA update to upstream version 20140424.  That includes a
    number of fixes and improvements related to things like GPE
    handling, table loading, headers, memory mapping and unmapping,
    DSDT/SSDT overriding, and the Unload() operator.  The acpidump
    utility from upstream ACPICA is included too.  From Bob Moore,
    Lv Zheng, David Box, David Binderman, and Colin Ian King.
 
  - Fixes and cleanups related to ACPI video and backlight interfaces
    from Hans de Goede.  That includes blacklist entries for some new
    machines and using native backlight by default.
 
  - ACPI device enumeration changes to create platform devices
    rather than PNP devices for ACPI device objects with _HID by
    default.  PNP devices will still be created for the ACPI device
    object with device IDs corresponding to real PNP devices, so
    that change should not break things left and right, and we're
    expecting to see more and more ACPI-enumerated platform devices
    in the future.  From Zhang Rui and Rafael J Wysocki.
 
  - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing
    it to handle system suspend/resume on Asus T100 correctly.
    From Heikki Krogerus and Rafael J Wysocki.
 
  - PM core update introducing a mechanism to allow runtime-suspended
    devices to stay suspended over system suspend/resume transitions
    if certain additional conditions related to coordination within
    device hierarchy are met.  Related PM documentation update and
    ACPI PM domain support for the new feature.  From Rafael J Wysocki.
 
  - Fixes and improvements related to the "freeze" sleep state. They
    affect several places including cpuidle, PM core, ACPI core, and
    the ACPI battery driver.  From Rafael J Wysocki and Zhang Rui.
 
  - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
    Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
 
  - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
    Aggregator Device) drivers from Baoquan He, Manuel Schölling,
    Tony Camuso, and Toshi Kani.
 
  - System suspend/resume optimization in the ACPI battery driver from
    Lan Tianyu.
 
  - OPP (Operating Performance Points) subsystem updates from
    Chander Kashyap, Mark Brown, and Nishanth Menon.
 
  - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
    Stratos Karafotis, and Viresh Kumar.
 
  - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
    s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
    Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
    Viresh Kumar.
 
  - intel_pstate driver fixes and cleanups from Dirk Brandewie,
    Doug Smythies, and Stratos Karafotis.
 
  - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
 
  - Fix for the cpuidle menu governor from Chander Kashyap.
 
  - New ARM clps711x cpuidle driver from Alexander Shiyan.
 
  - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
    Fabian Frederick, Pali Rohár, and Sebastian Capella.
 
  - Intel RAPL (Running Average Power Limit) driver updates from
    Jacob Pan.
 
  - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
 
  - devfreq core updates from Chanwoo Choi and Paul Bolle.
 
  - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
    Bartlomiej Zolnierkiewicz.
 
  - turbostat tool fix from Jean Delvare.
 
  - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
    and Thomas Renninger.
 
  - New ACPI ec_access.c tool for poking at the EC in a safe way
    from Thomas Renninger.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTjl16AAoJEILEb/54YlRxeKgP/RRQSV7lFtf582Dw/5M/iWOg
 qYeNtuYFLArEmJ7SpxHdKsU1ZRm3CahAS1j7grvQMQasUxTzoavMcSBNZefeaoNK
 d01LVNqcyKCZs3+izRezk5N1IY+AjdrOcqCdIk8rfgFnc6kOttYUrVcIzKuIKAvJ
 MsJ5s/uqP8G69FsAA3Ttdtr0HKiQhN4skSt424wntQRDeJNZPBs74mPKBGh8bxlO
 Zr/VCDibKQ2Z8jS7x+TzwZrOxgE1/9x0Cub6GAdTvAfS8A+utPwSkneUyopNqpQ+
 tJ5rz5R+HpmPMerizBuU+5s+tvjDPtH4/OZvOPSpYraQSFLOwx3hAm+a5k7fOGmc
 XWjXnXWT0i0V3iQkwrspTNjX1RgywbsHbmXrcWn192HResvMQ9zk2gH2ch6m8JhN
 yTV5V51dOZicpPuaTCvIkJpsV33p6vRz+EdPBiXoEdua5KKtOg8EnQ470dNaMR92
 3ZtWmIvSgGlyPyHlSHLfGXbPUwTYvDNV3aheIoXp9E6WY3WJN9J3WXm4EHKBNVaI
 H83kwuk1s92cgqh22H5Pcb0CmDcrbkUdP6hhsPS/aL80/EJMljRP2AYW1Y+l1LAf
 pzMLmekHFqQEDjFQltwGvFV/EjFeMHnqOgQONx9ygMaayCGGTYSDx3FbRDesf8t9
 qhoFcTPSxoo0XjrGrR6b
 =tpdF
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into next

Pull ACPI and power management updates from Rafael Wysocki:
 "ACPICA is the leader this time (63 commits), followed by cpufreq (28
  commits), devfreq (15 commits), system suspend/hibernation (12
  commits), ACPI video and ACPI device enumeration (10 commits each).

  We have no major new features this time, but there are a few
  significant changes of how things work.  The most visible one will
  probably be that we are now going to create platform devices rather
  than PNP devices by default for ACPI device objects with _HID.  That
  was long overdue and will be really necessary to be able to use the
  same drivers for the same hardware blocks on ACPI and DT-based systems
  going forward.  We're not expecting fallout from this one (as usual),
  but it's something to watch nevertheless.

  The second change having a chance to be visible is that ACPI video
  will now default to using native backlight rather than the ACPI
  backlight interface which should generally help systems with broken
  Win8 BIOSes.  We're hoping that all problems with the native backlight
  handling that we had previously have been addressed and we are in a
  good enough shape to flip the default, but this change should be easy
  enough to revert if need be.

  In addition to that, the system suspend core has a new mechanism to
  allow runtime-suspended devices to stay suspended throughout system
  suspend/resume transitions if some extra conditions are met
  (generally, they are related to coordination within device hierarchy).
  However, enabling this feature requires cooperation from the bus type
  layer and for now it has only been implemented for the ACPI PM domain
  (used by ACPI-enumerated platform devices mostly today).

  Also, the acpidump utility that was previously shipped as a separate
  tool will now be provided by the upstream ACPICA along with the rest
  of ACPICA code, which will allow it to be more up to date and better
  supported, and we have one new cpuidle driver (ARM clps711x).

  The rest is improvements related to certain specific use cases,
  cleanups and fixes all over the place.

  Specifics:

   - ACPICA update to upstream version 20140424.  That includes a number
     of fixes and improvements related to things like GPE handling,
     table loading, headers, memory mapping and unmapping, DSDT/SSDT
     overriding, and the Unload() operator.  The acpidump utility from
     upstream ACPICA is included too.  From Bob Moore, Lv Zheng, David
     Box, David Binderman, and Colin Ian King.

   - Fixes and cleanups related to ACPI video and backlight interfaces
     from Hans de Goede.  That includes blacklist entries for some new
     machines and using native backlight by default.

   - ACPI device enumeration changes to create platform devices rather
     than PNP devices for ACPI device objects with _HID by default.  PNP
     devices will still be created for the ACPI device object with
     device IDs corresponding to real PNP devices, so that change should
     not break things left and right, and we're expecting to see more
     and more ACPI-enumerated platform devices in the future.  From
     Zhang Rui and Rafael J Wysocki.

   - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it
     to handle system suspend/resume on Asus T100 correctly.  From
     Heikki Krogerus and Rafael J Wysocki.

   - PM core update introducing a mechanism to allow runtime-suspended
     devices to stay suspended over system suspend/resume transitions if
     certain additional conditions related to coordination within device
     hierarchy are met.  Related PM documentation update and ACPI PM
     domain support for the new feature.  From Rafael J Wysocki.

   - Fixes and improvements related to the "freeze" sleep state.  They
     affect several places including cpuidle, PM core, ACPI core, and
     the ACPI battery driver.  From Rafael J Wysocki and Zhang Rui.

   - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
     Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.

   - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
     Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony
     Camuso, and Toshi Kani.

   - System suspend/resume optimization in the ACPI battery driver from
     Lan Tianyu.

   - OPP (Operating Performance Points) subsystem updates from Chander
     Kashyap, Mark Brown, and Nishanth Menon.

   - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
     Stratos Karafotis, and Viresh Kumar.

   - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
     s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
     Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
     Viresh Kumar.

   - intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug
     Smythies, and Stratos Karafotis.

   - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.

   - Fix for the cpuidle menu governor from Chander Kashyap.

   - New ARM clps711x cpuidle driver from Alexander Shiyan.

   - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
     Fabian Frederick, Pali Rohár, and Sebastian Capella.

   - Intel RAPL (Running Average Power Limit) driver updates from Jacob
     Pan.

   - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.

   - devfreq core updates from Chanwoo Choi and Paul Bolle.

   - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
     Bartlomiej Zolnierkiewicz.

   - turbostat tool fix from Jean Delvare.

   - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
     and Thomas Renninger.

   - New ACPI ec_access.c tool for poking at the EC in a safe way from
     Thomas Renninger"

* tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits)
  ACPICA: Namespace: Remove _PRP method support.
  intel_pstate: Improve initial busy calculation
  intel_pstate: add sample time scaling
  intel_pstate: Correct rounding in busy calculation
  intel_pstate: Remove C0 tracking
  PM / hibernate: fixed typo in comment
  ACPI: Fix x86 regression related to early mapping size limitation
  ACPICA: Tables: Add mechanism to control early table checksum verification.
  ACPI / scan: use platform bus type by default for _HID enumeration
  ACPI / scan: always register ACPI LPSS scan handler
  ACPI / scan: always register memory hotplug scan handler
  ACPI / scan: always register container scan handler
  ACPI / scan: Change the meaning of missing .attach() in scan handlers
  ACPI / scan: introduce platform_id device PNP type flag
  ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
  ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
  ACPI / PNP: use device ID list for PNPACPI device enumeration
  ACPI / scan: .match() callback for ACPI scan handlers
  ACPI / battery: wakeup the system only when necessary
  power_supply: allow power supply devices registered w/o wakeup source
  ...
2014-06-04 08:57:16 -07:00
Linus Torvalds
b05d59dfce At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM.  Changes include:
 
 - a lot of s390 changes: optimizations, support for migration,
   GDB support and more
 
 - ARM changes are pretty small: support for the PSCI 0.2 hypercall
   interface on both the guest and the host (the latter acked by Catalin)
 
 - initial POWER8 and little-endian host support
 
 - support for running u-boot on embedded POWER targets
 
 - pretty large changes to MIPS too, completing the userspace interface
   and improving the handling of virtualized timer hardware
 
 - for x86, a larger set of changes is scheduled for 3.17.  Still,
   we have a few emulator bugfixes and support for running nested
   fully-virtualized Xen guests (para-virtualized Xen guests have
   always worked).  And some optimizations too.
 
 The only missing architecture here is ia64.  It's not a coincidence
 that support for KVM on ia64 is scheduled for removal in 3.17.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
 BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
 CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
 nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
 TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
 ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
 VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
 bAknaZpPiOrW11Et1htY
 =j5Od
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
2014-06-04 08:47:12 -07:00
Linus Torvalds
3de0ef8d0d Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86/UV changes from Ingo Molnar:
 "Continued updates for SGI UV 3 hardware support"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/UV: Fix conditional in gru_exit()
  x86/UV: Set n_lshift based on GAM_GR_CONFIG MMR for UV3
2014-06-03 15:48:23 -07:00
Linus Torvalds
4aef77b2fe Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 IOSF platform updates from Ingo Molnar:
 "IOSF (Intel OnChip System Fabric) updates:

   - generalize the IOSF interface to allow mixed mode drivers: non-IOSF
     drivers to utilize of IOSF features on IOSF platforms.

   - add 'Quark X1000' IOSF/MBI support

   - clean up BayTrail and Quark PCI ID enumeration"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, iosf: Add PCI ID macros for better readability
  x86, iosf: Add Quark X1000 PCI ID
  x86, iosf: Added Quark MBI identifiers
  x86, iosf: Make IOSF driver modular and usable by more drivers
2014-06-03 15:46:38 -07:00
Linus Torvalds
c33c40549e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 microcode changes from Ingo Molnar:
 "A microcode-debugging boot flag plus related refactoring"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Add a disable chicken bit
  x86, boot: Carve out early cmdline parsing function
2014-06-03 15:44:50 -07:00
Linus Torvalds
e30c631be6 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 irq cleanup from Ingo Molnar:
 "A single, trivial cleanup"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Clean up VECTOR_UNDEFINED and VECTOR_RETRIGGERED definition
2014-06-03 15:43:38 -07:00
Rafael J. Wysocki
0e36d43c9c Merge branch 'acpica'
* acpica: (63 commits)
  ACPICA: Namespace: Remove _PRP method support.
  ACPI: Fix x86 regression related to early mapping size limitation
  ACPICA: Tables: Add mechanism to control early table checksum verification.
  ACPICA: acpidump: Fix repetitive table dump in -n mode.
  ACPI: Clean up acpi_os_map/unmap_memory() to eliminate __iomem.
  ACPICA: Clean up redudant definitions already defined elsewhere
  ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h>
  ACPICA: Linux headers: Add <acpi/platform/aclinuxex.h>
  ACPICA: Linux headers: Remove ACPI_PREEMPTION_POINT() due to no usages.
  ACPICA: Update version to 20140424.
  ACPICA: Comment/format update, no functional change.
  ACPICA: Events: Update GPE handling and initialization code.
  ACPICA: Remove extraneous error message for large number of GPEs.
  ACPICA: Tables: Remove old mechanism to validate if XSDT contains NULL entries.
  ACPICA: Tables: Add new mechanism to skip NULL entries in RSDT and XSDT.
  ACPICA: acpidump: Add support to force using RSDT.
  ACPICA: Back port of improvements on exception code.
  ACPICA: Back port of _PRP update.
  ACPICA: acpidump: Fix truncated RSDP signature validation.
  ACPICA: Linux header: Add support for stubbed externals.
  ...
2014-06-03 23:12:27 +02:00
Linus Torvalds
c84a1e32ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull scheduler updates from Ingo Molnar:
 "The main scheduling related changes in this cycle were:

   - various sched/numa updates, for better performance

   - tree wide cleanup of open coded nice levels

   - nohz fix related to rq->nr_running use

   - cpuidle changes and continued consolidation to improve the
     kernel/sched/idle.c high level idle scheduling logic.  As part of
     this effort I pulled cpuidle driver changes from Rafael as well.

   - standardized idle polling amongst architectures

   - continued work on preparing better power/energy aware scheduling

   - sched/rt updates

   - misc fixlets and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  sched/numa: Decay ->wakee_flips instead of zeroing
  sched/numa: Update migrate_improves/degrades_locality()
  sched/numa: Allow task switch if load imbalance improves
  sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
  sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
  sched: Initialize rq->age_stamp on processor start
  sched, nohz: Change rq->nr_running to always use wrappers
  sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
  sched: Use clamp() and clamp_val() to make sys_nice() more readable
  sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
  sched/numa: Fix initialization of sched_domain_topology for NUMA
  sched: Call select_idle_sibling() when not affine_sd
  sched: Simplify return logic in sched_read_attr()
  sched: Simplify return logic in sched_copy_attr()
  sched: Fix exec_start/task_hot on migrated tasks
  arm64: Remove TIF_POLLING_NRFLAG
  metag: Remove TIF_POLLING_NRFLAG
  sched/idle: Make cpuidle_idle_call() void
  sched/idle: Reflow cpuidle_idle_call()
  sched/idle: Delay clearing the polling bit
  ...
2014-06-03 14:00:15 -07:00
Linus Torvalds
3d521f9151 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull perf updates from Ingo Molnar:
 "The tooling changes maintained by Jiri Olsa until Arnaldo is on
  vacation:

  User visible changes:
   - Add -F option for specifying output fields (Namhyung Kim)
   - Propagate exit status of a command line workload for record command
     (Namhyung Kim)
   - Use tid for finding thread (Namhyung Kim)
   - Clarify the output of perf sched map plus small sched command
     fixes (Dongsheng Yang)
   - Wire up perf_regs and unwind support for ARM64 (Jean Pihet)
   - Factor hists statistics counts processing which in turn also fixes
     several bugs in TUI report command (Namhyung Kim)
   - Add --percentage option to control absolute/relative percentage
     output (Namhyung Kim)
   - Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by
     completion scripts (Ramkumar Ramachandra)

  Development/infrastructure changes and fixes:
   - Android related fixes for pager and map dso resolving (Michael
     Lentine)
   - Add libdw DWARF post unwind support for ARM (Jean Pihet)
   - Consolidate types.h for ARM and ARM64 (Jean Pihet)
   - Fix possible null pointer dereference in session.c (Masanari Iida)
   - Cleanup, remove unused variables in map_switch_event() (Dongsheng
     Yang)
   - Remove nr_state_machine_bugs in perf latency (Dongsheng Yang)
   - Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra)
   - Cleanups for perf.h header (Jiri Olsa)
   - Consolidate types.h and export.h within tools (Borislav Petkov)
   - Move u64_swap union to its single user's header, evsel.h (Borislav
     Petkov)
   - Fix for s390 to properly parse tracepoints plus test code
     (Alexander Yarygin)
   - Handle EINTR error for readn/writen (Namhyung Kim)
   - Add a test case for hists filtering (Namhyung Kim)
   - Share map_groups among threads of the same group (Arnaldo Carvalho
     de Melo, Jiri Olsa)
   - Making some code (cpu node map and report parse callchain callback)
     global to be usable by upcomming changes (Don Zickus)
   - Fix pmu object compilation error (Jiri Olsa)

  Kernel side changes:
   - intrusive uprobes fixes from Oleg Nesterov.  Since the interface is
     admin-only, and the bug only affects user-space ("any probed
     jmp/call can kill the application"), we queued these fixes via the
     development tree, as a special exception.
   - more fuzzer motivated race fixes and related refactoring and
     robustization.
   - allow PMU drivers to be built as modules.  (No actual module yet,
     because the x86 Intel uncore module wasn't ready in time for this)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  perf tools: Add automatic remapping of Android libraries
  perf tools: Add cat as fallback pager
  perf tests: Add a testcase for histogram output sorting
  perf tests: Factor out print_hists_*()
  perf tools: Introduce reset_output_field()
  perf tools: Get rid of obsolete hist_entry__sort_list
  perf hists: Reset width of output fields with header length
  perf tools: Skip elided sort entries
  perf top: Add --fields option to specify output fields
  perf report/tui: Fix a bug when --fields/sort is given
  perf tools: Add ->sort() member to struct sort_entry
  perf report: Add -F option to specify output fields
  perf tools: Call perf_hpp__init() before setting up GUI browsers
  perf tools: Consolidate management of default sort orders
  perf tools: Allow hpp fields to be sort keys
  perf ui: Get rid of callback from __hpp__fmt()
  perf tools: Consolidate output field handling to hpp format routines
  perf tools: Use hpp formats to sort final output
  perf tools: Support event grouping in hpp ->sort()
  perf tools: Use hpp formats to sort hist entries
  ...
2014-06-03 13:18:00 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Linus Torvalds
425553209b PCI changes for the v3.16 merge window:
Enumeration
     - Notify driver before and after device reset (Keith Busch)
     - Use reset notification in NVMe (Keith Busch)
 
   NUMA
     - Warn if we have to guess host bridge node information (Myron Stowe)
     - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit)
     - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit)
 
   Driver binding
     - Add "driver_override" for force specific binding (Alex Williamson)
     - Fail "new_id" addition for devices we already know about (Bandan Das)
 
   Resource management
     - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
     - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
     - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
     - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
     - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas)
     - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
     - Don't print anything while decoding is disabled (Bjorn Helgaas)
     - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
     - Add resource allocation comments (Bjorn Helgaas)
     - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu)
     - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
 
   PCI device hotplug
     - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
     - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
     - Fix rphahp endianess issues (Laurent Dufour)
     - Acknowledge spurious "cmd completed" event (Rajat Jain)
     - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
     - Fix cpqphp possible NULL dereference (Rickard Strandqvist)
 
   MSI
     - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev)
     - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
     - Simplify populate_msi_sysfs() (Jan Beulich)
 
   Virtualization
     - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
     - Mark RTL8110SC INTx masking as broken (Alex Williamson)
 
   Generic host bridge driver
     - Add generic PCI host controller driver (Will Deacon)
 
   Freescale i.MX6
     - Use new clock names (Lucas Stach)
     - Drop old IRQ mapping (Lucas Stach)
     - Remove optional (and unused) IRQs (Lucas Stach)
     - Add support for MSI (Lucas Stach)
     - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Renesas R-Car
     - Add gen2 device tree support (Ben Dooks)
     - Use new OF interrupt mapping when possible (Lucas Stach)
     - Add PCIe driver (Phil Edworthy)
     - Add PCIe MSI support (Phil Edworthy)
     - Add PCIe device tree bindings (Phil Edworthy)
 
   Samsung Exynos
     - Remove unnecessary OOM messages (Jingoo Han)
     - Fix add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Synopsys DesignWare
     - Make MSI ISR shared IRQ aware (Lucas Stach)
 
   Miscellaneous
     - Check for broken config space aliasing (Alex Williamson)
     - Update email address (Ben Hutchings)
     - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
     - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
     - Remove unnecessary __ref annotations (Bjorn Helgaas)
     - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas)
     - Fix use of uninitialized MPS value (Bjorn Helgaas)
     - Tidy x86/gart messages (Bjorn Helgaas)
     - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
     - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
     - Remove unused serial device IDs (Jean Delvare)
     - Use designated initialization in PCI_VDEVICE (Mark Rustad)
     - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
     - Configure MPS on ARM (Murali Karicheri)
     - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
     - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
     - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
     - Remove pcibios_add_platform_entries() (Sebastian Ott)
     - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
     - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
     - Add and use new pci_is_bridge() interface (Yijing Wang)
     - Make pci_bus_add_device() void (Yijing Wang)
 
   DMA API
     - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
     - Fix typos in docs (Emilio López)
     - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
     - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas)
     - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjMaQAAoJEFmIoMA60/r8XncQAKX7cD6btXCZnrcYo7inseyp
 3rwOlrsNkWyHqSj/RqqzE1NY6L1h5G2uliI6xg1SKenuHPDcosm5d8FYO0ORKiUs
 xrqBkmZJHXN7fck//tJwsTXiYh5u42RO8QWbvZVr5UqXe40LyaMHMh9Y7VarrU/o
 sM2ADzFKagv1qMQ13nmYxqT+Zl+CqpimyLP+ep6Nfqxi6ils+KJ6b9SKYqrpqE6t
 Mcq2K5ShqU5SaYub1JIXLcQ9XylID+t1M9+cwixcs7a87HbJiktfkGqQvQJoUIuK
 Q5U+abcIGk4vfOnDCctSnoRyrcbTAZ/vqfo0vpX22TokESjwrD8hFOX5HPOFtD+4
 wIDbYurW/8oBhLRaJ0uTPzSH8bXjXTynAwxHZgIrEur5908eECKQ/WiFCxyrovvv
 r4ThAN0FaobllEr0XOFESOzDNSt/ME00WWI7+puAJ/KJkFEtcXt9othLmLmvLz8H
 2GWXrm/aOR0WUO7foGUxI3bXYlDN6NbSKpfuZsLAi2VAyJJ6L6yVSo/fT0X07e3z
 qRy9LOohuiwIKv/I4F2SEq2REfGGsnkrJBoeQi/oBZDcBy1Lsi7P9LWIERhLQEM+
 Hm+30lC/f326nI3hoyThj2k2xxZOQzCIvrt658xP4qd9Zfe1bvCH58FF8K62CoOd
 p8XAf7Sl6v6YUodUrT/t
 =km55
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Notify driver before and after device reset (Keith Busch)
    - Use reset notification in NVMe (Keith Busch)

  NUMA
    - Warn if we have to guess host bridge node information (Myron Stowe)
    - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
      Suthikulpanit)
    - Clean up and mark early_root_info_init() as deprecated (Suravee
      Suthikulpanit)

  Driver binding
    - Add "driver_override" for force specific binding (Alex Williamson)
    - Fail "new_id" addition for devices we already know about (Bandan
      Das)

  Resource management
    - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
    - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
    - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
    - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
    - Don't convert BAR address to resource if dma_addr_t is too small
      (Bjorn Helgaas)
    - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
    - Don't print anything while decoding is disabled (Bjorn Helgaas)
    - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
    - Add resource allocation comments (Bjorn Helgaas)
    - Restrict 64-bit prefetchable bridge windows to 64-bit resources
      (Yinghai Lu)
    - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)

  PCI device hotplug
    - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
    - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
    - Fix rphahp endianess issues (Laurent Dufour)
    - Acknowledge spurious "cmd completed" event (Rajat Jain)
    - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
    - Fix cpqphp possible NULL dereference (Rickard Strandqvist)

  MSI
    - Replace pci_enable_msi_block() by pci_enable_msi_exact()
      (Alexander Gordeev)
    - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
    - Simplify populate_msi_sysfs() (Jan Beulich)

  Virtualization
    - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
    - Mark RTL8110SC INTx masking as broken (Alex Williamson)

  Generic host bridge driver
    - Add generic PCI host controller driver (Will Deacon)

  Freescale i.MX6
    - Use new clock names (Lucas Stach)
    - Drop old IRQ mapping (Lucas Stach)
    - Remove optional (and unused) IRQs (Lucas Stach)
    - Add support for MSI (Lucas Stach)
    - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)

  Renesas R-Car
    - Add gen2 device tree support (Ben Dooks)
    - Use new OF interrupt mapping when possible (Lucas Stach)
    - Add PCIe driver (Phil Edworthy)
    - Add PCIe MSI support (Phil Edworthy)
    - Add PCIe device tree bindings (Phil Edworthy)

  Samsung Exynos
    - Remove unnecessary OOM messages (Jingoo Han)
    - Fix add_pcie_port() section mismatch warning (Sachin Kamat)

  Synopsys DesignWare
    - Make MSI ISR shared IRQ aware (Lucas Stach)

  Miscellaneous
    - Check for broken config space aliasing (Alex Williamson)
    - Update email address (Ben Hutchings)
    - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
    - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
    - Remove unnecessary __ref annotations (Bjorn Helgaas)
    - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
      (Bjorn Helgaas)
    - Fix use of uninitialized MPS value (Bjorn Helgaas)
    - Tidy x86/gart messages (Bjorn Helgaas)
    - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
    - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
    - Remove unused serial device IDs (Jean Delvare)
    - Use designated initialization in PCI_VDEVICE (Mark Rustad)
    - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
    - Configure MPS on ARM (Murali Karicheri)
    - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
    - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
    - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
    - Remove pcibios_add_platform_entries() (Sebastian Ott)
    - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
    - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
    - Add and use new pci_is_bridge() interface (Yijing Wang)
    - Make pci_bus_add_device() void (Yijing Wang)

  DMA API
    - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
    - Fix typos in docs (Emilio López)
    - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
    - Change dma_declare_coherent_memory() CPU address to phys_addr_t
      (Bjorn Helgaas)
    - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
      (Bjorn Helgaas)"

* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
  MAINTAINERS: Add generic PCI host controller driver
  PCI: generic: Add generic PCI host controller driver
  PCI: imx6: Add support for MSI
  PCI: designware: Make MSI ISR shared IRQ aware
  PCI: imx6: Remove optional (and unused) IRQs
  PCI: imx6: Drop old IRQ mapping
  PCI: imx6: Use new clock names
  i82875p_edac: Assign PCI resources before adding device
  ARM/PCI: Call pcie_bus_configure_settings() to set MPS
  PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
  PCI: Make pci_bus_add_device() void
  PCI: exynos: Fix add_pcie_port() section mismatch warning
  PCI: Introduce new device binding path using pci_dev.driver_override
  PCI: rcar: Add gen2 device tree support
  PCI: cpqphp: Fix possible null pointer dereference
  PCI: rcar: Add R-Car PCIe device tree bindings
  PCI: rcar: Add MSI support for PCIe
  PCI: rcar: Add Renesas R-Car PCIe driver
  PCI: Fix return value from pci_user_{read,write}_config_*()
  PCI: exynos: Remove unnecessary OOM messages
  ...
2014-06-02 12:15:19 -07:00
Linus Torvalds
9f888b3a10 xen: features and fixes for 3.16-rc0
- Support foreign mappings in PVH domains (needed when dom0 is PVH)
 
 - Fix mapping high MMIO regions in x86 PV guests (this is also the
   first half of removing the PAGE_IOMAP PTE flag).
 
 - ARM suspend/resume support.
 
 - ARM multicall support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJTjE5MAAoJEFxbo/MsZsTRtl8H/2lfS9w05e60vRxjolPV0vRc
 5k9DcYFeJ+k2cz/2T3mNlIvKdfBTesSfgVquH+28GhQz+uKFQ1OrJpYNDTougSw5
 Wv0Ae8e+7eLABvJ9XMiZdDsPzsICw2wqWOvqrnQi2qR3SIimBc5tBigR4+Rccv+e
 btuBLlYT4WPQ8qgNyCBPgxzuyxteu5wK/0XryX6NcbrxeEbAzQAeDKkmvCD4fSvx
 KxrwTO3mwV4Lefmf/WS4Z9fDcPujQOUqKEtUWanw/2JalO1BzDPo+1wvYs0LduLC
 QI/YJN4SL3UeGOmbX2tyIaRgMsAcQVVrYkTm1cp8eD7vcRuvXaqy6dxuX05+V4g=
 =cxfG
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into next

Pull Xen updates from David Vrabel:
 "xen: features and fixes for 3.16-rc0
   - support foreign mappings in PVH domains (needed when dom0 is PVH)

   - fix mapping high MMIO regions in x86 PV guests (this is also the
     first half of removing the PAGE_IOMAP PTE flag).

   - ARM suspend/resume support.

   - ARM multicall support"

* tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: map foreign pfns for autotranslated guests
  xen-acpi-processor: Don't display errors when we get -ENOSYS
  xen/pciback: Document the entry points for 'pcistub_put_pci_dev'
  xen/pciback: Document when the 'unbind' and 'bind' functions are called.
  xen-pciback: Document when we FLR an PCI device.
  xen-pciback: First reset, then free.
  xen-pciback: Cleanup up pcistub_put_pci_dev
  x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range()
  x86/xen: set regions above the end of RAM as 1:1
  x86/xen: only warn once if bad MFNs are found during setup
  x86/xen: compactly store large identity ranges in the p2m
  x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN
  x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle()
  xen/x86: set panic notifier priority to minimum
  arm,arm64/xen: introduce HYPERVISOR_suspend()
  xen: refactor suspend pre/post hooks
  arm: xen: export HYPERVISOR_multicall to modules.
  arm64: introduce virt_to_pfn
  arm/xen: Remove definiition of virt_to_pfn in asm/xen/page.h
  arm: xen: implement multicall hypercall support.
2014-06-02 08:24:12 -07:00
Minchan Kim
6538b8ea88 x86_64: expand kernel stack to 16K
While I play inhouse patches with much memory pressure on qemu-kvm,
3.14 kernel was randomly crashed. The reason was kernel stack overflow.

When I investigated the problem, the callstack was a little bit deeper
by involve with reclaim functions but not direct reclaim path.

I tried to diet stack size of some functions related with alloc/reclaim
so did a hundred of byte but overflow was't disappeard so that I encounter
overflow by another deeper callstack on reclaim/allocator path.

Of course, we might sweep every sites we have found for reducing
stack usage but I'm not sure how long it saves the world(surely,
lots of developer start to add nice features which will use stack
agains) and if we consider another more complex feature in I/O layer
and/or reclaim path, it might be better to increase stack size(
meanwhile, stack usage on 64bit machine was doubled compared to 32bit
while it have sticked to 8K. Hmm, it's not a fair to me and arm64
already expaned to 16K. )

So, my stupid idea is just let's expand stack size and keep an eye
toward stack consumption on each kernel functions via stacktrace of ftrace.
For example, we can have a bar like that each funcion shouldn't exceed 200K
and emit the warning when some function consumes more in runtime.
Of course, it could make false positive but at least, it could make a
chance to think over it.

I guess this topic was discussed several time so there might be
strong reason not to increase kernel stack size on x86_64, for me not
knowing so Ccing x86_64 maintainers, other MM guys and virtio
maintainers.

Here's an example call trace using up the kernel stack:

         Depth    Size   Location    (51 entries)
         -----    ----   --------
   0)     7696      16   lookup_address
   1)     7680      16   _lookup_address_cpa.isra.3
   2)     7664      24   __change_page_attr_set_clr
   3)     7640     392   kernel_map_pages
   4)     7248     256   get_page_from_freelist
   5)     6992     352   __alloc_pages_nodemask
   6)     6640       8   alloc_pages_current
   7)     6632     168   new_slab
   8)     6464       8   __slab_alloc
   9)     6456      80   __kmalloc
  10)     6376     376   vring_add_indirect
  11)     6000     144   virtqueue_add_sgs
  12)     5856     288   __virtblk_add_req
  13)     5568      96   virtio_queue_rq
  14)     5472     128   __blk_mq_run_hw_queue
  15)     5344      16   blk_mq_run_hw_queue
  16)     5328      96   blk_mq_insert_requests
  17)     5232     112   blk_mq_flush_plug_list
  18)     5120     112   blk_flush_plug_list
  19)     5008      64   io_schedule_timeout
  20)     4944     128   mempool_alloc
  21)     4816      96   bio_alloc_bioset
  22)     4720      48   get_swap_bio
  23)     4672     160   __swap_writepage
  24)     4512      32   swap_writepage
  25)     4480     320   shrink_page_list
  26)     4160     208   shrink_inactive_list
  27)     3952     304   shrink_lruvec
  28)     3648      80   shrink_zone
  29)     3568     128   do_try_to_free_pages
  30)     3440     208   try_to_free_pages
  31)     3232     352   __alloc_pages_nodemask
  32)     2880       8   alloc_pages_current
  33)     2872     200   __page_cache_alloc
  34)     2672      80   find_or_create_page
  35)     2592      80   ext4_mb_load_buddy
  36)     2512     176   ext4_mb_regular_allocator
  37)     2336     128   ext4_mb_new_blocks
  38)     2208     256   ext4_ext_map_blocks
  39)     1952     160   ext4_map_blocks
  40)     1792     384   ext4_writepages
  41)     1408      16   do_writepages
  42)     1392      96   __writeback_single_inode
  43)     1296     176   writeback_sb_inodes
  44)     1120      80   __writeback_inodes_wb
  45)     1040     160   wb_writeback
  46)      880     208   bdi_writeback_workfn
  47)      672     144   process_one_work
  48)      528     112   worker_thread
  49)      416     240   kthread
  50)      176     176   ret_from_fork

[ Note: the problem is exacerbated by certain gcc versions that seem to
  generate much bigger stack frames due to apparently bad coalescing of
  temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
  35% more stack on the virtio path than 4.8.2 does, for example.

  Minchan not only uses such a bad gcc version (4.6.3 in his case), but
  some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
  what causes that kernel_map_pages() frame, for example). But we're
  clearly getting too close.

  The VM code also seems to have excessive stack frames partly for the
  same compiler reason, triggered by excessive inlining and lots of
  function arguments.

  We need to improve on our stack use, but in the meantime let's do this
  simple stack increase too.  Unlike most earlier reports, there is
  nothing simple that stands out as being really horribly wrong here,
  apart from the fact that the stack frames are just bigger than they
  should need to be.        - Linus ]

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S Tsirkin <mst@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-30 11:52:51 -07:00
Hanjun Guo
a43ae58c84 PCI: Turn pcibios_penalize_isa_irq() into a weak function
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
is not used by some architectures.  Make pcibios_penalize_isa_irq() a
__weak function to simplify the code.  This removes the need for new
platforms to add stub implementations of pcibios_penalize_isa_irq().

[bhelgaas: changelog, comments]
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2014-05-27 16:23:58 -06:00
Lv Zheng
92985ef1db ACPICA: Clean up redudant definitions already defined elsewhere
Since mis-order issues have been solved, we can cleanup redundant
definitions that already have defaults in <acpi/platform/acenv.h>.

This patch removes redudant environments for __KERNEL__ surrounded code.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-27 18:13:08 +02:00
Lv Zheng
07d8391433 ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h>
There is a mis-order inclusion for <asm/acpi.h>.

As we will enforce including <linux/acpi.h> for all Linux ACPI users, we
can find the inclusion order is as follows:

<linux/acpi.h>
  <acpi/acpi.h>
   <acpi/platform/acenv.h>
    (acenv.h before including aclinux.h)
    <acpi/platform/aclinux.h>
...........................................................................
     (aclinux.h before including asm/acpi.h)
     <asm/acpi.h>                             @Redundant@
      (ACPICA specific stuff)
...........................................................................
...........................................................................
      (Linux ACPI specific stuff) ? - - - - - - - - - - - - +
     (aclinux.h after including asm/acpi.h)   @Invisible@   |
    (acenv.h after including aclinux.h)       @Invisible@   |
   other ACPICA headers                       @Invisible@   |
............................................................|..............
  <acpi/acpi_bus.h>                                         |
  <acpi/acpi_drivers.h>                                     |
  <asm/acpi.h> (Excluded)                                   |
   (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - +

NOTE that, in ACPICA, <acpi/platform/acenv.h> is more like Kconfig
generated <generated/autoconf.h> for Linux, it is meant to be included
before including any ACPICA code.

In the above figure, there is a question mark for "Linux ACPI specific
stuff" in <asm/acpi.h> which should be included after including all other
ACPICA header files.  Thus they really need to be moved to the position
marked with exclaimation mark or the definitions in the blocks marked with
"@Invisible@" will be invisible to such architecture specific "Linux ACPI
specific stuff" header blocks.  This leaves 2 issues:
1. All environmental definitions in these blocks should have a copy in the
   area marked with "@Redundant@" if they are required by the "Linux ACPI
   specific stuff".
2. We cannot use any ACPICA defined types in <asm/acpi.h>.

This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to
fix this issue.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-27 18:13:07 +02:00
Dave Hansen
65a7f03f6b x86: fix page fault tracing when KVM guest support enabled
I noticed on some of my systems that page fault tracing doesn't
work:

	cd /sys/kernel/debug/tracing
	echo 1 > events/exceptions/enable
	cat trace;
	# nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it.  I tried on some old kernels and this does
not appear to be a regression: it never worked.

There are two page-fault entry functions today.  One when tracing
is on and another when it is off.  The KVM code calls do_page_fault()
directly instead of calling the traced version:

> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
>         enum ctx_state prev_state;
>
>         switch (kvm_read_and_reset_pf_reason()) {
>         default:
>                 do_page_fault(regs, error_code);
>                 break;
>         case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output).  I'm unsure if it's
related.

Steven had an alternative to this which has zero overhead when
tracing is off where this includes the standard noops even when
tracing is disabled.  I'm unconvinced that the extra complexity
of his apporach:

	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home

is worth it, expecially considering that the KVM code is already
making page fault entry slower here.  This solution is
dirt-simple.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
ae9fedc793 KVM: x86: get CPL from SS.DPL
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
fb5e336b97 KVM: x86: drop set_rflags callback
Not needed anymore now that the CPL is computed directly
during task switch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:16 +02:00
Ingo Molnar
65c2ce7004 Linux 3.15-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B
 +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c
 Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki
 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3
 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R
 wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0=
 =2ShG
 -----END PGP SIGNATURE-----

Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
H. Peter Anvin
03c1b4e8e5 Merge remote-tracking branch 'origin/x86/espfix' into x86/vdso
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code
that otherwise cause conflicts.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 17:36:33 -07:00
H. Peter Anvin
e6ab9a20e7 Merge commit '7ed6fb9b5a5510e4ef78ab27419184741169978a' into x86/espfix
Merge in Linus' tree with:

fa81511bb0 x86-64, modify_ldt: Make support for 16-bit segments a runtime option

... reverted, to avoid a conflict.  This commit is no longer necessary
with the proper fix in place.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 15:23:19 -07:00
Borislav Petkov
65cef1311d x86, microcode: Add a disable chicken bit
Add a cmdline param which disables the microcode loader. This is useful
mostly in debugging situations where we want to turn off microcode
loading, both early from the initrd and late, as a means to be able to
rule out its influence on the machine.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-20 20:21:27 -07:00
Borislav Petkov
1b1ded57a4 x86, boot: Carve out early cmdline parsing function
Carve out early cmdline parsing function into .../lib/cmdline.c so it
can be used by early code in the kernel proper as well.

Adapted from arch/x86/boot/cmdline.c.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1400525957-11525-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-20 20:21:24 -07:00
Andy Lutomirski
a62c34bd2a x86, mm: Improve _install_special_mapping and fix x86 vdso naming
Using arch_vma_name to give special mappings a name is awkward.  x86
currently implements it by comparing the start address of the vma to
the expected address of the vdso.  This requires tracking the start
address of special mappings and is probably buggy if a special vma
is split or moved.

Improve _install_special_mapping to just name the vma directly.  Use
it to give the x86 vvar area a name, which should make CRIU's life
easier.

As a side effect, the vvar area will show up in core dumps.  This
could be considered weird and is fixable.

[hpa: I say we accept this as-is but be prepared to deal with knocking
 out the vvars from core dumps if this becomes a problem.]

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-20 11:38:42 -07:00
Thomas Gleixner
a553b142b8 iommu: dmar: Provide arch specific irq allocation
ia64 and x86 share this driver. x86 is moving to a different irq
allocation and ia64 keeps its private irq_create/destroy stuff.

Use macros to redirect to one or the other. Yes, macros to avoid
include hell.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: x86@kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20140507154336.372289825@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:19 +02:00
Thomas Gleixner
d07c9f1875 x86: Get rid of get_nr_irqs_gsi()
No need to expose this outside of the ioapic code. The dynamic
allocations are guaranteed not to happen in the gsi space. See commit
62a08ae2a.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:19 +02:00
Anthony Iliopoulos
9844f54623 x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()
The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-13 16:34:09 -07:00
Ong Boon Leong
7ef1def800 x86, iosf: Added Quark MBI identifiers
Added all the MBI units below and their associated read/write
opcodes:
 - Host Bridge Arbiter
 - Host Bridge
 - Remote Management Unit
 - Memory Manager & eSRAM
 - SoC Unit

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:57:08 -07:00
David E. Box
6b8f0c8780 x86, iosf: Make IOSF driver modular and usable by more drivers
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF
driver on SOC's without selecting it which forces an unnecessary and limiting
dependency. Provides dummy functions to allow these modules to conditionally
use the driver on IOSF equipped platforms without impacting their ability to
compile and load on non-IOSF platforms. Build default m to ensure availability
on x86 SOC's.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:56:15 -07:00
Andres Freund
c45f77364b x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
The spuriously added semicolon didn't have any effect because the
macro isn't currently in use.

c0a639ad0b

Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09 08:42:47 -07:00
Peter Zijlstra
f80c5b39b8 sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG
Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Also, fix up the apm_32 idle function, clearly that was forgotten in
the last conversion. The default idle state is !POLLING so just kill
the lot.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 09:16:56 +02:00
Feng Tang
f10f383d84 x86/hpet: Make boot_hpet_disable extern
HPET on some platform has accuracy problem. Making
"boot_hpet_disable" extern so that we can runtime disable
the HPET timer by using quirk to check the platform.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 08:15:34 +02:00
Andy Lutomirski
f40c330091 x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
This makes the 64-bit and x32 vdsos use the same mechanism as the
32-bit vdso.  Most of the churn is deleting all the old fixmap code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:19:01 -07:00
Andy Lutomirski
18d0a6fd22 x86, vdso: Move the 32-bit vdso special pages after the text
This unifies the vdso mapping code and teaches it how to map special
pages at addresses corresponding to symbols in the vdso image.  The
new code is used for all vdso variants, but so far only the 32-bit
variants use the new vvar page position.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:56 -07:00
Andy Lutomirski
6f121e548f x86, vdso: Reimplement vdso.so preparation in build-time C
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel.  Replace all of that with plain C code that
runs at build time.

All five vdso images now generate .c files that are compiled and
linked in to the kernel image.

This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be.  Everything
outside the loadable segment is dropped.  In particular, this causes
the section table and section name strings to be missing.  This
should be fine: real dynamic loaders don't load or inspect these
tables anyway.  The result is roughly equivalent to eu-strip's
--strip-sections option.

The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment.  Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page.  This happens whenever the load segment is just under a
multiple of PAGE_SIZE.

The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'.  This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall.  That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation.  Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work.  vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.

(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:51 -07:00