Commit Graph

10011 Commits

Author SHA1 Message Date
Borislav Petkov
e1b43e3f13 x86, microcode: Share native MSR accessing variants
We want to use those in AMD's early loading path too. Also, add a
native_wrmsrl variant.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:57:27 +01:00
Borislav Petkov
5aa3d718f2 x86, ramdisk: Export relocated ramdisk VA
The ramdisk can possibly get relocated if the whole image is not mapped.
And since we're going over it in the microcode loader and fishing out
the relevant microcode patches, we want access it at its new location.
Thus, export it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:56:25 +01:00
Ingo Molnar
c9c8986847 Merge branch 'x86/idle' into sched/core
Merge these x86 specific bits - we are going to add generic bits as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:37:05 +01:00
Peter Zijlstra
10b033d434 sched/clock, x86: Avoid a runtime condition in native_sched_clock()
Use a static_key to avoid touching tsc_disabled and a runtime
condition in native_sched_clock() -- less cachelines touched is always
better.

                        MAINLINE   PRE       POST

    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     215295    213039
    (cold) local_clock: 301773     220773    216084
    (warm) sched_clock: 38375      25659     25231
    (warm) local_clock: 100371     27242     27601
    (warm) rdtsc:       27340      24208     24203
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     237019    240055
    (cold) local_clock: 396890     294819    299942
    (warm) sched_clock: 38194      25609     25276
    (warm) local_clock: 143452     71232     73232
    (warm) rdtsc:       27345      24243     24244

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-hrz87bo37qke25bty6pnfy4b@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:17 +01:00
Peter Zijlstra
35af99e646 sched/clock, x86: Use a static_key for sched_clock_stable
In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.

Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.

                        MAINLINE   PRE       POST

    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     221876    215295
    (cold) local_clock: 301773     234692    220773
    (warm) sched_clock: 38375      25602     25659
    (warm) local_clock: 100371     33265     27242
    (warm) rdtsc:       27340      24214     24208
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     235941    237019
    (cold) local_clock: 396890     297017    294819
    (warm) sched_clock: 38194      25233     25609
    (warm) local_clock: 143452     71234     71232
    (warm) rdtsc:       27345      24245     24243

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:13 +01:00
Peter Zijlstra
20d1c86a57 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
Use a ring-buffer like multi-version object structure which allows
always having a coherent object; we use this to avoid having to
disable IRQs while reading sched_clock() and avoids a problem when
getting an NMI while changing the cyc2ns data.

                        MAINLINE   PRE        POST

    sched_clock_stable: 1          1          1
    (cold) sched_clock: 329841     331312     257223
    (cold) local_clock: 301773     310296     309889
    (warm) sched_clock: 38375      38247      25280
    (warm) local_clock: 100371     102713     85268
    (warm) rdtsc:       27340      27289      24247
    sched_clock_stable: 0          0          0
    (cold) sched_clock: 382634     372706     301224
    (cold) local_clock: 396890     399275     399870
    (warm) sched_clock: 38194      38124      25630
    (warm) local_clock: 143452     148698     129629
    (warm) rdtsc:       27345      27365      24307

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:06 +01:00
Prarit Bhargava
c7a730fa46 x86/irq: Fix kbuild warning in smp_irq_move_cleanup_interrupt()
Fengguang Wu's 0day kernel build service reported the following build warning:

  arch/x86/kernel/apic/io_apic.c:2211
  smp_irq_move_cleanup_interrupt() warn: always true condition '(irq <= -1) => (0-u32max <= (-1))'

because irq is defined as an unsigned int instead of an int.
Fix this trivial error by redefining irq as a signed int.  The
remaining consumers of the int are okay.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/1389620420-7110-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:08:37 +01:00
Peter Zijlstra
57c67da274 sched/clock, x86: Move some cyc2ns() code around
There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
so move it there.

There are no cycles_2_ns() users.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:47:39 +01:00
Rafael J. Wysocki
3e7cc142c1 Merge branch 'acpica'
* acpica: (21 commits)
  ACPICA: Update version to 20131218.
  ACPICA: Utilities: Cleanup declarations of the acpi_gbl_debug_file global.
  ACPICA: Linuxize: Cleanup spaces after special macro invocations.
  ACPICA: Interpreter: Add additional debug info for an error case.
  ACPICA: Update ACPI example code to make it an actual working program.
  ACPICA: Add an error message if the Debugger fails initialization.
  ACPICA: Conditionally define a local variable that is used for debug only.
  ACPICA: Parser: Updates/fixes for debug output.
  ACPICA: Enhance ACPI warning for memory/IO address conflicts.
  ACPICA: Update several debug statements - no functional change.
  ACPICA: Improve exception handling for GPE block installation.
  ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
  ACPICA: Tables: Add full support for the PCCT table, update table definition.
  ACPICA: Tables: Add full support for the DBG2 table.
  ACPICA: Add option to favor 32-bit FADT addresses.
  ACPICA: Cleanup the option of forcing the use of the RSDT.
  ACPICA: Back port and refine validation of the XSDT root table.
  ACPICA: Linux Header: Remove unused OSL prototypes.
  ACPICA: Remove unused ACPI_FREE_BUFFER macro. No functional change.
  ACPICA: Disassembler: Improve pathname support for emitted External() statements.
  ...
2014-01-12 23:45:43 +01:00
Rafael J. Wysocki
98feb7cc61 Merge branch 'acpi-cleanup'
* acpi-cleanup: (22 commits)
  ACPI / tables: Return proper error codes from acpi_table_parse() and fix comment.
  ACPI / tables: Check if id is NULL in acpi_table_parse()
  ACPI / proc: Include appropriate header file in proc.c
  ACPI / EC: Remove unused functions and add prototype declaration in internal.h
  ACPI / dock: Include appropriate header file in dock.c
  ACPI / PCI: Include appropriate header file in pci_link.c
  ACPI / PCI: Include appropriate header file in pci_slot.c
  ACPI / EC: Mark the function acpi_ec_add_debugfs() as static in ec_sys.c
  ACPI / NVS: Include appropriate header file in nvs.c
  ACPI / OSL: Mark the function acpi_table_checksum() as static
  ACPI / processor: initialize a variable to silence compiler warning
  ACPI / processor: use ACPI_COMPANION() to get ACPI device
  ACPI: correct minor typos
  ACPI / sleep: Drop redundant acpi_disabled check
  ACPI / dock: Drop redundant acpi_disabled check
  ACPI / table: Replace '1' with specific error return values
  ACPI: remove trailing whitespace
  ACPI / IBFT: Fix incorrect <acpi/acpi.h> inclusion in iSCSI boot firmware module
  ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>
  SFI / ACPI: Fix warnings reported during builds with W=1
  ...

Conflicts:
	drivers/acpi/nvs.c
	drivers/hwmon/asus_atk0110.c
2014-01-12 23:44:09 +01:00
Ingo Molnar
b769e014f3 SCI reporting for other error types not only correctable ones
+ APEI GHES cleanups
 + mce timer fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJS0qbXAAoJEBLB8Bhh3lVK0G4P/1fjGcBzCiC4qp0vUzhdvu1d
 CXVTtF20fdGP+hgwVU/1DLuIeUhyBLFM57rPgz0FxfvPo/bTYq92Lw9sElQpiVad
 trIRpw3bemjDY/8E91vR94SqLKTddoJifldK5ZTUpRJN9up06MwLky4IurdL2ixE
 QLaI20ZQIDjxe+pmh4wHtyV5qPdtkiXY2ICcdTXwAW7RdiG7pXXd5gLljL4oFUMF
 bp0QzM074szkDvhBgZg6KTAy1zfNzZVM9hUBOAesi1tfXqkT7pCEbUkYKZ6QFNVV
 haQNt+RtkVvYcFcZK+9TkRM32KHSy1MVuicv2W6eNvRDtauGFtSEwBBuvyq3Imjb
 PTY6Vd5g/hR/+o968ieYUYdFua4xaza2wqC3mwL6WuNoWGzzb/f4dk9+wU/x0Khu
 th1NMSwy7VfNqi8pbTTy13LG1yjqqorrMLAEAfsNAMjZPIPsBREFYu8J5kko4ird
 1aEsaENUdkzgoDEUxwO7vQEfAL3RBOC8kr4SrBGd5jQtbN4SYRyP0bYVIsQL3Psz
 fo0H+9sVLwkSxcx7MM0bboBcv0NYazvTS5aIiivSdwMq0uxDsRWRmUCr70AlJDp9
 h7EfyovynCpfOcAOIz212A9bRzBcfW46KnQm6xkcgVdKKhEbc5EWhx6Nz3kCMNQZ
 Or1B6ZtUe7Acdg2kRTRL
 =+eFf
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.14_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull RAS updates from Borislav Petkov:

 " SCI reporting for other error types not only correctable ones
   + APEI GHES cleanups
   + mce timer fix
 "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 17:56:54 +01:00
Ingo Molnar
da4540757d Linux 3.13-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu
 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz
 +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ
 eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi
 eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY
 VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw=
 =lEeV
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc8' into x86/ras, to pick up fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 17:56:29 +01:00
Borislav Petkov
4f75d84127 x86, mce: Fix mce_start_timer semantics
So mce_start_timer() has a 'cpu' argument which is supposed to mean to
start a timer on that cpu. However, the code currently starts a timer on
the *current* cpu the function runs on and causes the sanity-check in
mce_timer_fn to fire:

	WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn

because it is running on the wrong cpu.

This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
all the cpus in succession.

Then, we were fiddling with the CMCI storm settings when starting the
timer whereas there's no need for that - if there's storm happening
on this newly restarted cpu, we're going to be in normal CMCI mode
initially and then when the CMCI interrupt starts firing, we're going to
go to the polling mode with the timer real soon.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
2014-01-12 15:22:25 +01:00
Prarit Bhargava
9345005f4e x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs
During heavy CPU-hotplug operations the following spurious kernel warnings
can trigger:

  do_IRQ: No ... irq handler for vector (irq -1)

  [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]

When downing a cpu it is possible that there are unhandled irqs
left in the APIC IRR register.  The following code path shows
how the problem can occur:

 1. CPU 5 is to go down.

 2. cpu_disable() on CPU 5 executes with interrupt flag cleared
    by local_irq_save() via stop_machine().

 3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
    interrupt flag is cleared (CPU unabled to handle the irq)

 4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
    to -1. 5. stop_machine() finishes cpu_disable()

 6. cpu_die() for CPU 5 executes in normal context.

 7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
    IRQ 12.  The code attempts to find the vector's IRQ and cannot
    because it has been set to -1. 8. do_IRQ() warning displays
    warning about CPU 5 IRQ 12.

I added a debug printk to output which CPU & vector was
retriggered and discovered that that we are getting bogus
events.  I see a 100% correlation between this debug printk in
fixup_irqs() and the do_IRQ() warning.

This patchset resolves this by adding definitions for
VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
the code to use them.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Rui Wang <rui.y.wang@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: janet.morgan@Intel.com
Cc: tony.luck@Intel.com
Cc: ruiv.wang@gmail.com
Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
[ Cleaned up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 13:13:02 +01:00
Stephane Eranian
f228c5b882 perf/x86/intel: Add Intel RAPL PP1 energy counter support
This patch adds support for the Intel RAPL energy counter
PP1 (Power Plane 1).

On client processors, it usually corresponds to the
energy consumption of the builtin graphic card. That
is why the sysfs event is called energy-gpu.

New event:
 - name: power/energy-gpu/
 - code: event=0x4
 - unit: 2^-32 Joules

On processors without graphics, this should count 0.
The patch only enables this event on client processors.

Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Cc: bp@alien8.de
Cc: vincent.weaver@maine.edu
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:16:08 +01:00
Steven Rostedt
1739f09e33 ftrace/x86: Load ftrace_ops in parameter not the variable holding it
Function tracing callbacks expect to have the ftrace_ops that registered it
passed to them, not the address of the variable that holds the ftrace_ops
that registered it.

Use a mov instead of a lea to store the ftrace_ops into the parameter
of the function tracing callback.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20131113152004.459787f9@gandalf.local.home
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.8+
2014-01-09 13:24:29 -08:00
David E. Box
4618441536 arch: x86: New MailBox support driver for Intel SOC's
Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
configuration registers on devices (called units) connected to the system
fabric. This is a support driver that implements access to this interface on
those platforms that can enumerate the device using PCI. Initial support is for
BayTrail, for which port definitons are provided. This is a requirement for
implementing platform specific features (e.g. RAPL driver requires this to
perform platform specific power management using the registers in PUNIT).
Dependant modules should select IOSF_MBI in their respective Kconfig
configuraiton. Serialized access is handled by all exported routines with
spinlocks.

The API includes 3 functions for access to unit registers:

int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)

port:	indicating the unit being accessed
opcode:	the read or write port specific opcode
offset:	the register offset within the port
mdr:	the register data to be read, written, or modified
mask:	bit locations in mdr to change

Returns nonzero on error

Note: GPU code handles access to the GFX unit. Therefore access to that unit
with this driver is disallowed to avoid conflicts.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
2014-01-08 14:36:29 -08:00
Lv Zheng
fab4610583 ACPICA: Cleanup the option of forcing the use of the RSDT.
This change adds a runtime option that will force ACPICA to use the
RSDT instead of the XSDT. Although the ACPI spec requires that an XSDT
be used instead of the RSDT, the XSDT has been found to be corrupt or
ill-formed on some machines.

This option is already in the Linux kernel.  When it is back ported to
ACPICA, code is re-written to follow ACPICA coding style.  This patch
is the generation of the integration.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-08 15:31:36 +01:00
Paul Gortmaker
663b55b9b3 x86: Delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

[ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-06 21:25:18 -08:00
Ingo Molnar
ef0b8b9a52 Linux 3.13-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSyJVbAAoJEHm+PkMAQRiGa28H/0m7GpZSpT8mvBthITxzqWCq
 JRkSPS4KTurAWlA5CJMJePyCM30DgN90s06bYUen9sTecZUwnL+qSV5OqAmg2r+0
 PrfwtXtGZR6/Y12XlZ/3oFxVfUxjmgJyDAS76TIH1IvIum52nvJmLrR+6AyVphIX
 DkgBOuapdA7lia+U+ZM1cRkeHxUOKTUEw9v611VgoN3LYZyzyRb6d0rB7JtZN1RV
 dnXRi27enaPhwxelsCnORioRjsByMwD40CERxfLHmr5CGhmvCehBjO6bJ+KAdp14
 52bfwWcNdbFMzUobcR7qlfS3Hy3AYJci+P6JzeeZ+kWEdv/eh5/1lvNuXtBJRlc=
 =iwzJ
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc7' into x86/efi-kexec to resolve conflicts

Conflicts:
	arch/x86/platform/efi/efi.c
	drivers/firmware/efi/Kconfig

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-05 12:34:29 +01:00
Kirill A. Shutemov
dd360393f4 x86, cpu: Detect more TLB configuration
The Intel Software Developer’s Manual covers few more TLB
configurations exposed as CPUID 2 descriptors:

61H Instruction TLB: 4 KByte pages, fully associative, 48 entries
63H Data TLB: 1 GByte pages, 4-way set associative, 4 entries
76H Instruction TLB: 2M/4M pages, fully associative, 8 entries
B5H Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
B6H Instruction TLB: 4KByte pages, 8-way set associative, 128 entries
C1H Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
C2H DTLB DTLB: 2 MByte/$MByte pages, 4-way associative, 16 entries

Let's detect them as well.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/1387801018-14499-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-03 14:35:42 -08:00
Dave Young
41a34cec2e x86: ksysfs.c build fix
kbuild test robot report below error for randconfig:

  arch/x86/kernel/ksysfs.c: In function 'get_setup_data_paddr':
  arch/x86/kernel/ksysfs.c:81:3: error: implicit declaration of function 'ioremap_cache' [-Werror=implicit-function-declaration]
  arch/x86/kernel/ksysfs.c:86:3: error: implicit declaration of function 'iounmap' [-Werror=implicit-function-declaration]

Fix it by including <asm/io.h> in ksysfs.c

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-01-03 14:37:13 +00:00
Linus Torvalds
8cf126d927 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "There is a small EFI fix and a big power regression fix in this batch.

  My queue also had a fix for downing a CPU when there are insufficient
  number of IRQ vectors available, but I'm holding that one for now due
  to recent bug reports"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Don't select EFI from certain special ACPI drivers
  x86 idle: Repair large-server 50-watt idle-power regression
2013-12-29 13:35:04 -08:00
Dave Young
77ea8c9489 x86: Reserve setup_data ranges late after parsing memmap cmdline
Currently e820_reserve_setup_data() is called before parsing early
params, it works in normal case. But for memmap=exactmap, the final
memory ranges are created after parsing memmap= cmdline params, so the
previous e820_reserve_setup_data() has no effect. For example,
setup_data ranges will still be marked as normal system ram, thus when
later sysfs driver ioremap them kernel will warn about mapping normal
ram.

This patch fix it by moving the e820_reserve_setup_data() callback after
parsing early params so they can be set as reserved ranges and later
ioremap will be fine with it.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29 13:09:07 +00:00
Dave Young
5039e316dd x86: Export x86 boot_params to sysfs
kexec-tools use boot_params for getting the 1st kernel hardware_subarch,
the kexec kernel EFI runtime support also needs to read the old efi_info
from boot_params. Currently it exists in debugfs which is not a good
place for such infomation. Per HPA, we should avoid "sploit debugfs".

In this patch /sys/kernel/boot_params are exported, also the setup_data is
exported as a subdirectory. kexec-tools is using debugfs for hardware_subarch
for a long time now so we're not removing it yet.

Structure is like below:

/sys/kernel/boot_params
|__ data                /* boot_params in binary*/
|__ setup_data
|   |__ 0               /* the first setup_data node */
|   |   |__ data        /* setup_data node 0 in binary*/
|   |   |__ type        /* setup_data type of setup_data node 0, hex string */
[snip]
|__ version             /* boot protocal version (in hex, "0x" prefixed)*/

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29 13:09:07 +00:00
Dave Young
1fec053369 x86/efi: Pass necessary EFI data for kexec via setup_data
Add a new setup_data type SETUP_EFI for kexec use.  Passing the saved
fw_vendor, runtime, config tables and EFI runtime mappings.

When entering virtual mode, directly mapping the EFI runtime regions
which we passed in previously. And skip the step to call
SetVirtualAddressMap().

Specially for HP z420 workstation we need save the smbios physical
address.  The kernel boot sequence proceeds in the following order.
Step 2 requires efi.smbios to be the physical address.  However, I found
that on HP z420 EFI system table has a virtual address of SMBIOS in step
1.  Hence, we need set it back to the physical address with the smbios
in efi_setup_data.  (When it is still the physical address, it simply
sets the same value.)

1. efi_init() - Set efi.smbios from EFI system table
2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table
3. efi_enter_virtual_mode() - Map EFI ranges

Tested on ovmf+qemu, lenovo thinkpad, a dell laptop and an
HP z420 workstation.

Signed-off-by: Dave Young <dyoung@redhat.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29 13:09:05 +00:00
Greg Kroah-Hartman
5bd2010fbe Merge 3.13-rc5 into staging-next
We want these fixes here to handle some merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-24 09:43:21 -08:00
Chen, Gong
addccbb264 ACPI, APEI, GHES: Do not report only correctable errors with SCI
Currently SCI is employed to handle corrected errors - memory corrected
errors, more specifically but in fact SCI still can be used to handle
any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS.
Enable logging for those kinds of errors too.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message, rename function arg. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-21 13:31:06 +01:00
H. Peter Anvin
7d590cca7c x86, idle: Add memory barriers around clflush in mwait_play_dead()
For consistency with mwait_idle_with_hints().  Not sure they help, but
they really won't hurt...

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com
2013-12-19 12:30:03 -08:00
Peter Zijlstra
1682425539 x86, acpi, idle: Restructure the mwait idle routines
People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.

This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.

Also remove __sti_mwait() as its unused.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jacob Jun Pan <jacob.jun.pan@linux.intel.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Acked-by: Rafael Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131212141654.616820819@infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-19 11:54:44 -08:00
Len Brown
40e2d7f9b5 x86 idle: Repair large-server 50-watt idle-power regression
Linux 3.10 changed the timing of how thread_info->flags is touched:

	x86: Use generic idle loop
	(7d1a941731)

This caused Intel NHM-EX and WSM-EX servers to experience a large number
of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote
from deep C-states to shallow C-states, which caused these platforms
to experience a significant increase in idle power.

Note that this issue was already present before the commit above,
however, it wasn't seen often enough to be noticed in power measurements.

Here we extend an errata workaround from the Core2 EX "Dunnington"
to extend to NHM-EX and WSM-EX, to prevent these immediate
returns from MWAIT, reducing idle power on these platforms.

While only acpi_idle ran on Dunnington, intel_idle
may also run on these two newer systems.
As of today, there are no other models that are known
to need this tweak.

Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com
Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com
Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-19 11:47:39 -08:00
Linus Torvalds
1070d5ac19 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
 "An x86/intel event constraint fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix constraint table end marker bug
2013-12-17 12:35:05 -08:00
Stephane Eranian
7fd565e275 perf/x86: enable Haswell Celeron RAPL support
Enable RAPL support for Haswell Celeron (model 69).

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Cc: bp@alien8.de
Cc: vincent.weaver@maine.edu
Cc: maria.n.dimakopoulou@gmail.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1387225224-27799-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:21:32 +01:00
Ingo Molnar
fe361cfcf4 Linux 3.13-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSrhGrAAoJEHm+PkMAQRiGsNoH/jIK3CsQ2lbW7yRLXmfgtbzz
 i2Kep6D4SDvmaLpLYOVC8xNYTiE8jtTbSXHomwP5wMZ63MQDhBfnEWsEWqeZ9+D9
 3Q46p0QWuoBgYu2VGkoxTfygkT6hhSpwWIi3SeImbY4fg57OHiUil/+YGhORM4Qc
 K4549OCTY3sIrgmWL77gzqjRUo+pQ4C73NKqZ3+5nlOmYBZC1yugk8mFwEpQkwhK
 4NRNU760Fo+XIht/bINqRiPMddzC15p0mxvJy3cDW8bZa1tFSS9SB7AQUULBbcHL
 +2dFlFOEb5SV1sNiNPrJ0W+h2qUh2e7kPB0F8epaBppgbwVdyQoC2u4uuLV2ZN0=
 =lI2r
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc4' into perf/core

Merge Linux 3.13-rc4, to refresh this branch with the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16 14:51:32 +01:00
Ingo Molnar
014952270e * Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
 done properly in the BIOS, makes a corresponding EDAC module redundant,
 from Gong Chen.
 
 * PCIe AER tracepoint severity levels fix, from Rui Wang.
 
 * Error path correction for the mce device init, from Levente Kurusa.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSrCysAAoJEBLB8Bhh3lVK1ikP/0hKY1Kk4tjbSta9A9Z8LdQG
 9F5JzEny47DpTrLaKij7MqAlbYFO8sSm7Zw0CEztTF7Ou/H37GAuxhMlB8ECMGOm
 Dzu53X1rySTna9mB+1gyXXd+pJypp/oe18/o16rw1QKjI9o2Kfgwfj7lKvytR549
 kDM1dhxEImQIS5cpJPkOPbcpVlSqYN7BnK9/Qx3h0W70httT/8qrr9xVtVL7wjOT
 auTA0R5/TkV06FtxyfHUNULEWTSP+2yNP/iJbusR6f4Jk1j0XmyCFr0BYOkPA1UO
 9+wC9+2R+r7rJw8MBfMzNmPrRzDJHdaiHPwYqse05yewRHfRHe5cgZWJYbL8Qv0u
 2WOX+fY12EfDYlihcOYtlupRzhGfGKRsaRpSuG1zX87ctDxAfNZencv4hnaJvfqG
 Xk6ggIX6tHKEivO2gmaPsmhoKveh0zcozUs+wgh/tvV5QB6ioFCjzHfSEsix5+BH
 ryyg1ri7IZnh92g3UuSUpE0OCbAquMfI7XIJo+kFs0u79dZTL/kD3wVu6oYazwdy
 yTrvIq7Bq5cMWnnni5w7dIU09ef2uvDgyHyAS6+RiqaQxhYFsW8/yx2zJrIloWRs
 7txz6t3CVmWFiejIg2gw6KyjaG6pXRBkDkI1XU6T+bKLb31ojx2+i9UKIIUeRZTB
 iisWAOI6ZSdt4eAkgeaI
 =r//I
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS updates from Borislav Petkov:

  * Add the functionality to override error reporting agents as some
  machines are sporting a new extended error logging capability which, if
  done properly in the BIOS, makes a corresponding EDAC module redundant,
  from Gong Chen.

  * PCIe AER tracepoint severity levels fix, from Rui Wang.

  * Error path correction for the mce device init, from Levente Kurusa.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16 14:33:17 +01:00
Bjorn Helgaas
1d72e71d45 Merge branch 'pci/yijing-dev_is_pci' into next
* pci/yijing-dev_is_pci:
  alpha/PCI: Use dev_is_pci() to identify PCI devices
  arm/PCI: Use dev_is_pci() to identify PCI devices
  arm/PCI: Use dev_is_pci() to identify PCI devices
  parisc/PCI: Use dev_is_pci() to identify PCI devices
  sparc/PCI: Use dev_is_pci() to identify PCI devices
  ia64/PCI: Use dev_is_pci() to identify PCI devices
  x86/PCI: Use dev_is_pci() to identify PCI devices
  PCI: Use dev_is_pci() to identify PCI devices
2013-12-13 11:01:27 -07:00
DuanZhenzhong
ac8344c4c0 PCI: Drop "irq" param from *_restore_msi_irqs()
Change x86_msi.restore_msi_irqs(struct pci_dev *dev, int irq) to
x86_msi.restore_msi_irqs(struct pci_dev *dev).

restore_msi_irqs() restores multiple MSI-X IRQs, so param 'int irq' is
unneeded.  This makes code more consistent between vm and bare metal.

Dom0 MSI-X restore code can also be optimized as XEN only has a hypercall
to restore all MSI-X vectors at one time.

Tested-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-12-13 08:44:30 -07:00
Ingo Molnar
d8af4ce490 x86/traps: Clean up error exception handler definitions
So I was reading the exception handler generation code and got a real
headache looking at the unstructured mess that our DO_ERROR*()
generation code is today.

Make it more readable.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-kuabysiykvUJpgus35lhnhvs@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-12 14:46:42 +01:00
Tejun Heo
13ccb93f41 Merge branch 'driver-core-linus' into driver-core-next
a8b1474442 ("sysfs: give different locking key to regular and bin
files") in driver-core-linus modifies sysfs_open_file() so that it
gives out different locking classes to sysfs_open_files depending on
whether the file is bin or not.  Due to the massive kernfs
reorganization in driver-core-next, this naturally causes merge
conflict in fs/sysfs/file.c.

Due to the way things are split between kernfs and sysfs in
driver-core-next, the same fix can't easily be applied to
driver-core-next.  This merge simply ignores the offending commit.  A
following patch will implement a separate fix for the issue.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-12-10 08:44:37 -05:00
Yijing Wang
894d334378 x86/PCI: Use dev_is_pci() to identify PCI devices
Use dev_is_pci() instead of checking bus type directly.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-12-09 16:50:12 -07:00
Takashi Iwai
75da02b29f microcode: Use request_firmware_direct()
Use the new helper, request_firmware_direct(), for avoiding the
lengthy timeout of non-existing firmware loads.  Especially the Intel
microcode driver suffers from this problem because each CPU triggers
the f/w loading, thus it ends up taking (literally) hours with many
cores.

Tested-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 18:22:32 -08:00
Qiaowei Ren
e7d820a5e5 x86, xsave: Support eager-only xsave features, add MPX support
Some features, like Intel MPX, work only if the kernel uses eagerfpu
model.  So we should force eagerfpu on unless the user has explicitly
disabled it.

Add definitions for Intel MPX and add it to the supported list.

[ hpa: renamed XSTATE_FLEXIBLE to XSTATE_LAZY and added comments ]

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/9E0BE1322F2F2246BD820DA9FC397ADE014A6115@SHSMSX102.ccr.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-06 17:17:42 -08:00
Lv Zheng
8b48463f89 ACPI: Clean up inclusions of ACPI header files
Replace direct inclusions of <acpi/acpi.h>, <acpi/acpi_bus.h> and
<acpi/acpi_drivers.h>, which are incorrect, with <linux/acpi.h>
inclusions and remove some inclusions of those files that aren't
necessary.

First of all, <acpi/acpi.h>, <acpi/acpi_bus.h> and <acpi/acpi_drivers.h>
should not be included directly from any files that are built for
CONFIG_ACPI unset, because that generally leads to build warnings about
undefined symbols in !CONFIG_ACPI builds.  For CONFIG_ACPI set,
<linux/acpi.h> includes those files and for CONFIG_ACPI unset it
provides stub ACPI symbols to be used in that case.

Second, there are ordering dependencies between those files that always
have to be met.  Namely, it is required that <acpi/acpi_bus.h> be included
prior to <acpi/acpi_drivers.h> so that the acpi_pci_root declarations the
latter depends on are always there.  And <acpi/acpi.h> which provides
basic ACPICA type declarations should always be included prior to any other
ACPI headers in CONFIG_ACPI builds.  That also is taken care of including
<linux/acpi.h> as appropriate.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (drivers/pci stuff)
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> (Xen stuff)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-07 01:03:14 +01:00
Maria Dimakopoulou
cf30d52e2d perf/x86: Fix constraint table end marker bug
The EVENT_CONSTRAINT_END() macro defines the end marker as
a constraint with a weight of zero. This was all fine
until we blacklisted the corrupting memory events on
Intel IvyBridge. These events are blacklisted by using
a counter bitmask of zero. Thus, they also get a constraint
weight of zero.

The iteration macro: for_each_constraint tests the weight==0.
Therefore, it was stopping at the first blacklisted event, i.e.,
0xd0. The corrupting events were therefore considered as
unconstrained and were scheduled on any of the generic counters.

This patch fixes the end marker to have a weight of -1. With
this, the blacklisted events get an empty constraint and cannot
be scheduled which is what we want for now.

Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20131204232437.GA10689@starlight
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-05 10:02:30 +01:00
Fenghua Yu
2885432aaf x86/apic, doc: Justification for disabling IO APIC before Local APIC
Since erratum AVR31 in "Intel Atom Processor C2000 Product Family
Specification Update" is now published, I added a justification
comment for disabling IO APIC before Local APIC, as changed in commit:

522e664644 x86/apic: Disable I/O APIC before shutdown of the local APIC

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1386202069-51515-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-04 19:33:21 -08:00
Levente Kurusa
853d9b18f1 x86, mce: Call put_device on device_register failure
This patch adds a call to put_device() when the device_register() call
has failed. This is required so that the last reference to the device is
given up.

Signed-off-by: Levente Kurusa <levex@linux.com>
Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30 12:14:51 +01:00
Stephane Eranian
65661f96d3 perf/x86: Add RAPL hrtimer support
The RAPL PMU counters do not interrupt on overflow.
Therefore, the kernel needs to poll the counters
to avoid missing an overflow. This patch adds
the hrtimer code to do this.

The timer interval is calculated at boot time
based on the power unit used by the HW.

There is one hrtimer per-cpu to handle the case
of multiple simultaneous use across cores on
the same package + hotplug CPU.

Thanks to Maria Dimakopoulou for her contributions
to this patch especially on the math aspects.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
[ Applied 32-bit build fix. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Cc: bp@alien8.de
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1384275531-10892-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27 15:31:23 +01:00
Stephane Eranian
4788e5b4b2 perf/x86: Add Intel RAPL PMU support
This patch adds a new uncore PMU to expose the Intel
RAPL energy consumption counters. Up to 3 counters,
each counting a particular RAPL event are exposed.

The RAPL counters are available on Intel SandyBridge,
IvyBridge, Haswell. The server skus add a 3rd counter.

The following events are available and exposed in sysfs:

  - power/energy-cores: power consumption of all cores on socket
  - power/energy-pkg: power consumption of all cores + LLc cache
  - power/energy-dram: power consumption of DRAM (servers only)

For each event both the unit (Joules) and scale (2^-32 J)
is exposed in sysfs for use by perf stat and other tools.
The files are:

	/sys/devices/power/events/energy-*.unit
	/sys/devices/power/events/energy-*.scale

The RAPL PMU is uncore by nature and is implemented such
that it only works in system-wide mode. Measuring only
one CPU per socket is sufficient. The /sys/devices/power/cpumask
file can be used by tools to figure out which CPUs to monitor
by default. For instance, on a 2-socket system, 2 CPUs
(one on each socket) will be shown.

All the counters measure in the same unit (exposed via sysfs).
The perf_events API exposes all RAPL counters as 64-bit integers
counting in unit of 1/2^32 Joules (about 0.23 nJ). User level tools
must convert the counts by multiplying them by 2^-32 to obtain
Joules. The reason for this is that the kernel avoids
doing floating point math whenever possible because it is
expensive (user floating-point state must be saved). The method
used avoids kernel floating-point usage. There is no loss of
precision. Thanks to PeterZ for suggesting this approach.

To convert the raw count in Watt:
   W = C * 2.3 / (1e10 * time)
or ldexp(C, -32).

RAPL PMU is a new standalone PMU which registers with the
perf_event core subsystem. The PMU type (attr->type) is
dynamically allocated and is available from /sys/device/power/type.

Sampling is not supported by the RAPL PMU. There is no
privilege level filtering either.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: zheng.z.yan@intel.com
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1384275531-10892-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27 11:16:40 +01:00
Dave Airlie
cf96967794 Merge tag 'drm-intel-fixes-2013-11-20' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Just a small pile of fixes for bugs and a few regressions. I'm still
trying to track down a driver load hang on my g33 (which infuriatingly
doesn't happen when loading the module manually after boot), somehow
bisecting loves to go astray on this one :( And there's a (harmless)
locking WARN in the suspend code due to one of Jesse's vlv backlight
rework patches. Otherwise nothing outstanding afaik.

* tag 'drm-intel-fixes-2013-11-20' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Fix gen3 self-refresh watermarks
  drm/i915: Replicate BIOS eDP bpp clamping hack for hsw
  drm/i915: Do not enable package C8 on unsupported hardware
  drm/i915: Hold pc8 lock around toggling pc8.gpu_idle
  drm/i915: encoder->get_config is no longer optional
  drm/i915/tv: add ->get_config callback
  drm/i915: restore the early forcewake cleanup
  Partially revert "drm/i915: tune the RC6 threshold for stability"
  drm/i915: flush cursors harder
  i915: Use 120MHz LVDS SSC clock for gen5/gen6/gen7
  x86/early quirk: use gen6 stolen detection for VLV
  drm/i915/dp: set sink to power down mode on dp disable
2013-11-21 18:45:51 +10:00
Linus Torvalds
9066d9b250 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "A modular build fix for certain .config's"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Export 'boot_cpu_physical_apicid' to modules
2013-11-19 10:48:19 -08:00
Linus Torvalds
b29c8306a3 This batch of changes is mostly clean ups and small bug fixes.
The only real feature that was added this release is from Namhyung Kim,
 who introduced "set_graph_notrace" filter that lets you run the function
 graph tracer and not trace particular functions and their call chain.
 
 Tom Zanussi added some updates to the ftrace multibuffer tracing that
 made it more consistent with the top level tracing.
 
 One of the fixes for perf function tracing required an API change in
 RCU; the addition of "rcu_is_watching()". As Paul McKenney is pushing
 that change in this release too, he gave me a branch that included
 all the changes to get that working, and I pulled that into my tree
 in order to complete the perf function tracing fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgX5SAAoJEKQekfcNnQGulUAH/jORqJrKaNAulmZ314VsAqfa
 zMtF5UAAPf7kqc3AN/jtFrhJUNEfxWOo7A4r0FsM/rKdWJF+98GA6aqYVD+XoWFt
 +36fg1enxbXUjixQ96Uh+o1+BJUgYDqljuWzqSu/oiXWfWwl8+WL4kcbhb+V9WcF
 SpdzLCWVZRfhyDiN3+0zvyQ8RSG2Pd7CWn9zroI0e4sxGo0Ki6JUnIcXtZGOBDOQ
 IIZdjXvGSfpJ+3u3XvRPXJcltRCtOsVWxYzrmvRlmHDW5QMe1+WmmrlojTePrLaJ
 xn8+3WINqetAR+ZQnazbpt1XzJzKa8QtFgpiN0kT6qL7cg3N1Owc4vLGohl7wok=
 =Nesf
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing update from Steven Rostedt:
 "This batch of changes is mostly clean ups and small bug fixes.  The
  only real feature that was added this release is from Namhyung Kim,
  who introduced "set_graph_notrace" filter that lets you run the
  function graph tracer and not trace particular functions and their
  call chain.

  Tom Zanussi added some updates to the ftrace multibuffer tracing that
  made it more consistent with the top level tracing.

  One of the fixes for perf function tracing required an API change in
  RCU; the addition of "rcu_is_watching()".  As Paul McKenney is pushing
  that change in this release too, he gave me a branch that included all
  the changes to get that working, and I pulled that into my tree in
  order to complete the perf function tracing fix"

* tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add rcu annotation for syscall trace descriptors
  tracing: Do not use signed enums with unsigned long long in fgragh output
  tracing: Remove unused function ftrace_off_permanent()
  tracing: Do not assign filp->private_data to freed memory
  tracing: Add helper function tracing_is_disabled()
  tracing: Open tracer when ftrace_dump_on_oops is used
  tracing: Add support for SOFT_DISABLE to syscall events
  tracing: Make register/unregister_ftrace_command __init
  tracing: Update event filters for multibuffer
  recordmcount.pl: Add support for __fentry__
  ftrace: Have control op function callback only trace when RCU is watching
  rcu: Do not trace rcu_is_watching() functions
  ftrace/x86: skip over the breakpoint for ftrace caller
  trace/trace_stat: use rbtree postorder iteration helper instead of opencoding
  ftrace: Add set_graph_notrace filter
  ftrace: Narrow down the protected area of graph_lock
  ftrace: Introduce struct ftrace_graph_data
  ftrace: Get rid of ftrace_graph_filter_enabled
  tracing: Fix potential out-of-bounds in trace_get_user()
  tracing: Show more exact help information about snapshot
2013-11-16 12:23:18 -08:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
David Rientjes
cc08e04c3f x86: Export 'boot_cpu_physical_apicid' to modules
Commit 9ebddac7ea "ACPI, x86: Fix extended error log driver to depend on
CONFIG_X86_LOCAL_APIC" fixed a build error when CONFIG_X86_LOCAL_APIC was not
selected and !CONFIG_SMP.

However, since CONFIG_ACPI_EXTLOG is tristate, there is a second build error:

  ERROR: "boot_cpu_physical_apicid" [drivers/acpi/acpi_extlog.ko] undefined!

The symbol needs to be exported for it to be available.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311141504080.30112@chino.kir.corp.google.com
[ Changed it to a _GPL() export. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-15 08:38:30 +01:00
Linus Torvalds
049ffa8ab3 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is a combo of -next and some -fixes that came in in the
  intervening time.

  Highlights:

  New drivers:
    ARM Armada driver for Marvell Armada 510 SOCs

  Intel:
    Broadwell initial support under a default off switch,
    Stereo/3D HDMI mode support
    Valleyview improvements
    Displayport improvements
    Haswell fixes
    initial mipi dsi panel support
    CRC support for debugging
    build with CONFIG_FB=n

  Radeon:
    enable DPM on a number of GPUs by default
    secondary GPU powerdown support
    enable HDMI audio by default
    Hawaii support

  Nouveau:
    dynamic pm code infrastructure reworked, does nothing major yet
    GK208 modesetting support
    MSI fixes, on by default again
    PMPEG improvements
    pageflipping fixes

  GMA500:
    minnowboard SDVO support

  VMware:
    misc fixes

  MSM:
    prime, plane and rendernodes support

  Tegra:
    rearchitected to put the drm driver into the drm subsystem.
    HDMI and gr2d support for tegra 114 SoC

  QXL:
    oops fix, and multi-head fixes

  DRM core:
    sysfs lifetime fixes
    client capability ioctl
    further cleanups to device midlayer
    more vblank timestamp fixes"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (789 commits)
  drm/nouveau: do not map evicted vram buffers in nouveau_bo_vma_add
  drm/nvc0-/gr: shift wrapping bug in nvc0_grctx_generate_r406800
  drm/nouveau/pwr: fix missing mutex unlock in a failure path
  drm/nv40/therm: fix slowing down fan when pstate undefined
  drm/nv11-: synchronise flips to vblank, unless async flip requested
  drm/nvc0-: remove nasty fifo swmthd hack for flip completion method
  drm/nv10-: we no longer need to create nvsw object on user channels
  drm/nouveau: always queue flips relative to kernel channel activity
  drm/nouveau: there is no need to reserve/fence the new fb when flipping
  drm/nouveau: when bailing out of a pushbuf ioctl, do not remove previous fence
  drm/nouveau: allow nouveau_fence_ref() to be a noop
  drm/nvc8/mc: msi rearm is via the nvc0 method
  drm/ttm: Fix vma page_prot bit manipulation
  drm/vmwgfx: Fix a couple of compile / sparse warnings and errors
  drm/vmwgfx: Resource evict fixes
  drm/edid: compare actual vrefresh for all modes for quirks
  drm: shmob_drm: Convert to clk_prepare/unprepare
  drm/nouveau: fix 32-bit build
  drm/i915/opregion: fix build error on CONFIG_ACPI=n
  Revert "drm/radeon/audio: don't set speaker allocation on DCE4+"
  ...
2013-11-15 14:19:54 +09:00
Linus Torvalds
f080480488 Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
 is probably the most interesting change from a user point of view.
 On the x86 side there are nested virtualization improvements and a
 few bugfixes.  ARM got transparent huge page support, improved
 overcommit, and support for big endian guests.
 
 Finally, there is a new interface to connect KVM with VFIO.  This
 helps with devices that use NoSnoop PCI transactions, letting the
 driver in the guest execute WBINVD instructions.  This includes
 some nVidia cards on Windows, that fail to start without these
 patches and the corresponding userspace changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu
 L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq
 XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp
 db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu
 w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq
 vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc
 dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC
 t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc
 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1
 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R
 qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH
 NTmT/cfmYp2BRkiCfCiS
 =rWNf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "Here are the 3.13 KVM changes.  There was a lot of work on the PPC
  side: the HV and emulation flavors can now coexist in a single kernel
  is probably the most interesting change from a user point of view.

  On the x86 side there are nested virtualization improvements and a few
  bugfixes.

  ARM got transparent huge page support, improved overcommit, and
  support for big endian guests.

  Finally, there is a new interface to connect KVM with VFIO.  This
  helps with devices that use NoSnoop PCI transactions, letting the
  driver in the guest execute WBINVD instructions.  This includes some
  nVidia cards on Windows, that fail to start without these patches and
  the corresponding userspace changes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  kvm, vmx: Fix lazy FPU on nested guest
  arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
  arm/arm64: KVM: MMIO support for BE guest
  kvm, cpuid: Fix sparse warning
  kvm: Delete prototype for non-existent function kvm_check_iopl
  kvm: Delete prototype for non-existent function complete_pio
  hung_task: add method to reset detector
  pvclock: detect watchdog reset at pvclock read
  kvm: optimize out smp_mb after srcu_read_unlock
  srcu: API for barrier after srcu read unlock
  KVM: remove vm mmap method
  KVM: IOMMU: hva align mapping page size
  KVM: x86: trace cpuid emulation when called from emulator
  KVM: emulator: cleanup decode_register_operand() a bit
  KVM: emulator: check rex prefix inside decode_register()
  KVM: x86: fix emulation of "movzbl %bpl, %eax"
  kvm_host: typo fix
  KVM: x86: emulate SAHF instruction
  MAINTAINERS: add tree for kvm.git
  Documentation/kvm: add a 00-INDEX file
  ...
2013-11-15 13:51:36 +09:00
Jesse Barnes
7bd40c16cc x86/early quirk: use gen6 stolen detection for VLV
We've always been able to use either method on VLV, but it appears more
recent BIOSes only support the gen6 method, so switch over to that.

References: https://bugs.freedesktop.org/show_bug.cgi?id=71370
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-14 09:32:11 +01:00
Linus Torvalds
d320e203ba Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull two x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error
  x86/dumpstack: Fix printk_address for direct addresses
2013-11-14 16:55:56 +09:00
Linus Torvalds
5e30025a31 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "The biggest changes:

   - add lockdep support for seqcount/seqlocks structures, this
     unearthed both bugs and required extra annotation.

   - move the various kernel locking primitives to the new
     kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  block: Use u64_stats_init() to initialize seqcounts
  locking/lockdep: Mark __lockdep_count_forward_deps() as static
  lockdep/proc: Fix lock-time avg computation
  locking/doc: Update references to kernel/mutex.c
  ipv6: Fix possible ipv6 seqlock deadlock
  cpuset: Fix potential deadlock w/ set_mems_allowed
  seqcount: Add lockdep functionality to seqcount/seqlock structures
  net: Explicitly initialize u64_stats_sync structures for lockdep
  locking: Move the percpu-rwsem code to kernel/locking/
  locking: Move the lglocks code to kernel/locking/
  locking: Move the rwsem code to kernel/locking/
  locking: Move the rtmutex code to kernel/locking/
  locking: Move the semaphore core to kernel/locking/
  locking: Move the spinlock code to kernel/locking/
  locking: Move the lockdep code to kernel/locking/
  locking: Move the mutex code to kernel/locking/
  hung_task debugging: Add tracepoint to report the hang
  x86/locking/kconfig: Update paravirt spinlock Kconfig description
  lockstat: Report avg wait and hold times
  lockdep, x86/alternatives: Drop ancient lockdep fixup message
  ...
2013-11-14 16:30:30 +09:00
Linus Torvalds
7971e23a66 Merge branch 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/trace changes from Ingo Molnar:
 "This adds page fault tracepoints which have zero runtime cost in the
  disabled case via IDT trickery (no NOPs in the page fault hotpath)"

* 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, trace: Change user|kernel_page_fault to page_fault_user|kernel
  x86, trace: Add page fault tracepoints
  x86, trace: Delete __trace_alloc_intr_gate()
  x86, trace: Register exception handler to trace IDT
  x86, trace: Remove __alloc_intr_gate()
2013-11-14 16:25:10 +09:00
Linus Torvalds
2f466d33f5 PCI changes for the v3.13 merge window:
Resource management
     - Fix host bridge window coalescing (Alexey Neyman)
     - Pass type, width, and prefetchability for window alignment (Wei Yang)
 
   PCI device hotplug
     - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)
 
   Power management
     - Remove pci_pm_complete() (Liu Chuansheng)
 
   MSI
     - Fail initialization if device is not in PCI_D0 (Yijing Wang)
 
   MPS (Max Payload Size)
     - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
     - Use pcie_set_readrq() to simplify code (Yijing Wang)
     - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)
 
   SR-IOV
     - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
     - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)
 
   Virtualization
     - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)
 
   Freescale i.MX6
     - Support i.MX6 PCIe controller (Sean Cross)
     - Increase link startup timeout (Marek Vasut)
     - Probe PCIe in fs_initcall() (Marek Vasut)
     - Fix imprecise abort handler (Tim Harvey)
     - Remove redundant of_match_ptr (Sachin Kamat)
 
   Renesas R-Car
     - Support Gen2 internal PCIe controller (Valentine Barshak)
 
   Samsung Exynos
     - Add MSI support (Jingoo Han)
     - Turn off power when link fails (Jingoo Han)
     - Add Jingoo Han as maintainer (Jingoo Han)
     - Add clk_disable_unprepare() on error path (Wei Yongjun)
     - Remove redundant of_match_ptr (Sachin Kamat)
 
   Synopsys DesignWare
     - Add irq_create_mapping() (Pratyush Anand)
     - Add header guards (Seungwon Jeon)
 
   Miscellaneous
     - Enable native PCIe services by default on non-ACPI (Andrew Murray)
     - Cleanup _OSC usage and messages (Bjorn Helgaas)
     - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
     - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
     - Remove unused pci_mem_start (Myron Stowe)
     - Make sysfs functions static (Sachin Kamat)
     - Warn on invalid return from driver probe (Stephen M. Cameron)
     - Remove Intel Haswell D3 delays (Todd E Brandt)
     - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
     - Use pci_is_pcie() to simplify code (Yijing Wang)
     - Use PCIe capability accessors to simplify code (Yijing Wang)
     - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
     - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
     - Simplify sysfs CPU affinity implementation (Yijing Wang))
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgUzsAAoJEFmIoMA60/r8wmsQAJhwmtkUYR2L4T1g9smAyjJz
 bLm5zoC6WdywFcbTpTBfsTrS1CHIQG5akRgkEXGdr99epiho5F2lwmagWsUR4ijL
 39Qn3knAUMgtNjoVXXI106h/DfTyxSmkZBfih2AQFyWobJq+0kg7hjQQA3+836b4
 8ssWr1+NSl6JJTqYQ0Paw1kSqvvYoXsu5rWFEfCHk8D0s/1bvr5ldAUpk2jTg93I
 uo9/5+O264yt1YoKZOMqAMZLUfd5DaWY1mV3yeF0Uauy1pBmol5csE8ckqJPDrES
 PRdJT1+PhBeLYWcgXANOBZsW58ddxA0pQ5jQV6VJHQWsm5cE82OBpYJf6xUZ2moV
 o6DZ0KRnCPVA3NllYYR16H+wbMfADwwO83QoA+QTIZJy/WgpDH3Cst+m8KePGqbL
 uFgDdXSws9Bs1BCFs7bfYzAM3OdkBFnn+ac7JoPXKP5ibgAp9nDlurgK2r90zRnp
 j15vHMx0mV+e8B8/iwiW5eRtg7NoCHYiNfFy7JalOlsPmYr2KFazBVKclp13Hng7
 fe/Jy6X4UhWoQPdqsy4ftvSQb0gm1MClxFJeZ3VAt6LY9j8OP6S/Vdf6lpAL85KR
 lAQoQzB+lOhTPdXxFY2xgGkITkqPDOQMjPfowYUYFwybqBuG6BHXZPJobL+niBlb
 Nh+M2WlUUA9Z3V6rWJB6
 =CTPk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Resource management
    - Fix host bridge window coalescing (Alexey Neyman)
    - Pass type, width, and prefetchability for window alignment (Wei Yang)

  PCI device hotplug
    - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)

  Power management
    - Remove pci_pm_complete() (Liu Chuansheng)

  MSI
    - Fail initialization if device is not in PCI_D0 (Yijing Wang)

  MPS (Max Payload Size)
    - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
    - Use pcie_set_readrq() to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)

  SR-IOV
    - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
    - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)

  Virtualization
    - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)

  Freescale i.MX6
    - Support i.MX6 PCIe controller (Sean Cross)
    - Increase link startup timeout (Marek Vasut)
    - Probe PCIe in fs_initcall() (Marek Vasut)
    - Fix imprecise abort handler (Tim Harvey)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Renesas R-Car
    - Support Gen2 internal PCIe controller (Valentine Barshak)

  Samsung Exynos
    - Add MSI support (Jingoo Han)
    - Turn off power when link fails (Jingoo Han)
    - Add Jingoo Han as maintainer (Jingoo Han)
    - Add clk_disable_unprepare() on error path (Wei Yongjun)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Synopsys DesignWare
    - Add irq_create_mapping() (Pratyush Anand)
    - Add header guards (Seungwon Jeon)

  Miscellaneous
    - Enable native PCIe services by default on non-ACPI (Andrew Murray)
    - Cleanup _OSC usage and messages (Bjorn Helgaas)
    - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
    - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
    - Remove unused pci_mem_start (Myron Stowe)
    - Make sysfs functions static (Sachin Kamat)
    - Warn on invalid return from driver probe (Stephen M. Cameron)
    - Remove Intel Haswell D3 delays (Todd E Brandt)
    - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
    - Use pci_is_pcie() to simplify code (Yijing Wang)
    - Use PCIe capability accessors to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
    - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
    - Simplify sysfs CPU affinity implementation (Yijing Wang)"

* tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits)
  PCI: Enable upstream bridges even for VFs on virtual buses
  PCI: Add pci_upstream_bridge()
  PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
  PCI: Warn on driver probe return value greater than zero
  PCI: Drop warning about drivers that don't use pci_set_master()
  PCI: Workaround missing pci_set_master in pci drivers
  powerpc/pci: Use pci_is_pcie() to simplify code [fix]
  PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms
  PCI: imx6: Probe the PCIe in fs_initcall()
  PCI: Add R-Car Gen2 internal PCI support
  PCI: imx6: Remove redundant of_match_ptr
  PCI: Report pci_pme_active() kmalloc failure
  mn10300/PCI: Remove useless pcibios_last_bus
  frv/PCI: Remove pcibios_last_bus
  PCI: imx6: Increase link startup timeout
  PCI: exynos: Remove redundant of_match_ptr
  PCI: imx6: Fix imprecise abort handler
  PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0
  PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe()
  x86/PCI: Coalesce multiple overlapping host bridge windows
  ...
2013-11-14 14:02:00 +09:00
Linus Torvalds
f9300eaaac ACPI and power management updates for 3.13-rc1
- New power capping framework and the the Intel Running Average Power
    Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.
 
  - Addition of the in-kernel switching feature to the arm_big_little
    cpufreq driver from Viresh Kumar and Nicolas Pitre.
 
  - cpufreq support for iMac G5 from Aaro Koskinen.
 
  - Baytrail processors support for intel_pstate from Dirk Brandewie.
 
  - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.
 
  - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.
 
  - ACPI power management support for the I2C and SPI bus types from
    Mika Westerberg and Lv Zheng.
 
  - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
    Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.
 
  - cpufreq drivers updates (mostly fixes and cleanups) from Viresh Kumar,
    Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz Majewski,
    Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.
 
  - intel_pstate updates from Dirk Brandewie and Adrian Huang.
 
  - ACPICA update to version 20130927 includig fixes and cleanups and
    some reduction of divergences between the ACPICA code in the kernel
    and ACPICA upstream in order to improve the automatic ACPICA patch
    generation process.  From Bob Moore, Lv Zheng, Tomasz Nowicki,
    Naresh Bhat, Bjorn Helgaas, David E Box.
 
  - ACPI IPMI driver fixes and cleanups from Lv Zheng.
 
  - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani,
    Zhang Yanfei, Rafael J Wysocki.
 
  - Conversion of the ACPI AC driver to the platform bus type and
    multiple driver fixes and cleanups related to ACPI from Zhang Rui.
 
  - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
    Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.
 
  - Fixes and cleanups and new blacklist entries related to the ACPI
    video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
    Kirill Tkhai.
 
  - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.
 
  - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
    Bartlomiej Zolnierkiewicz, Prarit Bhargava.
 
  - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.
 
  - Operation Performance Points (OPP) core updates from Nishanth Menon.
 
  - Runtime power management core fix from Rafael J Wysocki and update
    from Ulf Hansson.
 
  - Hibernation fixes from Aaron Lu and Rafael J Wysocki.
 
  - Device suspend/resume lockup detection mechanism from Benoit Goby.
 
  - Removal of unused proc directories created for various ACPI drivers
    from Lan Tianyu.
 
  - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
    handler from Heikki Krogerus and Jarkko Nikula.
 
  - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.
 
  - Assorted fixes and cleanups related to ACPI from Andy Shevchenko,
    Al Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
    Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
    Liu Chuansheng.
 
  - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
    Jean-Christophe Plagniol-Villard.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSfPKLAAoJEILEb/54YlRxH6YQAJwDKi25RCZziFSIenXuqzC/
 c6JxoH/tSnDHJHhcTgqh7H7Raa+zmatMDf0m2oEv2Wjfx4Lt4BQK4iefhe/zY4lX
 yJ8uXDg+U8DYhDX2XwbwnFpd1M1k/A+s2gIHDTHHGnE0kDngXdd8RAFFktBmooTZ
 l5LBQvOrTlgX/ZfqI/MNmQ6lfY6kbCABGSHV1tUUsDA6Kkvk/LAUTOMSmptv1q22
 hcs6k55vR34qADPkUX5GghjmcYJv+gNtvbDEJUjcmCwVoPWouF415m7R5lJ8w3/M
 49Q8Tbu5HELWLwca64OorS8qh/P7sgUOf1BX5IDzHnJT+TGeDfvcYbMv2Z275/WZ
 /bqhuLuKBpsHQ2wvEeT+lYV3FlifKeTf1FBxER3ApjzI3GfpmVVQ+dpEu8e9hcTh
 ZTPGzziGtoIsHQ0unxb+zQOyt1PmIk+cU4IsKazs5U20zsVDMcKzPrb19Od49vMX
 gCHvRzNyOTqKWpE83Ss4NGOVPAG02AXiXi/BpuYBHKDy6fTH/liKiCw5xlCDEtmt
 lQrEbupKpc/dhCLo5ws6w7MZzjWJs2eSEQcNR4DlR++pxIpYOOeoPTXXrghgZt2X
 mmxZI2qsJ7GAvPzII8OBeF3CRO3fabZ6Nez+M+oEZjGe05ZtpB3ccw410HwieqBn
 dYpJFt/BHK189odhV9CM
 =JCxk
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael J Wysocki:

 - New power capping framework and the the Intel Running Average Power
   Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.

 - Addition of the in-kernel switching feature to the arm_big_little
   cpufreq driver from Viresh Kumar and Nicolas Pitre.

 - cpufreq support for iMac G5 from Aaro Koskinen.

 - Baytrail processors support for intel_pstate from Dirk Brandewie.

 - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.

 - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.

 - ACPI power management support for the I2C and SPI bus types from Mika
   Westerberg and Lv Zheng.

 - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
   Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.

 - cpufreq drivers updates (mostly fixes and cleanups) from Viresh
   Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
   Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.

 - intel_pstate updates from Dirk Brandewie and Adrian Huang.

 - ACPICA update to version 20130927 includig fixes and cleanups and
   some reduction of divergences between the ACPICA code in the kernel
   and ACPICA upstream in order to improve the automatic ACPICA patch
   generation process.  From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
   Bhat, Bjorn Helgaas, David E Box.

 - ACPI IPMI driver fixes and cleanups from Lv Zheng.

 - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
   Yanfei, Rafael J Wysocki.

 - Conversion of the ACPI AC driver to the platform bus type and
   multiple driver fixes and cleanups related to ACPI from Zhang Rui.

 - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
   Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.

 - Fixes and cleanups and new blacklist entries related to the ACPI
   video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
   Kirill Tkhai.

 - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.

 - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
   Bartlomiej Zolnierkiewicz, Prarit Bhargava.

 - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.

 - Operation Performance Points (OPP) core updates from Nishanth Menon.

 - Runtime power management core fix from Rafael J Wysocki and update
   from Ulf Hansson.

 - Hibernation fixes from Aaron Lu and Rafael J Wysocki.

 - Device suspend/resume lockup detection mechanism from Benoit Goby.

 - Removal of unused proc directories created for various ACPI drivers
   from Lan Tianyu.

 - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
   handler from Heikki Krogerus and Jarkko Nikula.

 - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.

 - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
   Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
   Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
   Liu Chuansheng.

 - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
   Jean-Christophe Plagniol-Villard.

* tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
  cpufreq: conservative: fix requested_freq reduction issue
  ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
  PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
  ACPI / event: remove unneeded NULL pointer check
  Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
  ACPI / video: Quirk initial backlight level 0
  ACPI / video: Fix initial level validity test
  intel_pstate: skip the driver if ACPI has power mgmt option
  PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
  ACPI / hotplug: Do not execute "insert in progress" _OST
  ACPI / hotplug: Carry out PCI root eject directly
  ACPI / hotplug: Merge device hot-removal routines
  ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
  ACPI / hotplug: Simplify device ejection routines
  ACPI / hotplug: Fix handle_root_bridge_removal()
  ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
  ACPI / scan: Start matching drivers after trying scan handlers
  ACPI: Remove acpi_pci_slot_init() headers from internal.h
  ACPI / blacklist: fix name of ThinkPad Edge E530
  PowerCap: Fix build error with option -Werror=format-security
  ...

Conflicts:
	arch/arm/mach-omap2/opp.c
	drivers/Kconfig
	drivers/spi/spi.c
2013-11-14 13:41:48 +09:00
Linus Torvalds
42a2d923cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations.  For example sets are supports, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfitler, amongst other things.

    In particular the taus88 generater is updated to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  With the main goals being able
    to use RCU lookups on even request sockets, and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in it's fast path, and an RCU
    conversion was started back in 3.11, several changes here extend the
    RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies.  From Hannes Frederic Sowa.  The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more.  Many drivers have been cleaned
    up in this way, from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net.  From Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
2013-11-13 17:40:34 +09:00
Vineet Gupta
c375f15a43 x86: move fpu_counter into ARCH specific thread_struct
Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
be moved out into ARCH specific thread_struct, reducing the size of
task_struct for other arches.

Compile tested i386_defconfig + gcc 4.7.3

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:13 +09:00
Tang Chen
fa591c4ae7 x86, acpi, crash, kdump: do reserve_crashkernel() after SRAT is parsed.
Memory reserved for crashkernel could be large.  So we should not allocate
this memory bottom up from the end of kernel image.

When SRAT is parsed, we will be able to know which memory is hotpluggable,
and we can avoid allocating this memory for the kernel.  So reorder
reserve_crashkernel() after SRAT is parsed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:08 +09:00
Jianguo Wu
40c3baa7c6 mm/arch: use NUMA_NO_NODE
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc()

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:05 +09:00
Thomas Renninger
11f918d3e2 x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error
Do it the same way as done in microcode_intel.c: use pr_debug()
for missing firmware files.

There seem to be CPUs out there for which no microcode update
has been submitted to kernel-firmware repo yet resulting in
scary sounding error messages in dmesg:

  microcode: failed to load file amd-ucode/microcode_amd_fam16h.bin

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1384274383-43510-1-git-send-email-trenn@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-12 22:03:49 +01:00
Jiri Slaby
5f01c98859 x86/dumpstack: Fix printk_address for direct addresses
Consider a kernel crash in a module, simulated the following way:

 static int my_init(void)
 {
         char *map = (void *)0x5;
         *map = 3;
         return 0;
 }
 module_init(my_init);

When we turn off FRAME_POINTERs, the very first instruction in
that function causes a BUG. The problem is that we print IP in
the BUG report using %pB (from printk_address). And %pB
decrements the pointer by one to fix printing addresses of
functions with tail calls.

This was added in commit 71f9e59800 ("x86, dumpstack: Use
%pB format specifier for stack trace") to fix the call stack
printouts.

So instead of correct output:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]

We get:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa0152000>] 0xffffffffa0151fff

To fix that, we use %pS only for stack addresses printouts (via
newly added printk_stack_address) and %pB for regs->ip (via
printk_address). I.e. we revert to the old behaviour for all
except call stacks. And since from all those reliable is 1, we
remove that parameter from printk_address.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: joe@perches.com
Cc: jirislaby@gmail.com
Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-12 21:06:06 +01:00
Linus Torvalds
10d0c9705e DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
 
 - Cross arch clean-up and consolidation of early DT scanning code.
 - Clean-up and removal of arch prom.h headers. Makes arch specific
   prom.h optional on all but Sparc.
 - Addition of interrupts-extended property for devices connected to
   multiple interrupt controllers.
 - Refactoring of DT interrupt parsing code in preparation for deferred
   probe of interrupts.
 - ARM cpu and cpu topology bindings documentation.
 - Various DT vendor binding documentation updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
 huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
 Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
 =GCbY
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "DeviceTree updates for 3.13.  This is a bit larger pull request than
  usual for this cycle with lots of clean-up.

   - Cross arch clean-up and consolidation of early DT scanning code.
   - Clean-up and removal of arch prom.h headers.  Makes arch specific
     prom.h optional on all but Sparc.
   - Addition of interrupts-extended property for devices connected to
     multiple interrupt controllers.
   - Refactoring of DT interrupt parsing code in preparation for
     deferred probe of interrupts.
   - ARM cpu and cpu topology bindings documentation.
   - Various DT vendor binding documentation updates"

* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
  powerpc: add missing explicit OF includes for ppc
  dt/irq: add empty of_irq_count for !OF_IRQ
  dt: disable self-tests for !OF_IRQ
  of: irq: Fix interrupt-map entry matching
  MIPS: Netlogic: replace early_init_devtree() call
  of: Add Panasonic Corporation vendor prefix
  of: Add Chunghwa Picture Tubes Ltd. vendor prefix
  of: Add AU Optronics Corporation vendor prefix
  of/irq: Fix potential buffer overflow
  of/irq: Fix bug in interrupt parsing refactor.
  of: set dma_mask to point to coherent_dma_mask
  of: add vendor prefix for PHYTEC Messtechnik GmbH
  DT: sort vendor-prefixes.txt
  of: Add vendor prefix for Cadence
  of: Add empty for_each_available_child_of_node() macro definition
  arm/versatile: Fix versatile irq specifications.
  of/irq: create interrupts-extended property
  microblaze/pci: Drop PowerPC-ism from irq parsing
  of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
  of/irq: Use irq_of_parse_and_map()
  ...
2013-11-12 16:52:17 +09:00
Linus Torvalds
9b66bfb280 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV debug changes from Ingo Molnar:
 "Various SGI UV debuggability improvements, amongst them KDB support,
  with related core KDB enabling patches changing kernel/debug/kdb/"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/UV: Add uvtrace support"
  x86/UV: Add call to KGDB/KDB from NMI handler
  kdb: Add support for external NMI handler to call KGDB/KDB
  x86/UV: Check for alloc_cpumask_var() failures properly in uv_nmi_setup()
  x86/UV: Add uvtrace support
  x86/UV: Add kdump to UV NMI handler
  x86/UV: Add summary of cpu activity to UV NMI handler
  x86/UV: Update UV support for external NMI signals
  x86/UV: Move NMI support
2013-11-12 12:01:14 +09:00
Linus Torvalds
986189f9ec Merge branch 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 reboot changes from Ingo Molnar:
 "Misc changes - the only one with functional impact should be commit
  16c21ae5ca ("reboot: Allow specifying warm/cold reset for CF9 boot
  type") which extends cold/warm reboot handling to the 0xCF9 reboot
  method"

* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Correct pr_info() log message in the set_bios/pci/kbd_reboot()
  x86/reboot: Sort reboot DMI quirks by vendor
  x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
  reboot: Allow specifying warm/cold reset for CF9 boot type
2013-11-12 11:33:38 +09:00
Linus Torvalds
340286cd4e Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "The biggest change adds support for Intel 'CPER' (UEFI Common Platform
  Error Record) error logging, which builds upon an enhanced error
  logging mechanism available on Xeon processors.

  Full description is here:

    http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html

  This change provides a module (and support code) to check for an
  extended error log and prints extra details about the error on the
  console"

* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC
  dmi: Avoid unaligned memory access in save_mem_devices()
  Move cper.c from drivers/acpi/apei to drivers/firmware/efi
  EDAC, GHES: Update ghes error record info
  ACPI, APEI, CPER: Cleanup CPER memory error output format
  ACPI, APEI, CPER: Enhance memory reporting capability
  ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
  DMI: Parse memory device (type 17) in SMBIOS
  ACPI, x86: Extended error log driver for x86 platform
  bitops: Introduce a more generic BITMASK macro
  ACPI, CPER: Update cper info
  ACPI, APEI, CPER: Fix status check during error printing
2013-11-12 11:16:44 +09:00
Linus Torvalds
dba538ff56 Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/intel-mid changes from Ingo Molnar:
 "Update the 'intel mid' (mobile internet device) platform code as Intel
  is rolling out more SoC designs.

  This gets rid of most of the 'MRST' platform code in the process,
  mostly by renaming and shuffling code around into their respective
  'intel-mid' platform drivers"

* 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, intel-mid: Do not re-introduce usage of obsolete __cpuinit
  intel_mid: Move platform device setups to their own platform_<device>.* files
  x86: intel-mid: Add section for sfi device table
  intel-mid: sfi: Allow struct devs_id.get_platform_data to be NULL
  intel_mid: Moved SFI related code to sfi.c
  intel_mid: Added custom handler for ipc devices
  intel_mid: Added custom device_handler support
  intel_mid: Refactored sfi_parse_devs() function
  intel_mid: Renamed *mrst* to *intel_mid*
  pci: intel_mid: Return true/false in function returning bool
  intel_mid: Renamed *mrst* to *intel_mid*
  mrst: Fixed indentation issues
  mrst: Fixed printk/pr_* related issues
2013-11-12 11:12:22 +09:00
Linus Torvalds
2dc1733fd4 Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/hyperv changes from Ingo Molnar:
 "These changes enable Linux guests to boot as 'Modern VM' guest kernels
  on MS-Hyperv hosts"

* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, hyperv: Move a variable to avoid an unused variable warning
  x86, hyperv: Fix build error due to missing <asm/apic.h> include
  x86, hyperv: Correctly guard the local APIC calibration code
  x86, hyperv: Get the local APIC timer frequency from the hypervisor
2013-11-12 11:07:49 +09:00
Linus Torvalds
69019d77c7 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "Main changes:

   - Add support for earlyprintk=efi which uses the EFI framebuffer.
     Very useful for debugging boot problems.

   - EFI stub support for large memory maps (more than 128 entries)

   - EFI ARM support - this was mostly done by generalizing x86 <-> ARM
     platform differences, such as by moving x86 EFI code into
     drivers/firmware/efi/ and sharing it with ARM.

   - Documentation updates

   - misc fixes"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/efi: Add EFI framebuffer earlyprintk support
  boot, efi: Remove redundant memset()
  x86/efi: Fix config_table_type array termination
  x86 efi: bugfix interrupt disabling sequence
  x86: EFI stub support for large memory maps
  efi: resolve warnings found on ARM compile
  efi: Fix types in EFI calls to match EFI function definitions.
  efi: Renames in handle_cmdline_files() to complete generalization.
  efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
  efi: Allow efi_free() to be called with size of 0
  efi: use efi_get_memory_map() to get final map for x86
  efi: generalize efi_get_memory_map()
  efi: Rename __get_map() to efi_get_memory_map()
  efi: Move unicode to ASCII conversion to shared function.
  efi: Generalize relocate_kernel() for use by other architectures.
  efi: Move relocate_kernel() to shared file.
  efi: Enforce minimum alignment of 1 page on allocations.
  efi: Rename memory allocation/free functions
  efi: Add system table pointer argument to shared functions.
  efi: Move common EFI stub code from x86 arch code to common location
  ...
2013-11-12 10:48:30 +09:00
Linus Torvalds
6df1e7f2e9 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu changes from Ingo Molnar:
 "The biggest change that stands out is the increase of the
  CONFIG_NR_CPUS range from 4096 to 8192 - as real hardware out there
  already went beyond 4k CPUs ...

  We only allow more than 512 CPUs if offstack cpumasks are enabled.

  CONFIG_MAXSMP=y remains to be the 'you are nuts!' extreme testcase,
  which now means a max of 8192 CPUs"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Increase max CPU count to 8192
  x86/cpu: Allow higher NR_CPUS values
  x86/cpu: Always print SMP information in /proc/cpuinfo
  x86/cpu: Track legacy CPU model data only on 32-bit kernels
2013-11-12 10:46:43 +09:00
Linus Torvalds
d96d8aa261 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Two small cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, msr: Use file_inode(), not f_mapping->host
  x86: mkpiggy.c: Explicitly close the output file
2013-11-12 10:45:01 +09:00
Linus Torvalds
014d595c23 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Ingo Molnar:
 "Two changes that prettify and compactify the SMP bootup output from:

     smpboot: Booting Node   0, Processors  #1 #2 #3 OK
     smpboot: Booting Node   1, Processors  #4 #5 #6 #7 OK
     smpboot: Booting Node   2, Processors  #8 #9 #10 #11 OK
     smpboot: Booting Node   3, Processors  #12 #13 #14 #15 OK
     Brought up 16 CPUs

  to something like:

     x86: Booting SMP configuration:
     .... node  #0, CPUs:        #1  #2  #3
     .... node  #1, CPUs:    #4  #5  #6  #7
     .... node  #2, CPUs:    #8  #9 #10 #11
     .... node  #3, CPUs:   #12 #13 #14 #15
     x86: Booted up 4 nodes, 16 CPUs"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Further compress CPUs bootup message
  x86: Improve the printout of the SMP bootup CPU table
2013-11-12 10:41:10 +09:00
Linus Torvalds
b7ab6e3d21 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic fix from Ingo Molnar:
 "A single fix to the IO-APIC / local-APIC shutdown sequence"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Disable I/O APIC before shutdown of the local APIC
2013-11-12 10:37:53 +09:00
Linus Torvalds
87093826aa Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Ingo Molnar:
 "Main changes in this cycle were:

   - Updated full dynticks support.

   - Event stream support for architected (ARM) timers.

   - ARM clocksource driver updates.

   - Move arm64 to using the generic sched_clock framework & resulting
     cleanup in the generic sched_clock code.

   - Misc fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
  clocksource: sun4i: remove IRQF_DISABLED
  clocksource: sun4i: Report the minimum tick that we can program
  clocksource: sun4i: Select CLKSRC_MMIO
  clocksource: Provide timekeeping for efm32 SoCs
  clocksource: em_sti: convert to clk_prepare/unprepare
  time: Fix signedness bug in sysfs_get_uname() and its callers
  timekeeping: Fix some trivial typos in comments
  alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
  clocksource: arch_timer: Do not register arch_sys_counter twice
  timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
  sched_clock: Remove sched_clock_func() hook
  arch_timer: Move to generic sched_clock framework
  clocksource: tcb_clksrc: Remove IRQF_DISABLED
  clocksource: tcb_clksrc: Improve driver robustness
  clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
  clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
  clocksource: dw_apb_timer_of: Mark a few more functions as __init
  clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
  arm: zynq: Enable arm_global_timer
  ...
2013-11-12 10:36:00 +09:00
Linus Torvalds
39cf275a1a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
2013-11-12 10:20:12 +09:00
Linus Torvalds
ad5d69899e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "As a first remark I'd like to note that the way to build perf tooling
  has been simplified and sped up, in the future it should be enough for
  you to build perf via:

        cd tools/perf/
        make install

  (ie without the -j option.) The build system will figure out the
  number of CPUs and will do a parallel build+install.

  The various build system inefficiencies and breakages Linus reported
  against the v3.12 pull request should now be resolved - please
  (re-)report any remaining annoyances or bugs.

  Main changes on the perf kernel side:

   * Performance optimizations:
      . perf ring-buffer code optimizations,          by Peter Zijlstra
      . perf ring-buffer code optimizations,          by Oleg Nesterov
      . x86 NMI call-stack processing optimizations,  by Peter Zijlstra
      . perf context-switch optimizations,            by Peter Zijlstra
      . perf sampling speedups,                       by Peter Zijlstra
      . x86 Intel PEBS processing speedups,           by Peter Zijlstra

   * Enhanced hardware support:
      . for Intel Ivy Bridge-EP uncore PMUs,          by Zheng Yan
      . for Haswell transactions,                     by Andi Kleen, Peter Zijlstra

   * Core perf events code enhancements and fixes by Oleg Nesterov:
      . for uprobes, if fork() is called with pending ret-probes
      . for uprobes platform support code

   * New ABI details by Andi Kleen:
      . Report x86 Haswell TSX transaction abort cost as weight

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   * 'perf report/top' enhancements:

      . Convert callchain children list to rbtree, greatly reducing the
        time taken for callchain processing, from Namhyung Kim.

      . Add new COMM infrastructure, further improving histogram
        processing, from Frédéric Weisbecker, one fix from Namhyung Kim.

      . Add /proc/kcore based live-annotation improvements, including
        build-id cache support, multi map 'call' instruction navigation
        fixes, kcore address validation, objdump workarounds.  From
        Adrian Hunter.

      . Show progress on histogram collapsing, that can take a long
        time, from Namhyung Kim.

      . Add --max-stack option to limit callchain stack scan in 'top'
        and 'report', improving callchain processing when reducing the
        stack depth is an option, from Waiman Long.

      . Add new option --ignore-vmlinux for perf top, from Willy
        Tarreau.

   * 'perf trace' enhancements:

      . 'perf trace' now can can use a 'perf probe' dynamic tracepoints
        to hook into the userspace -> kernel pathname copy so that it
        can map fds to pathnames without reading /proc/pid/fd/ symlinks.
        From Arnaldo Carvalho de Melo.

      . Show VFS path associated with fd in live sessions, using a
        'vfs_getname' 'perf probe' created dynamic tracepoint or by
        looking at /proc/pid/fd, from Arnaldo Carvalho de Melo.

      . Add 'trace' beautifiers for lots of syscall arguments, from
        Arnaldo Carvalho de Melo.

      . Implement more compact 'trace' output by suppressing zeroed
        args, from Arnaldo Carvalho de Melo.

      . Show thread COMM by default in 'trace', from Arnaldo Carvalho de
        Melo.

      . Add option to show full timestamp in 'trace', from David Ahern.

      . Add 'record' command in 'trace', to record raw_syscalls:*, from
        David Ahern.

      . Add summary option to dump syscall statistics in 'trace', from
        David Ahern.

      . Improve error messages in 'trace', providing hints about system
        configuration steps needed for using it, from Ramkumar
        Ramachandra.

      . 'perf trace' now emits hints as to why tracing is not possible,
        helping the user to setup the system to allow tracing in the
        desired permission granularity, telling if the problem is due to
        debugfs not being mounted or with not enough permission for
        !root, /proc/sys/kernel/perf_event_paranoit value, etc.  From
        Arnaldo Carvalho de Melo.

   * 'perf record' enhancements:

      . Check maximum frequency rate for record/top, emitting better
        error messages, from Jiri Olsa.

      . 'perf record' code cleanups, from David Ahern.

      . Improve write_output error message in 'perf record', from Adrian
        Hunter.

      . Allow specifying B/K/M/G unit to the --mmap-pages arguments,
        from Jiri Olsa.

      . Fix command line callchain attribute tests to handle the new
        -g/--call-chain semantics, from Arnaldo Carvalho de Melo.

   * 'perf kvm' enhancements:

      . Disable live kvm command if timerfd is not supported, from David
        Ahern.

      . Fix detection of non-core features, from David Ahern.

   * 'perf list' enhancements:

      . Add usage to 'perf list', from David Ahern.

      . Show error in 'perf list' if tracepoints not available, from
        Pekka Enberg.

   * 'perf probe' enhancements:

      . Support "$vars" meta argument syntax for local variables,
        allowing asking for all possible variables at a given probe
        point to be collected when it hits, from Masami Hiramatsu.

   * 'perf sched' enhancements:

      . Address the root cause of that 'perf sched' stack initialization
        build slowdown, by programmatically setting a big array after
        moving the global variable back to the stack.  Fix from Adrian
        Hunter.

   * 'perf script' enhancements:

      . Set up output options for in-stream attributes, from Adrian
        Hunter.

      . Print addr by default for BTS in 'perf script', from Adrian
        Juntmer

   * 'perf stat' enhancements:

      . Improved messages when doing profiling in all or a subset of
        CPUs using a workload as the session delimitator, as in:

         'perf stat --cpu 0,2 sleep 10s'

        from Arnaldo Carvalho de Melo.

      . Add units to nanosec-based counters in 'perf stat', from David
        Ahern.

      . Remove bogus info when using 'perf stat' -e cycles/instructions,
        from Ramkumar Ramachandra.

   * 'perf lock' enhancements:

      . 'perf lock' fixes and cleanups, from Davidlohr Bueso.

   * 'perf test' enhancements:

      . Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing
        and 'perf test', from Adrian Hunter.

      . Clarify the "sample parsing" test entry, from Arnaldo Carvalho
        de Melo.

      . Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test,
        from Arnaldo Carvalho de Melo.

      . Memory leak fixes in 'perf test', from Felipe Pena.

   * 'perf bench' enhancements:

      . Change the procps visible command-name of invididual benchmark
        tests plus cleanups, from Ingo Molnar.

   * Generic perf tooling infrastructure/plumbing changes:

      . Separating data file properties from session, code
        reorganization from Jiri Olsa.

      . Fix version when building out of tree, as when using one of
        these:

        $ make help | grep perf
          perf-tar-src-pkg    - Build perf-3.12.0.tar source tarball
          perf-targz-src-pkg  - Build perf-3.12.0.tar.gz source tarball
          perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball
          perf-tarxz-src-pkg  - Build perf-3.12.0.tar.xz source tarball
        $

        from David Ahern.

      . Enhance option parse error message, showing just the help lines
        of the options affected, from Namhyung Kim.

      . libtraceevent updates from upstream trace-cmd repo, from Steven
        Rostedt.

      . Always use perf_evsel__set_sample_bit to set sample_type, from
        Adrian Hunter.

      . Memory and mmap leak fixes from Chenggang Qin.

      . Assorted build fixes for from David Ahern and Jiri Olsa.

      . Speed up and prettify the build system, from Ingo Molnar.

      . Implement addr2line directly using libbfd, from Roberto Vitillo.

      . Separate the GTK support in a separate libperf-gtk.so DSO, that
        is only loaded when --gtk is specified, from Namhyung Kim.

      . perf bash completion fixes and improvements from Ramkumar
        Ramachandra.

      . Support for Openembedded/Yocto -dbg packages, from Ricardo
        Ribalda Delgado.

  And lots and lots of other fixes and code reorganizations that did not
  make it into the list, see the shortlog, diffstat and the Git log for
  details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits)
  uprobes: Fix the memory out of bound overwrite in copy_insn()
  uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
  perf tools: Remove unneeded include
  perf record: Remove post_processing_offset variable
  perf record: Remove advance_output function
  perf record: Refactor feature handling into a separate function
  perf trace: Don't relookup fields by name in each sample
  perf tools: Fix version when building out of tree
  perf evsel: Ditch evsel->handler.data field
  uprobes: Export write_opcode() as uprobe_write_opcode()
  uprobes: Introduce arch_uprobe->ixol
  uprobes: Kill module_init() and module_exit()
  uprobes: Move function declarations out of arch
  perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
  perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
  perf: Factor out strncpy() in perf_event_mmap_event()
  tools/perf: Add required memory barriers
  perf: Fix arch_perf_out_copy_user default
  perf: Update a stale comment
  perf: Optimize perf_output_begin() -- address calculation
  ...
2013-11-12 10:06:34 +09:00
Linus Torvalds
1006fae359 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull IRQ changes from Ingo Molnar:
 "The biggest change this cycle are the softirq/hardirq stack
  interaction and nesting fixes, cleanups and reorganizations from
  Frederic.  This is the longer followup story to the softirq nesting
  fix that is already upstream (commit ded7975475: "irq: Force hardirq
  exit's softirq processing on its own stack")"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: bcm2835: Convert to use IRQCHIP_DECLARE macro
  powerpc: Tell about irq stack coverage
  x86: Tell about irq stack coverage
  irq: Optimize softirq stack selection in irq exit
  irq: Justify the various softirq stack choices
  irq: Improve a bit softirq debugging
  irq: Optimize call to softirq on hardirq exit
  irq: Consolidate do_softirq() arch overriden implementations
  x86/irq: Correct comment about i8259 initialization
2013-11-12 10:02:59 +09:00
Seiji Aguchi
25c74b10ba x86, trace: Register exception handler to trace IDT
This patch registers exception handlers for tracing to a trace IDT.

To implemented it in set_intr_gate(), this patch does followings.
 - Register the exception handlers to
   the trace IDT by prepending "trace_" to the handler's names.
 - Also, newly introduce trace_page_fault() to add tracepoints
   in a subsequent patch.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-08 14:15:45 -08:00
Ben Widawsky
9459d25237 drm/i915/bdw: support GMS and GGMS changes
All the BARs have the ability to grow.

v2: Pulled out the simulator workaround to a separate patch.
Rebased.

v3: Rebase onto latest vlv patches from Jesse.

v4: Rebased on top of the early stolen quirk patch from Jesse.

v5: Use the new macro names.
s/INTEL_BDW_PCI_IDS_D/INTEL_BDW_D_IDS
s/INTEL_BDW_PCI_IDS_M/INTEL_BDW_M_IDS
It's Jesse's fault for not following the convention I originally set.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:39 +01:00
Rafael J. Wysocki
0faf996f4d Merge branch 'acpica'
* acpica: (35 commits)
  ACPICA: Add __init for ACPICA initializers/finalizers.
  ACPICA: Cleanup asmlinkage for ACPICA APIs.
  ACPICA: Update acpidump related header file changes.
  ACPICA: Update compilation environment settings.
  ACPICA: Fix cached object deletion code.
  ACPICA: Remove dead AOPOBJ_INVALID check.
  ACPICA: Cleanup useless memset invocations.
  ACPICA: Fix an ACPI_ALLOCATE_ZEROED() reversal.
  ACPICA: Fix wrong object length returned by acpi_ut_get_simple_object_size().
  ACPICA: Add new statistics interface.
  ACPICA: Update DMAR table definitions.
  ACPICA: Update RSDP table definitions.
  ACPICA: Update namespace dump code.
  ACPICA: Update check for setting the ANOBJ_IS_EXTERNAL flag.
  ACPICA: Update default space handlers.
  ACPICA: Update version to 20130927.
  ACPICA: Update aclinux.h for new OSL override mechanism.
  ACPICA: Add support to allow host OS to redefine individual OSL prototypes.
  ACPICA: Simplify configuration of global ACPI_REDUCED_HARDWARE macro.
  ACPICA: Fix indentation issues for macro invocations.
  ...
2013-11-07 19:21:11 +01:00
Rob Herring
b5480950c6 Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
Fenghua Yu
522e664644 x86/apic: Disable I/O APIC before shutdown of the local APIC
In reboot and crash path, when we shut down the local APIC, the I/O APIC is
still active. This may cause issues because external interrupts
can still come in and disturb the local APIC during shutdown process.

To quiet external interrupts, disable I/O APIC before shutdown local APIC.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua.yu@intel.com
Cc: <stable@kernel.org>
[ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-07 10:12:37 +01:00
Konrad Rzeszutek Wilk
0e4ccb1505 PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
Certain platforms do not allow writes in the MSI-X BARs to setup or tear
down vector values.  To combat against the generic code trying to write to
that and either silently being ignored or crashing due to the pagetables
being marked R/O this patch introduces a platform override.

Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
code.

For Xen, which does not allow the guest to write to MSI-X tables - as the
hypervisor is solely responsible for setting the vector values - we
implement two nops.

This fixes a Xen guest crash when passing a PCI device with MSI-X to the
guest.  See the bugzilla for more details.

[bhelgaas: add bugzilla info]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
2013-11-06 16:32:19 -07:00
H. Peter Anvin
4c08edd305 x86, hyperv: Move a variable to avoid an unused variable warning
The variable hv_lapic_frequency causes an unused variable warning if
CONFIG_X86_LOCAL_APIC is disabled.  Since the variable is only used
inside a small if statement, move the declaration of that variable
into the if statement itself.

Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-06 10:02:05 -08:00
Yan, Zheng
f891d8cfb8 perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
Unlike other uncore boxes, IRP boxes live in PCI buses with no UBOX
device. For PCI bus without UBOX device, we find the next bus that
has UBOX device and use its 'bus to socket' mapping.

Besides the counter/control registers in IRP boxes are not properly
aligned.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: "Yan Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1383197815-17706-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:34:31 +01:00
Yan, Zheng
d1e8f4a836 perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
The encoding for filter registers of IvyBridge-EP uncore QPI boxes is
completely the same as SandyBridge-EP.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: "Yan Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1383197815-17706-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:34:29 +01:00
Peter Zijlstra
0a196848ca perf: Fix arch_perf_out_copy_user default
The arch_perf_output_copy_user() default of
__copy_from_user_inatomic() returns bytes not copied, while all other
argument functions given DEFINE_OUTPUT_COPY() return bytes copied.

Since copy_from_user_nmi() is the odd duck out by returning bytes
copied where all other *copy_{to,from}* functions return bytes not
copied, change it over and ammend DEFINE_OUTPUT_COPY() to expect bytes
not copied.

Oddly enough DEFINE_OUTPUT_COPY() already returned bytes not copied
while expecting its worker functions to return bytes copied.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: will.deacon@arm.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131030201622.GR16117@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:34:25 +01:00
Marcelo Tosatti
8b414521bc hung_task: add method to reset detector
In certain occasions it is possible for a hung task detector
positive to be false: continuation from a paused VM, for example.

Add a method to reset detection, similar as is done
with other kernel watchdogs.

Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-11-06 09:49:02 +02:00
Marcelo Tosatti
d63285e94a pvclock: detect watchdog reset at pvclock read
Implement reset of kernel watchdogs at pvclock read time. This avoids
adding special code to every watchdog.

This is possible for watchdogs which measure time based on sched_clock() or
ktime_get() variants.

Suggested by Don Zickus.

Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-11-06 09:48:43 +02:00
HATAYAMA Daisuke
a477c8594b x86/cpu: Always print SMP information in /proc/cpuinfo
Currently show_cpuinfo_core() displays cpu core information only if
the number of threads per a whole cores is 2 or larger.

However, this condition doesn't care about the number of
sockets. For example, this condition doesn't hold on systems
with two logical cpus consisting of two sockets and a single
core on each socket - yet the topology information would be
interesting to see in that case as well.

I don't know whether or not there are processors in real world
by which such configurations are possible, but at least on
vitual machine environments, such configuration can occur,
typically when no explicit SMP information is provided in
advance.

For example, on qemu/KVM, SMP information is specified via -smp
command-line option, more specifically, its syntax is:

  -smp n[,cores=cores][,threads=threads][,sockets=sockets][,maxcpus=maxcpus]

If this is not specified, qemu tells configuration with
n-sockets, 1-core and 1-thread to the guest machine, on which
guest, MP information is not displayed in /proc/cpuinfo.

I saw this situation on VMWare guest environment, too.

To fix this issue, this patch simply removes the condition
because this information is useful even if there's only 1
thread.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/5277D644.4090707@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 08:13:56 +01:00
Ingo Molnar
c90423d1de Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move
Conflicts:
	kernel/Makefile

There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 07:50:37 +01:00
Ingo Molnar
164777530b Linux 3.12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
 V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
 WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
 UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
 jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
 LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
 =57MC
 -----END PGP SIGNATURE-----

Merge tag 'v3.12' into x86/cpu, to refresh the branch before queueing up more changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 06:50:21 +01:00
Ingo Molnar
97c53b402f Linux 3.12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
 V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
 WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
 UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
 jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
 LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
 =57MC
 -----END PGP SIGNATURE-----

Merge tag 'v3.12' into core/locking to pick up mutex upates

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 06:39:45 +01:00
Kevin Hao
ab4ead02ec ftrace/x86: skip over the breakpoint for ftrace caller
In commit 8a4d0a687a "ftrace: Use breakpoint method to update ftrace
caller", we choose to use breakpoint method to update the ftrace
caller. But we also need to skip over the breakpoint in function
ftrace_int3_handler() for them. Otherwise weird things would happen.

Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 16:01:47 -05:00
David S. Miller
394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Ingo Molnar
2a3ede8cb2 Merge branch 'perf/urgent' into perf/core to fix conflicts
Conflicts:
	tools/perf/bench/numa.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04 07:49:35 +01:00
Linus Torvalds
9581b7d268 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two fixes:

   - Fix 'NMI handler took too long to run' false positives

     [ Genuine NMI overhead speedups will come for v3.13, this commit
       only fixes a measurement bug ]

   - Fix perf ring-buffer missed barrier causing (rare) ring-buffer data
     corruption on ppc64"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix NMI measurements
  perf: Fix perf ring buffer memory ordering
2013-11-01 12:54:51 -07:00
Ingo Molnar
fb10d5b7ef Merge branch 'linus' into sched/core
Resolve cherry-picking conflicts:

Conflicts:
	mm/huge_memory.c
	mm/memory.c
	mm/mprotect.c

See this upstream merge commit for more details:

  52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01 08:24:41 +01:00
Lv Zheng
40bce100ca ACPICA: Cleanup asmlinkage for ACPICA APIs.
Add an asmlinkage wrapper around acpi_enter_sleep_state() to prevent
an empty stub from being called by assmebly code for ACPI_REDUCED_HARDWARE
set.

As arch/x86/kernel/acpi/wakeup_xx.S is only compiled when CONFIG_ACPI=y
and there are no users of ACPI_HARDWARE_REDUCED, currently this is in
fact not a real issue, but a cleanup to reduce source code differences
between Linux and ACPICA upstream.

[rjw: Changelog]
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-31 14:37:35 +01:00
David Rientjes
d68ce0177c x86, hyperv: Fix build error due to missing <asm/apic.h> include
9e7827b5ea ("x86, hyperv: Get the local APIC timer frequency from the
hypervisor") breaks the build with some configs because apic.h isn't
directly included:

arch/x86/kernel/cpu/mshyperv.c: In function 'ms_hyperv_init_platform':
arch/x86/kernel/cpu/mshyperv.c:90:3: error: 'lapic_timer_frequency' undeclared (first use in this function)
arch/x86/kernel/cpu/mshyperv.c:90:3: note: each undeclared identifier is reported only once for each function it appears in

Fix it by including asm/apic.h.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1310111604160.31170@chino.kir.corp.google.com
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-30 15:37:47 -07:00
Tim Gardner
d780a31271 KVM: Fix modprobe failure for kvm_intel/kvm_amd
The x86 specific kvm init creates a new conflicting
debugfs directory which causes modprobe issues
with kvm_intel and kvm_amd. For example,

sudo modprobe kvm_amd
modprobe: ERROR: could not insert 'kvm_amd': Bad address

The simplest fix is to just rename the directory. The following
KVM config options are set:

CONFIG_KVM_GUEST=y
CONFIG_KVM_DEBUG_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_DEVICE_ASSIGNMENT=y

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Change debugfs directory name. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:10:42 +01:00
Peter Zijlstra
e8a923cc1f perf/x86: Fix NMI measurements
OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.

What triggered it is that my WSM-EP is broken :-(

  [    0.001000] tsc: Fast TSC calibration using PIT
  [    0.002000] tsc: Detected 2533.715 MHz processor
  [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
  [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
  [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed

For some reason it consistently detects TSC skew, even though NHM+
should have a single clock domain for 'reasonable' systems.

This marks sched_clock_stable=0, which means that we do fancy stuff to
try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
it either wakes up or gets kicked by another cpu.

While this is perfectly fine for the scheduler -- it only cares about
actually running stuff, and when we're running stuff we're obviously not
idle. This does somewhat break down for perf which can trigger events
just fine on an otherwise idle cpu.

So I've got NMIs get get 'measured' as taking ~1ms, which actually
don't last nearly that long:

          <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
  ...
          <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990

So ftrace (which uses sched_clock(), not the fancy bits) only sees
~27us, but we measure ~1ms !!

Now since all this measurement stuff lives in x86 code, we can actually
fix it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:20 +01:00
Ingo Molnar
aac898548d Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/util/hist.h
2013-10-29 11:23:32 +01:00
Matt Fleming
72548e836b x86/efi: Add EFI framebuffer earlyprintk support
It's incredibly difficult to diagnose early EFI boot issues without
special hardware because earlyprintk=vga doesn't work on EFI systems.

Add support for writing to the EFI framebuffer, via earlyprintk=efi,
which will actually give users a chance of providing debug output.

Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-10-28 18:09:58 +00:00
Rafael J. Wysocki
5171f4fa74 Merge branch 'acpi-assorted'
* acpi-assorted:
  ACPI: Add Toshiba NB100 to Vista _OSI blacklist
  ACPI / osl: remove an unneeded NULL check
  ACPI / platform: add ACPI ID for a Broadcom GPS chip
  ACPI: improve acpi_extract_package() utility
  ACPI / LPSS: fix UART Auto Flow Control
  ACPI / platform: Add ACPI IDs for Intel SST audio device
  x86 / ACPI: fix incorrect placement of __initdata tag
  ACPI / thermal: convert printk(LEVEL...) to pr_<lvl>
  ACPI / sysfs: make GPE sysfs attributes only accept correct values
  ACPI / EC: Convert all printk() calls to dynamic debug function
  ACPI / button: Using input_set_capability() to mark device's event capability
  ACPI / osl: implement acpi_os_sleep() with msleep()
2013-10-28 01:20:24 +01:00
Rafael J. Wysocki
5c2aae8355 Merge branch 'acpi-hotplug'
* acpi-hotplug:
  ACPI / memhotplug: Use defined marco METHOD_NAME__STA
  ACPI / hotplug: Use kobject_init_and_add() instead of _init() and _add()
  ACPI / hotplug: Don't set kobject parent pointer explicitly
  ACPI / hotplug: Set kobject name via kobject_add(), not kobject_set_name()
  hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
  hotplug / x86: Disable ARCH_CPU_PROBE_RELEASE on x86
  hotplug / x86: Add hotplug lock to missing places
  hotplug / x86: Fix online state in cpu0 debug interface
2013-10-28 01:12:41 +01:00
Rafael J. Wysocki
3fbc4d6374 Merge branch 'acpi-processor'
* acpi-processor:
  ACPI / processor: fixed a brace coding style issue
  ACPI / processor: Remove outdated comments
  ACPI / processor: remove unnecessary if (!pr) check
  ACPI / processor: remove some dead code in acpi_processor_get_info()
  x86 / ACPI: simplify _acpi_map_lsapic()
  ACPI / processor: use apic_id and remove duplicated _MAT evaluation
  ACPI / processor: Introduce apic_id in struct processor to save parsed APIC id
2013-10-28 01:11:24 +01:00
Jan Beulich
ee5872befc x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
Even though the omission was found only during code review
(originally in the Xen hypervisor, looking through ACPI v5 flags
and their meanings and uses), we shouldn't be creating a
corresponding platform device in that case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/5265029D02000078000FC4D2@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26 13:36:00 +02:00
Jan Beulich
09dc68d958 x86/cpu: Track legacy CPU model data only on 32-bit kernels
struct cpu_dev's c_models is only ever set inside CONFIG_X86_32
conditionals (or code that's being built for 32-bit only), so
there's no use of reserving the (empty) space for the model
names in a 64-bit kernel.

Similarly, c_size_cache is only used in the #else of a
CONFIG_X86_64 conditional, so reserving space for (and in one
case even initializing) that field is pointless for 64-bit
kernels too.

While moving both fields to the end of the structure, I also
noticed that:

 - the c_models array size was one too small, potentially causing
   table_lookup_model() to return garbage on Intel CPUs (intel.c's
   instance was lacking the sentinel with family being zero), so the
   patch bumps that by one,

 - c_models' vendor sub-field was unused (and anyway redundant
   with the base structure's c_x86_vendor field), so the patch deletes it.

Also rename the legacy fields so that their legacy nature stands out
and comment their declarations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5265036802000078000FC4DB@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26 13:34:39 +02:00
Rafael J. Wysocki
17e8d03c48 Merge back earlier 'acpi-assorted' material. 2013-10-25 23:39:25 +02:00
Lan Tianyu
6d9153bbce x86/reboot: Correct pr_info() log message in the set_bios/pci/kbd_reboot()
commit:

  c767a54ba0 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>

broke the log messages in the set_bios/pci/kbd_reboot() functions, by
putting the reboot method string and quirk entry's ident string in the
wrong order. This patch fixes it.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Cc: holt@sgi.com
Cc: davej@fedoraproject.org
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: awilliam@redhat.com
Link: http://lkml.kernel.org/r/1382598693-29334-1-git-send-email-tianyu.lan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-25 12:45:04 +02:00
Grant Likely
16b84e5a50 of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
Several architectures open code effectively the same code block for
finding and mapping PCI irqs. This patch consolidates it down to a
single function.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-24 11:50:36 +01:00
Grant Likely
e6d30ab1e7 of/irq: simplify args to irq_create_of_mapping
All the callers of irq_create_of_mapping() pass the contents of a struct
of_phandle_args structure to the function. Since all the callers already
have an of_phandle_args pointer, why not pass it directly to
irq_create_of_mapping()?

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-24 11:42:57 +01:00
Grant Likely
530210c781 of/irq: Replace of_irq with of_phandle_args
struct of_irq and struct of_phandle_args are exactly the same structure.
This patch makes the kernel use of_phandle_args everywhere. This in
itself isn't a big deal, but it makes some follow-on patches simpler.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-24 11:42:51 +01:00
Grant Likely
0c02c8007e of/irq: Rename of_irq_map_* functions to of_irq_parse_*
The OF irq handling code has been overloading the term 'map' to refer to
both parsing the data in the device tree and mapping it to the internal
linux irq system. This is probably because the device tree does have the
concept of an 'interrupt-map' function for translating interrupt
references from one node to another, but 'map' is still confusing when
the primary purpose of some of the functions are to parse the DT data.

This patch renames all the of_irq_map_* functions to of_irq_parse_*
which makes it clear that there is a difference between the parsing
phase and the mapping phase. Kernel code can make use of just the
parsing or just the mapping support as needed by the subsystem.

The patch was generated mechanically with a handful of sed commands.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-24 11:40:59 +01:00
David S. Miller
c3fa32b976 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	include/net/dst.h

Trivial merge conflicts, both were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:49:34 -04:00
Chen, Gong
147de14772 ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
In latest UEFI spec(by now it is 2.4) memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error to an actual DIMM location.

Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-23 10:10:20 -07:00
Chen, Gong
dd6dad4288 DMI: Parse memory device (type 17) in SMBIOS
This patch adds a new interface to decode memory device (type 17)
to help error reporting on DIMMs.

Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-23 10:10:12 -07:00
Hannes Frederic Sowa
a8fab07445 x86/jump_label: expect default_nop if static_key gets enabled on boot-up
net_get_random_once(intrduced in the next patch) uses static_keys in
a way that they get enabled on boot-up instead of replaced with an
ideal_nop. So check for default_nop on initial enabling.

Other architectures don't check for this.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Linus Torvalds
9219cec5f2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixlets:

   - fix a (rare-config) build bug
   - fix a next-gen SGI/UV hw/firmware enumeration bug"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Update UV3 hub revision ID
  x86/microcode: Correct Kconfig dependencies
2013-10-18 12:25:11 -07:00
David Cohen
66ac501370 x86: intel-mid: Add section for sfi device table
When Intel mid uses SFI table to enumerate devices, it requires an extra
device table with further information about how to probe such devices.

This patch creates a section where the device table will stay if
CONFIG_X86_INTEL_MID is selected.

Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-12-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:41:04 -07:00
Kuppuswamy Sathyanarayanan
712b6aa873 intel_mid: Renamed *mrst* to *intel_mid*
mrst is used as common name to represent all intel_mid type
soc's. But moorsetwon is just one of the intel_mid soc. So
renamed them to use intel_mid.

This patch mainly renames the variables and related
functions that uses *mrst* prefix with *intel_mid*.

To ensure that there are no functional changes, I have compared
the objdump of related files before and after rename and found
the only difference is symbol and name changes.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-6-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:40:47 -07:00
Kuppuswamy Sathyanarayanan
05454c26eb intel_mid: Renamed *mrst* to *intel_mid*
Following files contains code that is common to all intel mid
soc's. So renamed them as below.

mrst/mrst.c              -> intel-mid/intel-mid.c
mrst/vrtc.c              -> intel-mid/intel_mid_vrtc.c
mrst/early_printk_mrst.c -> intel-mid/intel_mid_vrtc.c
pci/mrst.c               -> pci/intel_mid_pci.c

Also, renamed the corresponding header files and made changes
to the driver files that included these header files.

To ensure that there are no functional changes, I have compared
the objdump of renamed files before and after rename and found
that the only difference is file name change.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-4-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:40:36 -07:00
Peter Zijlstra
9536c8d2da perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:

  [ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
  [ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs

Don Zickus analyzed the source of the overhead and reported:

 > While there are a few places that are causing latencies, for now I focused on
 > the longest one first.  It seems to be 'copy_user_from_nmi'
 >
 > intel_pmu_handle_irq ->
 >	intel_pmu_drain_pebs_nhm ->
 >		__intel_pmu_drain_pebs_nhm ->
 >			__intel_pmu_pebs_event ->
 >				intel_pmu_pebs_fixup_ip ->
 >					copy_from_user_nmi
 >
 > In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
 > all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
 > (there are some cases where only 10 iterations are needed to go that high
 > too, but in generall over 50 or so).  At this point copy_user_from_nmi
 > seems to account for over 90% of the nmi latency.

The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.

Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.

Don reported this test result:

 > Your patch made a huge difference in improvement.  The
 > copy_from_user_nmi() no longer hits the million of cycles.  I still
 > have a batch of 100,000-300,000 cycles.  My longest NMI paths used
 > to be dominated by copy_from_user_nmi, now it is not (I have to dig
 > up the new hot path).

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 15:44:00 +02:00
Raghavendra K T
3dbef3e3bf KVM: Enable pvspinlock after jump_label_init() to avoid VM hang
We use jump label to enable pv-spinlock. With the changes in (442e0973e9
Merge branch 'x86/jumplabel'), the jump label behaviour has changed
that would result in eventual hang of the VM since we would end up in a
situation where slow path locks would halt the vcpus but we will not be
able to wakeup the vcpu by lock releaser using unlock kick.

Similar problem in Xen and more detailed description is available in
a945928ea2 (xen: Do not enable spinlocks before jump_label_init()
has executed)

This patch splits kvm_spinlock_init to separate jump label changes with
pvops patching and also make jump label enabling after jump_label_init().

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-15 14:15:54 +03:00
Russ Anderson
dd3c9c4b60 x86: Update UV3 hub revision ID
The UV3 hub revision ID is different than expected.  The first
revision was supposed to start at 1 but instead will start at 0.

Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: <stable@kernel.org> # v3.9, v3.10, v3.11
Link: http://lkml.kernel.org/r/20131014161733.GA6274@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 08:44:46 +02:00
Ingo Molnar
426ee9e3bb Linux 3.12-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSWyGgAAoJEHm+PkMAQRiGA8MH/35upHXImoRCsI5uC1qvHJtI
 QvQAhDFxoEXbFUKeaYgTfcM8q9FgnqnfjhLf8eYa4Q7tDZeqLXOE8bkI807mSZMl
 yECr3jcwlV+zyhV2MP/HdwTjzy25bwxLM3Zy43S7QROrYoMHZYznil/QPfyMATCJ
 XLPuXZC1FtuUen89n4BoDIuL8QaVrIR/zLqFklAQcdTcGpLHSOwFtH8gb2WaRLhv
 +4IikFRFgTNZiMR5tP0GPc6UH6TVTvRb4QKSqqa7J8OmfAIvOzAUdhqWSPOIwWwt
 Z/+JFxFDczAcNmpv4gE6jkgc2vR8CVeHsvh0j61RDSFObBWspwk337CSyUZxYSA=
 =w4VQ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc5' into perf/core

Merge Linux v3.12-rc5, to pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 07:05:18 +02:00
Michael Opdenacker
aa5e5dc2a8 treewide: fix "distingush" typo
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:38:33 +02:00
Maxime Jayat
3f79410c7c treewide: Fix common typo in "identify"
Correct common misspelling of "identify" as "indentify" throughout
the kernel

Signed-off-by: Maxime Jayat <maxime@artisandeveloppeur.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:31:06 +02:00
Kees Cook
f32360ef66 x86, kaslr: Report kernel offset on panic
When the system panics, include the kernel offset in the report to assist
in debugging.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-6-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:24 -07:00
Kees Cook
5bfce5ef55 x86, kaslr: Provide randomness functions
Adds potential sources of randomness: RDRAND, RDTSC, or the i8254.

This moves the pre-alternatives inline rdrand function into the header so
both pieces of code can use it. Availability of RDRAND is then controlled
by CONFIG_ARCH_RANDOM, if someone wants to disable it even for kASLR.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-4-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:12 -07:00
Linus Torvalds
71ac3d1938 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A build fix and a reboot quirk"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Add reboot quirk for Dell Latitude E5410
  x86, build, pci: Fix PCI_MSI build on !SMP
2013-10-12 10:36:03 -07:00
Ingo Molnar
ec0ad3d01f Merge branch 'core/urgent' into sched/core
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:39:37 +02:00
K. Y. Srinivasan
90ab9d5510 x86, hyperv: Correctly guard the local APIC calibration code
The code that gets the local APIC timer frequency from the hypervisor
rather depends on there being a local APIC.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-10 15:21:38 -07:00
K. Y. Srinivasan
9e7827b5ea x86, hyperv: Get the local APIC timer frequency from the hypervisor
Hyper-V supports a mechanism for retrieving the local APIC frequency.
Use this and bypass the calibration code in the kernel . This would
allow us to boot the Linux kernel as a "modern VM" on Hyper-V where
many of the legacy devices (such as PIT) are not emulated.

I would like to thank Olaf Hering <olaf@aepfle.de>, Jan Beulich <JBeulich@suse.com> and
H. Peter Anvin <h.peter.anvin@intel.com> for their help in this effort.

In this version of the patch, I have addressed Jan's comments.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1380554932-9888-1-git-send-email-olaf@aepfle.de
Tested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-10 11:44:12 -07:00
Rob Herring
25ff79443c of: implement pci_address_to_pio as weak function
Implement pci_address_to_pio as weak function to remove the dependency on
asm/prom.h. This is in preparation to make prom.h optional.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Grant Likely <grant.likely@linaro.org>
2013-10-09 20:04:06 -05:00
Rob Herring
ba904f0649 x86: add necessary includes for prom.h
Once prom.h is no longer implicitly included, we need to include setup.h
to get COMMAND_LINE_SIZE.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
2013-10-09 20:04:06 -05:00
Rob Herring
29eb45a9ab of: remove early_init_dt_setup_initrd_arch
All arches do essentially the same thing now for
early_init_dt_setup_initrd_arch, so it can now be removed.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
2013-10-09 11:39:01 -05:00
Rob Herring
21c561bda1 x86: use unflatten_and_copy_device_tree
Use the common unflatten_and_copy_device_tree to copy the built-in FDT
out of init section.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
2013-10-09 11:38:05 -05:00
Ingo Molnar
37bf06375c Linux 3.12-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSUc9zAAoJEHm+PkMAQRiG9DMH/AtpuAF6LlMRPjrCeuJQ1pyh
 T0IUO+CsLKO6qtM5IyweP8V6zaasNjIuW1+B6IwVIl8aOrM+M7CwRiKvpey26ldM
 I8G2ron7hqSOSQqSQs20jN2yGAqQGpYIbTmpdGLAjQ350NNNvEKthbP5SZR5PAmE
 UuIx5OGEkaOyZXvCZJXU9AZkCxbihlMSt2zFVxybq2pwnGezRUYgCigE81aeyE0I
 QLwzzMVdkCxtZEpkdJMpLILAz22jN4RoVDbXRa2XC7dA9I2PEEXI9CcLzqCsx2Ii
 8eYS+no2K5N2rrpER7JFUB2B/2X8FaVDE+aJBCkfbtwaYTV9UYLq3a/sKVpo1Cs=
 =xSFJ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc4' into sched/core

Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree
before applying more scheduler patches.

Conflicts:
	arch/avr32/include/asm/Kbuild

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:36:13 +02:00
Andre Richter
1224987384 x86, msr: Use file_inode(), not f_mapping->host
As discussed in [1], exchange f_mapping->host with file_inode().  This
is a bug, but happens to be non-manifest in this case.

[1] http://lkml.kernel.org/r/20131007190357.GA13318@ZenIV.linux.org.uk

Signed-off-by: Andre Richter <andre.o.richter@gmail.com>
Link: http://lkml.kernel.org/r/1381224142-3267-1-git-send-email-andre.o.richter@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-08 12:01:29 -07:00
Linus Torvalds
0e7a3ed04f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixlets:

  On the kernel side:

   - fix a race
   - fix a bug in the handling of the perf ring-buffer data page

  On the tooling side:

   - fix the handling of certain corrupted perf.data files
   - fix a bug in 'perf probe'
   - fix a bug in 'perf record + perf sched'
   - fix a bug in 'make install'
   - fix a bug in libaudit feature-detection on certain distros"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf session: Fix infinite loop on invalid perf.data file
  perf tools: Fix installation of libexec components
  perf probe: Fix to find line information for probe list
  perf tools: Fix libaudit test
  perf stat: Set child_pid after perf_evlist__prepare_workload()
  perf tools: Add default handler for mmap2 events
  perf/x86: Clean up cap_user_time* setting
  perf: Fix perf_pmu_migrate_context
2013-10-08 09:23:12 -07:00
Ville Syrjälä
8412da7577 x86/reboot: Add reboot quirk for Dell Latitude E5410
Dell Latitude E5410 needs reboot=pci to actually reboot.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://lkml.kernel.org/r/1380888964-14517-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-06 11:49:07 +02:00
Andi Kleen
b7af41a1bc perf/x86: Suppress duplicated abort LBR records
Haswell always give an extra LBR record after every TSX abort.
Suppress the extra record.

This only works when the abort is visible in the LBR
If the original abort has already left the 16 LBR entries
the extra entry will will stay.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 10:06:16 +02:00
Andi Kleen
a405bad5ad perf/x86: Add Haswell specific transaction flag reporting
In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 10:06:09 +02:00
Ingo Molnar
fafd883f67 Merge branch 'perf/urgent' into perf/core
Pick up the latest fixes before applying new patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:59:13 +02:00
Peter Zijlstra
d8b11a0cbd perf/x86: Clean up cap_user_time* setting
Currently the cap_user_time_zero capability has different tests than
cap_user_time; even though they expose the exact same data.

Switch from CONSTANT && NONSTOP to sched_clock_stable to also deal
with multi cabinet machines and drop the tsc_disabled() check.. non of
this will work sanely without tsc anyway.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nmgn0j0muo1r4c94vlfh23xy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:58:55 +02:00
David Herrmann
29d274b8d3 x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning
IORESOURCE_BUSY is used to mark temporary driver mem-resources
instead of global regions. This suppresses warnings if regions
overlap with a region marked as BUSY.

This was always the case for VESA/VGA/EFI framebuffer regions so
do the same for simplefb regions. The reason we do this is to
allow device handover to real GPU drivers like
i915/radeon/nouveau which get the same regions via PCI BARs.

Maybe at some point we will be able to unregister platform
devices properly during the handover. In this case the simplefb
region would get removed before the new region is created.
However, this is currently not the case and would require rather
huge changes in remove_conflicting_framebuffers(). Add the BUSY
marker now and try to eventually rewrite the handover for a next release.

Also see kernel/resource.c for more information:

  /*
   * if a resource is "BUSY", it's not a hardware resource
   * but a driver mapping of such a resource; we don't want
   * to warn for those; some drivers legitimately map only
   * partial hardware resources. (example: vesafb)
   */

This suppresses warnings like:

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
  Info: mapping multiple BARs. Your kernel is fine.
  Call Trace:
    dump_stack+0x54/0x8d
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_fmt+0x4c/0x50
    iomem_map_sanity_check+0xac/0xe0
    __ioremap_caller+0x2e3/0x390
    ioremap_wc+0x32/0x40
    i915_driver_load+0x670/0xf50 [i915]
    ...

Reported-by: Tom Gundersen <teg@jklm.no>
Tested-by: Tom Gundersen <teg@jklm.no>
Tested-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03 07:51:11 +02:00
Dave Jones
e56e57f661 x86/reboot: Sort reboot DMI quirks by vendor
Grouping them by vendor should make it easier to spot duplicates.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20131001203655.GA10719@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 08:02:24 +02:00
Ingo Molnar
b8d490c3de Merge branch 'irq/core-v6' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/core
Pull hardirq and softirq nesting updates from Frederic Weisbecker,
which fix nesting related stack overruns such as:

  http://lkml.kernel.org/r/1378330796.4321.50.camel%40pasglop

Beyond being a fix, this series also optimizes and reorganizes arch
hardirq/softirq stack processing to be faster and more robust.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:57:37 +02:00
Ingo Molnar
8a60d42d26 Linux 3.12-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSSKOHAAoJEHm+PkMAQRiGeREH/3EqHmJPBzmVoJwR9/ykDoLg
 u+TJTkuxZG220WhgXS7W/0ECyBX0U7yA0bY9PZbqgcdiLjY0veR18/pOhEq5RzHq
 ub8Q+AJdiORF/sq268q7gnNmy3rSCgnrAyHA/bzBtkbisYODwZPYvWQVUjgNZ2dW
 qtW/TE9rjANcUrk8WdOu9oWcwsq4cyG3cscbfHE/JLFy/8tB5GoD158gxKLZsLXk
 uTCeUHMmvFRT56fZwfyvNstA8ozxXcHBmuu6+Ttceky2zeGzp6dOrd+d2SU1Ps3O
 P91x4e/Af4RFEwDczGP6TpSBEf/J/JaqrM1drjhnQHho0hrNRZVUXhADFVADCXY=
 =dOjB
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc3' into irq/core

Merge Linux v3.12-rc3, to refresh the tree from a v3.11 base to a v3.12 base.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:55:54 +02:00
Tom Gundersen
e33a29a5ae x86/simplefb: Fix overflow causing bogus fall-back
On my MacBook Air lfb_size is 4M, which makes the bitshit
overflow (to 256GB - larger than 32 bits), meaning we fall
back to efifb unnecessarily.

Cast to u64 to avoid the overflow.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Link: http://lkml.kernel.org/r/1380644320-1026-1-git-send-email-teg@jklm.no
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-02 07:50:40 +02:00
Frederic Weisbecker
7d65f4a655 irq: Consolidate do_softirq() arch overriden implementations
All arch overriden implementations of do_softirq() share the following
common code: disable irqs (to avoid races with the pending check),
check if there are softirqs pending, then execute __do_softirq() on
a specific stack.

Consolidate the common parts such that archs only worry about the
stack switch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
2013-10-01 12:53:25 +02:00
Borislav Petkov
a17bce4d1d x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-01 10:52:30 +02:00
Bartlomiej Zolnierkiewicz
dd96dc32d5 x86 / ACPI: fix incorrect placement of __initdata tag
__initdata tag should not be placed between "struct" and "resource"
because it prevents the variable from being placed in the intended
.init.data section. Fix it.

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30 20:30:18 +02:00
Toshi Kani
6dedcca610 hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
cpu_hotplug_driver_lock() serializes CPU online/offline operations
when ARCH_CPU_PROBE_RELEASE is set.  This lock interface is no longer
necessary with the following reason:

 - lock_device_hotplug() now protects CPU online/offline operations,
   including the probe & release interfaces enabled by
   ARCH_CPU_PROBE_RELEASE.  The use of cpu_hotplug_driver_lock() is
   redundant.
 - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
   is defined, which is misleading and is only enabled on powerpc.

This patch removes the cpu_hotplug_driver_lock() interface.  As
a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
probe & release interface as intended.  There is no functional change
in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30 19:55:51 +02:00
Linus Torvalds
669fc2f0c7 Merge branches 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler, timer and x86 fixes from Ingo Molnar:
 - A context tracking ARM build and functional fix
 - A handful of ARM clocksource/clockevent driver fixes
 - An AMD microcode patch level sysfs reporting fixlet

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm: Fix build error with context tracking calls

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
  clocksource: of: Respect device tree node status
  clocksource: exynos_mct: Set IRQ affinity when the CPU goes online
  arm: clocksource: mvebu: Use the main timer as clock source from DT

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix patch level reporting for family 15h
2013-09-28 14:22:17 -07:00
Linus Torvalds
9b565a8051 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A couple of tooling fixlets and a PMU detection printout fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix PMU detection printout when no PMU is detected
  perf symbols: Demangle cloned functions
  perf machine: Fix path unpopulated in machine__create_modules()
  perf tools: Explicitly add libdl dependency
  perf probe: Fix probing symbols with optimization suffix
  perf trace: Add mmap2 handler
  perf kmem: Make it work again on non NUMA machines
2013-09-28 14:21:13 -07:00
Ingo Molnar
8a3da6c7d0 perf/x86: Fix PMU detection printout when no PMU is detected
Ran into this cryptic PMU bootup log recently:

[    0.124047] Performance Events:
[    0.125000] smpboot: ...

Turns out we print this if no PMU is detected. Fall back to
the right condition so that the following is printed:

[    0.122381] Performance Events: no PMU driver, software events only.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 15:48:48 +02:00
Borislav Petkov
646e29a178 x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:10:26 +02:00
Borislav Petkov
2a929242ee lockdep, x86/alternatives: Drop ancient lockdep fixup message
It messes up the output of the nodes/cores bootup table and it
is obsolete anyway, see

  17abecfe65 x86: fix up alternatives with lockdep enabled

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143442.GE4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:09:41 +02:00
Suravee Suthikulpanit
accd1e823e x86/microcode/AMD: Fix patch level reporting for family 15h
On AMD family 14h, applying microcode patch on the a core (core0)
would also affect the other core (core1) in the same compute
unit. The driver would skip applying the patch on core1, but it
still need to update kernel structures to reflect the proper
patch level.

The current logic is not updating the struct
ucode_cpu_info.cpu_sig.rev of the skipped core. This causes the
/sys/devices/system/cpu/cpu1/microcode/version to report
incorrect patch level as shown below:

  $ grep . cpu?/microcode/version
  cpu0/microcode/version:0x600063d
  cpu1/microcode/version:0x6000626
  cpu2/microcode/version:0x600063d
  cpu3/microcode/version:0x6000626
  cpu4/microcode/version:0x600063d

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <bp@alien8.de>
Cc: <jacob.w.shin@gmail.com>
Cc: <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1285806432-1995-1-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-27 09:29:27 +02:00
Masoud Sharbiani
b5eafc6f07 x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
Two entries for the same system type were added, with two different vendor
names: 'Dell' and 'Dell, Inc.'.

Since a prefix match is being used by the DMI parsing code, we can eliminate
the latter as redundant.

Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Cc: holt@sgi.com
Link: http://lkml.kernel.org/r/1380216643-4683-1-git-send-email-masoud.sharbiani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-26 20:52:37 +02:00
Ingo Molnar
474a83b76f Merge branch 'linus'
Merge in the relevant upstream merge point to queue up dependent patch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-26 20:49:53 +02:00
Linus Torvalds
654fdd0412 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "An EFI fix and two reboot-quirk fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
  x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
  x86, efi: Don't map Boot Services on i386
2013-09-25 13:29:18 -07:00
Linus Torvalds
bdc5663fa1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Assorted standalone fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add model number for Avoton Silvermont
  perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
  perf: Update ABI comment
  tools lib lk: Uninclude linux/magic.h in debugfs.c
  perf tools: Fix old GCC build error in trace-event-parse.c:parse_proc_kallsyms()
  perf probe: Fix finder to find lines of given function
  perf session: Check for SIGINT in more loops
  perf tools: Fix compile with libelf without get_phdrnum
  perf tools: Fix buildid cache handling of kallsyms with kcore
  perf annotate: Fix objdump line parsing offset validation
  perf tools: Fill in new definitions for madvise()/mmap() flags
  perf tools: Sharpen the libaudit dependencies test
2013-09-25 13:28:08 -07:00
Li Fei
16c21ae5ca reboot: Allow specifying warm/cold reset for CF9 boot type
In current implementation for reboot type CF9 and CF9_COND,
warm and cold reset are not differentiated, and both are
performed by writing 0x06 to port 0xCF9.

This commit will differentiate warm and cold reset:

For warm reset, write 0x06 to port 0xCF9;
For cold reset, write 0x0E to port 0xCF9.

[ hpa: This meaning of "cold" and "warm" reset is different from other
  reboot types use, where "warm" means "bypass BIOS POST".  It is also
  not entirely clear that it actually solves any actual problem.  However,
  it would seem fairly harmless to offer this additional option.

  Also note that we do not mask bit 3 in the "warm reset" case.  This
  preserves the behavior on existing systems, including ones quirked
  to use CF9.  It seems reasonable that on any system where the
  warm/cold distinction actually matters that bit 3 would be read as
  zero. ]

From: Liu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Li Fei <fei.li@intel.com>
Link: http://lkml.kernel.org/r/1377072837.24556.2.camel@fli24-HP-Compaq-8100-Elite-CMT-PC
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 19:33:22 +02:00
Peter Zijlstra
1a338ac32c sched, x86: Optimize the preempt_schedule() call
Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
[ Fixed build error. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:23:07 +02:00
Peter Zijlstra
c2daa3bed5 sched, x86: Provide a per-cpu preempt_count implementation
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:57 +02:00
Peter Zijlstra
bdb4380658 sched: Extract the basic add/sub preempt_count modifiers
Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:

  __preempt_count_add()
  __preempt_count_sub()

and the new:

  __preempt_count_dec_and_test()

And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.

Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:54 +02:00
Peter Zijlstra
ea81174789 sched, idle: Fix the idle polling state logic
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.

The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).

Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).

Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.

Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: lenb@kernel.org
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/n/tip-nc03imb0etuefmzybzj7sprf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 13:53:10 +02:00
Toshi Kani
1cad5e9a39 hotplug / x86: Disable ARCH_CPU_PROBE_RELEASE on x86
Commit d7c53c9e enabled ARCH_CPU_PROBE_RELEASE on x86 in order to
serialize CPU online/offline operations.  Although it is the config
option to enable CPU hotplug test interfaces, probe & release, it is
also the option to enable cpu_hotplug_driver_lock() as well.  Therefore,
this option had to be enabled on x86 with dummy arch_cpu_probe() and
arch_cpu_release().

Since then, lock_device_hotplug() was introduced to serialize CPU
online/offline & hotplug operations.  Therefore, this config option
is no longer required for the serialization.  This patch disables
this config option on x86 and revert the changes made by commit
d7c53c9e.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:10 +02:00
Toshi Kani
574b851e99 hotplug / x86: Add hotplug lock to missing places
lock_device_hotplug[_sysfs]() serializes CPU & Memory online/offline
and hotplug operations.  However, this lock is not held in the debug
interfaces below that initiate CPU online/offline operations.

 - _debug_hotplug_cpu(), cpu0 hotplug test interface enabled by
   CONFIG_DEBUG_HOTPLUG_CPU0.
 - cpu_probe_store() and cpu_release_store(), cpu hotplug test interface
   enabled by CONFIG_ARCH_CPU_PROBE_RELEASE.

This patch changes the above interfaces to hold lock_device_hotplug().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Toshi Kani
f6913f9902 hotplug / x86: Fix online state in cpu0 debug interface
_debug_hotplug_cpu() is a debug interface that puts cpu0 offline during
boot-up when CONFIG_DEBUG_HOTPLUG_CPU0 is set.  After cpu0 is put offline
in this interface, however, /sys/devices/system/cpu/cpu0/online still
shows 1 (online).

This patch fixes _debug_hotplug_cpu() to update dev->offline when CPU
online/offline operation succeeded.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Dave Jones
7a20c2fad6 x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 08:41:10 +02:00
Mike Travis
0d12ef0c90 x86/UV: Update UV support for external NMI signals
The current UV NMI handler has not been updated for the changes
in the system NMI handler and the perf operations.  The UV NMI
handler reads an MMR in the UV Hub to check to see if the NMI
event was caused by the external 'system NMI' that the operator
can initiate on the System Mgmt Controller.

The problem arises when the perf tools are running, causing
millions of perf events per second on very large CPU count
systems.  Previously this was okay because the perf NMI handler
ran at a higher priority on the NMI call chain and if the NMI
was a perf event, it would stop calling other NMI handlers
remaining on the NMI call chain.

Now the system NMI handler calls all the handlers on the NMI
call chain including the UV NMI handler.  This causes the UV NMI
handler to read the MMRs at the same millions per second rate.
This can lead to significant performance loss and possible
system failures.  It also can cause thousands of 'Dazed and
Confused' messages being sent to the system console.  This
effectively makes perf tools unusable on UV systems.

To avoid this excessive overhead when perf tools are running,
this code has been optimized to minimize reading of the MMRs as
much as possible, by moving to the NMI_UNKNOWN notifier chain.
This chain is called only when all the users on the standard
NMI_LOCAL call chain have been called and none of them have
claimed this NMI.

There is an exception where the NMI_LOCAL notifier chain is
used.  When the perf tools are in use, it's possible that the UV
NMI was captured by some other NMI handler and then either
ignored or mistakenly processed as a perf event.  We set a
per_cpu ('ping') flag for those CPUs that ignored the initial
NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
handler on each cpu does not need to read the MMR, but instead
checks the in memory flag indicating it was pinged.  There are
two module variables, 'ping_count' indicating how many requested
NMI events occurred, and 'ping_misses' indicating how many stray
NMI events.  These most likely are perf events so it shows the
overhead of the perf NMI interrupts and how many MMR reads were avoided.

This patch also minimizes the reads of the MMRs by having the
first cpu entering the NMI handler on each node set a per HUB
in-memory atomic value.  (Having a per HUB value avoids sending
lock traffic over NumaLink.)  Both types of UV NMIs from the SMI
layer are supported.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Mike Travis
1e019421bc x86/UV: Move NMI support
This patch moves the UV NMI support from the x2apic file to a
new separate uv_nmi.c file in preparation for the next sequence
of patches. It prevents upcoming bloat of the x2apic file, and
has the added benefit of putting the upcoming /sys/module
parameters under the name 'uv_nmi' instead of 'x2apic_uv_x',
which was obscure.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Jiang Liu
7e1f85f96d x86 / ACPI: simplify _acpi_map_lsapic()
In acpi_register_lapic(), it will generates a new logical cpu
number and maps to the local APIC id, this logical cpu number
can be returned to simplify _acpi_map_lsapic() implementation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:40 +02:00
Jiang Liu
d536bf3dc9 ACPI / processor: use apic_id and remove duplicated _MAT evaluation
Since APIC id is saved in processor struct, just use it and
remove the duplicated _MAT evaluation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:39 +02:00
Masoud Sharbiani
4f0acd31c3 x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.

Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: Vinson Lee <vlee@twitter.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:26:08 +02:00
Yan, Zheng
cf3b425dd8 perf/x86/intel: Add model number for Avoton Silvermont
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:22:00 +02:00
Peter Zijlstra
92519bbc8a perf/x86/intel: Fix build warning in intel_pmu_drain_pebs_nhm()
Fengguang Wu reported this build warning:

    arch/x86/kernel/cpu/perf_event_intel_ds.c: In function 'intel_pmu_drain_pebs_nhm':
    arch/x86/kernel/cpu/perf_event_intel_ds.c:964:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int'

Because pointer arithmetics result type is bitness dependent there's no natural
type to use here, cast it to long.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-jbpauwxJqtf24luewcsdFith@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 13:15:27 +02:00
Peter Zijlstra
fa73158710 perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:45:11 +02:00
Yan, Zheng
73c4427c6c perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
uncore_validate_group() can't call smp_processor_id() because it is
in preemptible context. Pass NUMA_NO_NODE to the allocator instead.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:54:36 +02:00
Peter Zijlstra
eb8417aa70 perf/x86/intel: Remove division from the intel_pmu_drain_pebs_nhm() hot path
Only do the division in case we have to print the result out in a warning.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-43nl31erfbajwpfj254f6zji@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:53:44 +02:00
Linus Torvalds
f42bcf1aa8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/lpss: Add pin control support to Intel low power subsystem
  perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
  x86: Remove now-unused save_rest()
  x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
2013-09-18 11:26:17 -05:00
Linus Torvalds
186844b292 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix UAPI export of PERF_EVENT_IOC_ID
  perf/x86/intel: Fix Silvermont offcore masks
2013-09-18 11:22:53 -05:00
Stephane Eranian
9d8e3f9693 perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED
supports PEBS. It was missing for the SNB PEBS event constraint
table thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event like a few others
are subject to the erratum BT241 for Xeon E5 (SNB-EP). As such,
the event may undercount when used with PEBS unless the
workaround is implemented. But without this patch and just the
workaround, the kernel would not allow precise sampling on this
event. BT241 is documented in:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-14 08:00:18 +02:00
Linus Torvalds
75acebf242 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixes.

  The -g perf report lockup you reported is only partially addressed,
  patches that fix the excessive runtime are still being worked on"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix uncore PCI fixed counter handling
  uprobes: Fix utask->depth accounting in handle_trampoline()
  perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
  perf: Fix up MMAP2 buffer space reservation
  perf tools: Add attr->mmap2 support
  perf kvm: Fix sample_type manipulation
  perf evlist: Fix id pos in perf_evlist__open()
  perf trace: Handle perf.data files with no tracepoints
  perf session: Separate progress bar update when processing events
  perf trace: Check if MAP_32BIT is defined
  perf hists: Fix formatting of long symbol names
  perf evlist: Fix parsing with no sample_id_all bit set
  perf tools: Add test for parsing with no sample_id_all bit
  perf trace: Check control+C more often
2013-09-12 10:44:54 -07:00
Ingo Molnar
7f2ee91f54 perf/x86/intel: Clean up EVENT_ATTR_STR() muck
Make the code a bit more readable by removing stray whitespaces et al.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-lzEnychz1ylqy8zjenxOmeht@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:17:50 +02:00
Peter Zijlstra
d2beea4a34 perf/x86/intel: Clean-up/reduce PEBS code
Get rid of some pointless duplication introduced by the Haswell code.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8q6y4davda9aawwv5yxe7klp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Peter Zijlstra
2b9e344df3 perf/x86/intel: Clean up checkpoint-interrupt bits
Clean up the weird CP interrupt exception code by keeping a CP mask.

Andi suggested this implementation but weirdly didn't actually
implement it himself, do so now because it removes the conditional in
the interrupt handler and avoids the assumption its only on cnt2.

Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dvb4q0rydkfp00kqat4p5bah@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Andi Kleen
4b2c4f1f1b perf/x86/intel: Add Haswell TSX event aliases
Add TSX event aliases, and export them from the kernel to perf.

These are used by perf stat -T and to allow
more user friendly access to events. The events are designed to
be fairly generic and may also apply to other architectures
implementing HTM.  They all cover common situations that
happens during tuning of transactional code.

For Haswell we have to separate the HLE and RTM events,
as they are separate in the PMU.

This adds the following events:

 tx-start	Count start transaction (used by perf stat -T)
 tx-commit	Count commit of transaction
 tx-abort	Count all aborts
 tx-conflict	Count aborts due to conflict with another CPU.
 tx-capacity	Count capacity aborts (transaction too large)

Then matching el-* events for HLE

 cycles-t	Transactional cycles (used by perf stat -T)
  * also exists on POWER8

 cycles-ct	Transactional cycles commited (used by perf stat -T)
  * according to Michael Ellerman POWER8 has a cycles-transactional-committed,
  * perf stat -T handles both cases

Note for useful abort profiling often precise has to be set,
as Haswell can only report the point inside the transaction
with precise=2.

For some classes of aborts, like conflicts, this is not needed,
as it makes more sense to look at the complete critical section.

This gives a clean set of generalized events to examine transaction
success and aborts. Haswell has additional events for TSX, but those
are more specialized for very specific situations.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:36 +02:00
Andi Kleen
748e86aa90 perf/x86: Report TSX transaction abort cost as weight
Use the existing weight reporting facility to report the transaction
abort cost, that is the number of cycles wasted in aborts.
Haswell reports this in the PEBS record.

This was in fact the original user for weight.

This is a very useful sort key to concentrate on the most
costly aborts and a good metric for TSX tuning.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:34 +02:00
Andi Kleen
2dbf0116aa perf/x86/intel: Avoid checkpointed counters causing excessive TSX aborts
With checkpointed counters there can be a situation where the counter
is overflowing, aborts the transaction, is set back to a non overflowing
checkpoint, causes interupt. The interrupt doesn't see the overflow
because it has been checkpointed.  This is then a spurious PMI, typically with
a ugly NMI message.  It can also lead to excessive aborts.

Avoid this problem by:

- Using the full counter width for counting counters (earlier patch)

- Forbid sampling for checkpointed counters. It's not too useful anyways,
  checkpointing is mainly for counting. The check is approximate
  (to still handle KVM), but should catch the majority of cases.

- On a PMI always set back checkpointed counters to zero.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:33 +02:00
Peter Zijlstra
06c939c1f4 perf/x86/intel: Fix Silvermont offcore masks
Fengguang Wu reported:

> sparse warnings: (new ones prefixed by >>)
>
> >> arch/x86/kernel/cpu/perf_event_intel.c:901:9: sparse: constant 0x768005ffff is so big it is long
> >> arch/x86/kernel/cpu/perf_event_intel.c:902:9: sparse: constant 0x768005ffff is so big it is long
>
> vim +901 arch/x86/kernel/cpu/perf_event_intel.c
>
>    895	 },
>    896	};
>    897
>    898	static struct extra_reg intel_slm_extra_regs[] __read_mostly =
>    899	{
>    900		/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
>  > 901		INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffff, RSP_0),
>  > 902		INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffff, RSP_1),
>    903		EVENT_EXTRA_END
>    904	};
>    905

Extend those constants to 64 bits.

Reported-by: fengguang.wu@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130909112636.GQ31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:12:27 +02:00
Stephane Eranian
dbc33f7016 perf/x86: Fix uncore PCI fixed counter handling
There was a bug in the handling of SNB-EP/IVB-EP uncore PCI
fixed counters, e.g., IMC.

It would cause erratic values to be returned for the IMC
clockticks event. This was due to a bogus hwc->config value
which was then written to PCI config space.

The erratic values can be seen via:

  $ perf stat -a -C 0 -e uncore_imc_0/clockticks/ -I 1000 sleep 10

The fixed counter has most fields marked as reserved with
hw reset values of 0. Yet the kernel was defaulting to a
hwc->config = ~0 and that was causing the issues.

This patch sets the hwc->config values for fixed uncore event
to 0. Now, the values of IMC clockticks is correct.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130909195350.GA17643@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 08:42:37 +02:00
Stephane Eranian
6113af14c8 perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only
be measured on counters 0-3 when HT is off. When HT is on, you
only have counters 0-3.

If you program it on the eight counters for 1s on a 3GHz
IVB laptop running a noploop, you see:

           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING

Clearly the last 4 values are bogus.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: dhsharp@google.com
Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 07:58:26 +02:00
Dave Hansen
6df46865ff mm: vmstats: track TLB flush stats on UP too
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.

UP systems do not do remote TLB flushes, so compile those counters out on
UP.

arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly.  This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4.  It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:09 -07:00
Linus Torvalds
fa1586a7e4 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Daniel had some fixes queued up, that were delayed, the stolen memory
  ones and vga arbiter ones are quite useful, along with his usual bunch
  of stuff, nothing for HSW outputs yet.

  The one nouveau fix is for a regression I caused with the poweroff stuff"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (30 commits)
  drm/nouveau: fix oops on runtime suspend/resume
  drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done
  drm/i915: try not to lose backlight CBLV precision
  drm/i915: Confine page flips to BCS on Valleyview
  drm/i915: Skip stolen region initialisation if none is reserved
  drm/i915: fix gpu hang vs. flip stall deadlocks
  drm/i915: Hold an object reference whilst we shrink it
  drm/i915: fix i9xx_crtc_clock_get for multiplied pixels
  drm/i915: handle sdvo input pixel multiplier correctly again
  drm/i915: fix hpd work vs. flush_work in the pageflip code deadlock
  drm/i915: fix up the relocate_entry refactoring
  drm/i915: Fix pipe config warnings when dealing with LVDS fixed mode
  drm/i915: Don't call sg_free_table() if sg_alloc_table() fails
  i915: Update VGA arbiter support for newer devices
  vgaarb: Fix VGA decodes changes
  vgaarb: Don't disable resources that are not owned
  drm/i915: Pin pages whilst mapping the dma-buf
  drm/i915: enable trickle feed on Haswell
  x86: add early quirk for reserving Intel graphics stolen memory v5
  drm/i915: split PCI IDs out into i915_drm.h v4
  ...
2013-09-10 20:05:57 -07:00
Linus Torvalds
442e0973e9 Merge branch 'x86/jumplabel' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 jumplabel changes from Peter Anvin:
 "One more x86 tree for this merge window.  This tree improves the
  handling of jump labels, so that most of the time we don't have to do
  a massive initial patching run.

  Furthermore, we will error out of the jump label is not what is
  expected, eg if it has been corrupted or tampered with"

* 'x86/jumplabel' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/jump-label: Show where and what was wrong on errors
  x86/jump-label: Add safety checks to jump label conversions
  x86/jump-label: Do not bother updating nops if they are correct
  x86/jump-label: Use best default nops for inital jump label calls
2013-09-10 19:43:23 -07:00
Linus Torvalds
31f7c3a688 Device tree core updates for v3.12
Generally minor changes. A bunch of bug fixes, particularly for
 initialization and some refactoring. Most notable change if feeding the
 entire flattened tree into the random pool at boot. May not be
 significant, but shouldn't hurt either.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSL12LAAoJEEFnBt12D9kB64gP/RBipnYbo3RPanHg+lE/J1V7
 KSVFNGKWJHxTg47VVC1YJGIG21jqxAilpdS2MQL5FP7iyd+IzvtHpQiJgp+2G+pq
 di06yrdyrYErxRgZgGQi8IpR538ZzOEVLCKJGdb09YelkRzPT5au7CC1MAsX3qco
 yba7PHk0/Nc4hZE4aGbgR1DlRmn86ob7mM0KFE/LORaSN2BueMgWcwKhQXYNGyoh
 assX4yNhAbUG6Bgw7paBLDGqHh8c5Ei5AppU8yPb+N094jgYHBJryUoDlzzUHD23
 qqiEqHhUKT0TpgHNs8KH0WZFugcmjKvYEbzdzadBxqfXnJN4fKSEcdfF3iz4T14j
 U6EZks89GoHwA523OghUZkKNOqlsUdWfdKz+8/grQqKisYwDcf3fCxEYk/4weDCQ
 b6fFlOv6+AI3btjXp6F511ZKxyT4ZZzkHjp/ZSrhBygyamNZfax0ma0j+ZS9AZql
 kPxQS0nOve6NKaP7vXxMmW5sGMnL19ER/Hm31wthGcWI43GVebUdklnzfGaEeSjs
 pmP8oiCNemceqVpiPKxcOxiguf/eyIjP1SFXbguASygUmQeTDbbJ8n1FYznCitue
 xJgWttKWsEf/aMR3eJtQ3aBmHR3rijAV4E28Wlq8XMkocwvpQm2zMocS2Z5BJ80S
 hi1kQVy8+RxNX96tOSp1
 =GSWl
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull device tree core updates from Grant Likely:
 "Generally minor changes.  A bunch of bug fixes, particularly for
  initialization and some refactoring.  Most notable change if feeding
  the entire flattened tree into the random pool at boot.  May not be
  significant, but shouldn't hurt either"

Tim Bird questions whether the boot time cost of the random feeding may
be noticeable.  And "add_device_randomness()" is definitely not some
speed deamon of a function.

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  of/platform: add error reporting to of_amba_device_create()
  irq/of: Fix comment typo for irq_of_parse_and_map
  of: Feed entire flattened device tree into the random pool
  of/fdt: Clean up casting in unflattening path
  of/fdt: Remove duplicate memory clearing on FDT unflattening
  gpio: implement gpio-ranges binding document fix
  of: call __of_parse_phandle_with_args from of_parse_phandle
  of: introduce of_parse_phandle_with_fixed_args
  of: move of_parse_phandle()
  of: move documentation of of_parse_phandle_with_args
  of: Fix missing memory initialization on FDT unflattening
  of: consolidate definition of early_init_dt_alloc_memory_arch()
  of: Make of_get_phy_mode() return int i.s.o. const int
  include: dt-binding: input: create a DT header defining key codes.
  of/platform: Staticize of_platform_device_create_pdata()
  of: Specify initrd location using 64-bit
  dt: Typo fix
  OF: make of_property_for_each_{u32|string}() use parameters if OF is not enabled
2013-09-10 13:53:52 -07:00
Borislav Petkov
c0da0fa1d7 x86: Remove now-unused save_rest()
b3af11afe0 ("x86: get rid of pt_regs argument of iopl(2)")
dropped PTREGSCALL which was also the last user of save_rest.
Drop that now-unused function too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1378546750-19727-1-git-send-email-bp@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-10 09:31:55 +02:00
Linus Torvalds
7eb69529cb Not much changes for the 3.12 merge window. The major tracing changes
are still in flux, and will have to wait for 3.13.
 
 The changes for 3.12 are mostly clean ups and minor fixes.
 
 H. Peter Anvin added a check to x86_32 static function tracing that
 helps a small segment of the kernel community.
 
 Oleg Nesterov had a few changes from 3.11, but were mostly clean ups
 and not worth pushing in the -rc time frame.
 
 Li Zefan had small clean up with annotating a raw_init with __init.
 
 I fixed a slight race in updating function callbacks, but the race
 is so small and the bug that happens when it occurs is so minor it's
 not even worth pushing to stable.
 
 The only real enhancement is from Alexander Z Lam that made the
 tracing_cpumask work for trace buffer instances, instead of them all
 sharing a global cpumask.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSLJm1AAoJEOdOSU1xswtMSu0H/0/Uuh0D5VhANZRcTATY4gUO
 n3WH6sm3atOxH+cbeYQcFXxOcvRcR2n90tvCMpiFlPiC0NiNR1yjro3VLS4zWb77
 twq7gABdJf+Tdq7sOBmSzmY5vRKQVHIXvAfC27mBez38nCWZz0BjJGEsPBwoly25
 ZaiCbKlusw/QKIEy40tuKUL/rXF6yEWnQrMujhBbyNm0w7sJVdfnd+HHmCvy15H2
 IQE1g83d/dAMBjFY2BYg77J+oV6qmJxql2itvDivQWXHqFb52Jw3ZTwHwWLZlPYU
 AZcHtYGs2lSUscQLF56LejB7zZyE8taUufExFEVexXxZS5u7nNPXsPrA2LOOK70=
 =JWO6
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Not much changes for the 3.12 merge window.  The major tracing changes
  are still in flux, and will have to wait for 3.13.

  The changes for 3.12 are mostly clean ups and minor fixes.

  H Peter Anvin added a check to x86_32 static function tracing that
  helps a small segment of the kernel community.

  Oleg Nesterov had a few changes from 3.11, but were mostly clean ups
  and not worth pushing in the -rc time frame.

  Li Zefan had small clean up with annotating a raw_init with __init.

  I fixed a slight race in updating function callbacks, but the race is
  so small and the bug that happens when it occurs is so minor it's not
  even worth pushing to stable.

  The only real enhancement is from Alexander Z Lam that made the
  tracing_cpumask work for trace buffer instances, instead of them all
  sharing a global cpumask"

* tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/rcu: Do not trace debug_lockdep_rcu_enabled()
  x86-32, ftrace: Fix static ftrace when early microcode is enabled
  ftrace: Fix a slight race in modifying what function callback gets traced
  tracing: Make tracing_cpumask available for all instances
  tracing: Kill the !CONFIG_MODULES code in trace_events.c
  tracing: Don't pass file_operations array to event_create_dir()
  tracing: Kill trace_create_file_ops() and friends
  tracing/syscalls: Annotate raw_init function with __init
2013-09-09 14:42:15 -07:00
Linus Torvalds
b4b50fd78b ARM: SoC platform changes for 3.12
This branch contains mostly additions and changes to platform enablement
 and SoC-level drivers. Since there's sometimes a dependency on device-tree
 changes, there's also a fair amount of those in this branch.
 
 Pieces worth mentioning are:
 
 - Mbus driver for Marvell platforms, allowing kernel configuration
   and resource allocation of on-chip peripherals.
 - Enablement of the mbus infrastructure from Marvell PCI-e drivers.
 - Preparation of MSI support for Marvell platforms.
 - Addition of new PCI-e host controller driver for Tegra platforms
 - Some churn caused by sharing of macro names between i.MX 6Q and 6DL
   platforms in the device tree sources and header files.
 - Various suspend/PM updates for Tegra, including LP1 support.
 - Versatile Express support for MCPM, part of big little support.
 - Allwinner platform support for A20 and A31 SoCs (dual and quad Cortex-A7)
 - OMAP2+ support for DRA7, a new Cortex-A15-based SoC.
 
 The code that touches other architectures are patches moving
 MSI arch-specific functions over to weak symbols and removal of
 ARCH_SUPPORTS_MSI, acked by PCI maintainers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSKhYmAAoJEIwa5zzehBx322AP/1ONYs8o8f7/Gzq6lZvTN6T3
 0pBTApg6Jfioi3lwKvUAEIcsW82YKQ+UZkbW66GQH6+Ri4aZJKZHuz0+JPU67OJ4
 LtSLuzVWrymy2VOOUvAnS/SXkOZw/pHhU4cLNHn1dMndhUL1Uqp9/XwuiHEQyFsP
 uOkpcBtIu0EWElov0PKKZ5SWBg8JJs2vy5ydiViGelWHCrZvDDZkWzIsDcBQxJLQ
 juzT4+JE+KOu7vKmfw78o6iHoCS2TBRAN9YUCajRb8Wl+out1hrTahHnDWaZ5Mce
 EskcQNkJROqFbjD4k3ABN4XGTv2VDmrztIwFe0SEQ7Dz/9ypCrBGT69uI9xIqTXr
 GwVRIwAUFTpMupK0gy93z1ajV3N0CXV79out9+jQNUQybYE+czp8QOyhmuc1tZx0
 8fn9jlBQe9Vy6yrs39gEcE7nUwrayeyQ+6UvqqwsE2pWZabNAnCMSPX5+QIu+T/3
 tQ7+jYmfFeserp1sIDOHOnxfhtW9EI6U9d1h/DUCwrsuFdkL9ha4M/vh9Pwgye98
 tBdz0T4yE39AJQwwFWRkv1jcQKcGu6WqJanmvS4KRBksGwuLWxy+ewOnkz2ifS25
 ZYSyxAryZRBvQRqlOK11rXPfRcbGcY0MG9lkKX96rGcyWEizgE1DdjxXD8HoIleN
 R8heV6GX5OzlFLGX2tKK
 =fJ5x
 -----END PGP SIGNATURE-----

Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC platform changes from Olof Johansson:
 "This branch contains mostly additions and changes to platform
  enablement and SoC-level drivers.  Since there's sometimes a
  dependency on device-tree changes, there's also a fair amount of
  those in this branch.

  Pieces worth mentioning are:

   - Mbus driver for Marvell platforms, allowing kernel configuration
     and resource allocation of on-chip peripherals.
   - Enablement of the mbus infrastructure from Marvell PCI-e drivers.
   - Preparation of MSI support for Marvell platforms.
   - Addition of new PCI-e host controller driver for Tegra platforms
   - Some churn caused by sharing of macro names between i.MX 6Q and 6DL
     platforms in the device tree sources and header files.
   - Various suspend/PM updates for Tegra, including LP1 support.
   - Versatile Express support for MCPM, part of big little support.
   - Allwinner platform support for A20 and A31 SoCs (dual and quad
     Cortex-A7)
   - OMAP2+ support for DRA7, a new Cortex-A15-based SoC.

  The code that touches other architectures are patches moving MSI
  arch-specific functions over to weak symbols and removal of
  ARCH_SUPPORTS_MSI, acked by PCI maintainers"

* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (266 commits)
  tegra-cpuidle: provide stub when !CONFIG_CPU_IDLE
  PCI: tegra: replace devm_request_and_ioremap by devm_ioremap_resource
  ARM: tegra: Drop ARCH_SUPPORTS_MSI and sort list
  ARM: dts: vf610-twr: enable i2c0 device
  ARM: dts: i.MX51: Add one more I2C2 pinmux entry
  ARM: dts: i.MX51: Move pins configuration under "iomuxc" label
  ARM: dtsi: imx6qdl-sabresd: Add USB OTG vbus pin to pinctrl_hog
  ARM: dtsi: imx6qdl-sabresd: Add USB host 1 VBUS regulator
  ARM: dts: imx27-phytec-phycore-som: Enable AUDMUX
  ARM: dts: i.MX27: Disable AUDMUX in the template
  ARM: dts: wandboard: Add support for SDIO bcm4329
  ARM: i.MX5 clocks: Remove optional clock setup (CKIH1) from i.MX51 template
  ARM: dts: imx53-qsb: Make USBH1 functional
  ARM i.MX6Q: dts: Enable I2C1 with EEPROM and PMIC on Phytec phyFLEX-i.MX6 Ouad module
  ARM i.MX6Q: dts: Enable SPI NOR flash on Phytec phyFLEX-i.MX6 Ouad module
  ARM: dts: imx6qdl-sabresd: Add touchscreen support
  ARM: imx: add ocram clock for imx53
  ARM: dts: imx: ocram size is different between imx6q and imx6dl
  ARM: dts: imx27-phytec-phycore-som: Fix regulator settings
  ARM: dts: i.MX27: Remove clock name from CPU node
  ...
2013-09-06 13:30:06 -07:00
Linus Torvalds
050ba07cdc Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fix for the annoying paravirt.o build warning under allmodconfig, and
  a MAINTAINERS file update"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, doc: Add an entry in MAINTAINERS for arch/x86/kernel/cpu/vmware.c
  x86, paravirt: Remove duplicate definition for DEF_NATIVE
2013-09-05 12:36:12 -07:00
H. Peter Anvin
af058ab04d x86-32, ftrace: Fix static ftrace when early microcode is enabled
Early microcode loading runs C code before paging is enabled on 32
bits.  Since ftrace puts a hook into every function, that hook needs
to be safe to execute in the pre-paging environment.  This is
currently true for dynamic ftrace but not for static ftrace.

Static ftrace is obsolescent and assumed to not be
performance-critical, so we can simply test that the stack pointer
falls within the valid range of kernel addresses.

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-05 09:31:32 -04:00
Libin
52239484bf x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
When booting secondary CPUs, announce_cpu() is called to show which cpu has
been brought up. For example:

[    0.402751] smpboot: Booting Node   0, Processors  #1 #2 #3 #4 #5 OK
[    0.525667] smpboot: Booting Node   1, Processors  #6 #7 #8 #9 #10 #11 OK
[    0.755592] smpboot: Booting Node   0, Processors  #12 #13 #14 #15 #16 #17 OK
[    0.890495] smpboot: Booting Node   1, Processors  #18 #19 #20 #21 #22 #23

But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum
possible cpu id. It should use the maximum present cpu id in case not all
CPUs booted up.

Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <wangyijing@huawei.com>
Cc: <fenghua.yu@intel.com>
Cc: <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1378378676-18276-1-git-send-email-huawei.libin@huawei.com
[ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-05 15:05:37 +02:00
Linus Torvalds
ae7a835cc5 Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Gleb Natapov:
 "The highlights of the release are nested EPT and pv-ticketlocks
  support (hypervisor part, guest part, which is most of the code, goes
  through tip tree).  Apart of that there are many fixes for all arches"

Fix up semantic conflicts as discussed in the pull request thread..

* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits)
  ARM: KVM: Add newlines to panic strings
  ARM: KVM: Work around older compiler bug
  ARM: KVM: Simplify tracepoint text
  ARM: KVM: Fix kvm_set_pte assignment
  ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256
  ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs
  ARM: KVM: vgic: fix GICD_ICFGRn access
  ARM: KVM: vgic: simplify vgic_get_target_reg
  KVM: MMU: remove unused parameter
  KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()
  KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
  KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
  KVM: x86: update masterclock when kvmclock_offset is calculated (v2)
  KVM: PPC: Book3S: Fix compile error in XICS emulation
  KVM: PPC: Book3S PR: return appropriate error when allocation fails
  arch: powerpc: kvm: add signed type cast for comparation
  KVM: x86: add comments where MMIO does not return to the emulator
  KVM: vmx: count exits to userspace during invalid guest emulation
  KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp
  kvm: optimize away THP checks in kvm_is_mmio_pfn()
  ...
2013-09-04 18:15:06 -07:00
Linus Torvalds
816434ec4a Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spinlock changes from Ingo Molnar:
 "The biggest change here are paravirtualized ticket spinlocks (PV
  spinlocks), which bring a nice speedup on various benchmarks.

  The KVM host side will come to you via the KVM tree"

* 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static"
  kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  kvm guest: Add configuration support to enable debug information for KVM Guests
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
  xen, pvticketlock: Allow interrupts to be enabled while blocking
  x86, ticketlock: Add slowpath logic
  jump_label: Split jumplabel ratelimit
  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2
  x86, pvticketlock: Use callee-save for lock_spinning
  xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
  xen, pvticketlock: Xen implementation for PV ticket locks
  xen: Defer spinlock setup until boot CPU setup
  x86, ticketlock: Collapse a layer of functions
  x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
  x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-04 11:55:10 -07:00
Linus Torvalds
f357a82048 Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SMAP fixes from Ingo Molnar:
 "Fixes for Intel SMAP support, to fix SIGSEGVs during bootup"

* 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP
  x86, smap: Handle csum_partial_copy_*_user()
2013-09-04 11:08:32 -07:00
Linus Torvalds
b20c99eb66 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "[ The reason for drivers/ updates is that Boris asked for the
    drivers/edac/ changes to go via x86/ras in this cycle ]

  Main changes:

   - AMD CPUs:
      . Add ECC event decoding support for new F15h models
      . Various erratum fixes
      . Fix single-channel on dual-channel-controllers bug.

   - Intel CPUs:
      . UC uncorrectable memory error parsing fix
      . Add support for CMC (Corrected Machine Check) 'FF' (Firmware
        First) flag in the APEI HEST

   - Various cleanups and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  amd64_edac: Fix incorrect wraparounds
  amd64_edac: Correct erratum 505 range
  cpc925_edac: Use proper array termination
  x86/mce, acpi/apei: Only disable banks listed in HEST if mce is configured
  amd64_edac: Get rid of boot_cpu_data accesses
  amd64_edac: Add ECC decoding support for newer F15h models
  x86, amd_nb: Clarify F15h, model 30h GART and L3 support
  pci_ids: Add PCI device ID functions 3 and 4 for newer F15h models.
  x38_edac: Make a local function static
  i3200_edac: Make a local function static
  x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors
  APEI/ERST: Fix error message formatting
  amd64_edac: Fix single-channel setups
  EDAC: Replace strict_strtol() with kstrtol()
  mce: acpi/apei: Soft-offline a page on firmware GHES notification
  mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
  mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
2013-09-04 11:07:04 -07:00
Linus Torvalds
bb8c470170 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform documentation fix from Ingo Molnar.

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi: Correct out-of-date comment of __acpi_map_table()
2013-09-04 11:06:19 -07:00
Linus Torvalds
05eebfb26b Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt changes from Ingo Molnar:
 "Hypervisor signature detection cleanup and fixes - the goal is to make
  KVM guests run better on MS/Hyperv and to generalize and factor out
  the code a bit"

* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Correctly detect hypervisor
  x86, kvm: Switch to use hypervisor_cpuid_base()
  xen: Switch to use hypervisor_cpuid_base()
  x86: Introduce hypervisor_cpuid_base()
2013-09-04 11:05:13 -07:00
H. Peter Anvin
f2a7b303d6 x86, paravirt: Remove duplicate definition for DEF_NATIVE
DEF_NATIVE() is defined in paravirt_types.h, remove duplicate
definition in paravirt.c

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andi Kleen <ak@linux.kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/CA%2B55aFxVv==DC0JdS87V%2BcPr-twN%2BTujYg5XmgHOjJOAkZ4xwQ@mail.gmail.com
2013-09-04 09:46:43 -07:00
Linus Torvalds
cb3e4330e6 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc smaller fixes:

   - a parse_setup_data() boot crash fix

   - a memblock and an __early_ioremap cleanup

   - turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable
     option and turn it off - it's an unrobust debug facility, it
     shouldn't be enabled by default"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: avoid remapping data in parse_setup_data()
  x86: Use memblock_set_current_limit() to set limit for memblock.
  mm: Remove unused variable idx0 in __early_ioremap()
  mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
2013-09-04 09:39:26 -07:00
Linus Torvalds
228abe73ad Merge branch 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fb changes from Ingo Molnar:
 "This tree includes preparatory patches for SimpleDRM driver support,
  by David Herrmann.  They clean up x86 framebuffer support by creating
  simplefb devices wherever possible.  More background can be found at

     http://lwn.net/Articles/558104/"

* 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fbdev: fbcon: select VT_HW_CONSOLE_BINDING
  fbdev: efifb: bind to efi-framebuffer
  fbdev: vesafb: bind to platform-framebuffer device
  fbdev: simplefb: add common x86 RGB formats
  x86: sysfb: move EFI quirks from efifb to sysfb
  x86: provide platform-devices for boot-framebuffers
  fbdev: simplefb: mark as fw and allocate apertures
  fbdev: simplefb: add init through platform_data
2013-09-04 09:12:17 -07:00
Linus Torvalds
1f9c52e16b Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature fixes from Ingo Molnar:
 "Two small cpufeature support updates"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix override new_cpu_data.x86 with 486
  x86, cpufeature: Use new CC_HAVE_ASM_GOTO
2013-09-04 09:11:16 -07:00
Linus Torvalds
2a475501b8 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asmlinkage changes from Ingo Molnar:
 "As a preparation for Andi Kleen's LTO patchset (link time
  optimizations using GCC's -flto which build time optimization has
  steadily increased in quality over the past few years and might
  eventually be usable for the kernel too) this tree includes a handful
  of preparatory patches that make function calling convention
  annotations consistent again:

   - Mark every function without arguments (or 64bit only) that is used
     by assembly code with asmlinkage()

   - Mark every function with parameters or variables that is used by
     assembly code as __visible.

  For the vanilla kernel this has documentation, consistency and
  debuggability advantages, for the time being"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asmlinkage: Fix warning in xen asmlinkage change
  x86, asmlinkage, vdso: Mark vdso variables __visible
  x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
  x86, asmlinkage: Make dump_stack visible
  x86, asmlinkage: Make 64bit checksum functions visible
  x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
  x86, asmlinkage, apm: Make APM data structure used from assembler visible
  x86, asmlinkage: Make syscall tables visible
  x86, asmlinkage: Make several variables used from assembler/linker script visible
  x86, asmlinkage: Make kprobes code visible and fix assembler code
  x86, asmlinkage: Make various syscalls asmlinkage
  x86, asmlinkage: Make 32bit/64bit __switch_to visible
  x86, asmlinkage: Make _*_start_kernel visible
  x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
  x86, asmlinkage: Change dotraplinkage into __visible on 32bit
  x86: Fix sys_call_table type in asm/syscall.h
2013-09-04 08:42:44 -07:00
Linus Torvalds
6924a4672d Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic changes from Ingo Molnar:
 "Smaller fixes"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Check attr against the previous setting when programmed more than once
  x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock
  x86/acpi: Fix incorrect sanity check in acpi_register_lapic()
2013-09-04 08:39:05 -07:00
Linus Torvalds
0d99b70873 Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "As a first remark I'd like to point out that the obsolete '-f'
  (--force) option, which has not done anything for several releases,
  has been removed from 'perf record' and related utilities.  Everyone
  please update muscle memory accordingly! :-)

  Main changes on the perf kernel side:

   - Performance optimizations:
        . for trace events, by Steve Rostedt.
        . for time values, by Peter Zijlstra

   - New hardware support:
        . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
        . for Intel SNB-EP uncore PMUs, by Zheng Yan

   - Enhanced hardware support:
        . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan

   - Core perf events code enhancements and fixes:
        . for full-nohz feature handling, by Frederic Weisbecker
        . for group events, by Jiri Olsa
        . for call chains, by Frederic Weisbecker
        . for event stream parsing, by Adrian Hunter

   - New ABI details:
        . Add attr->mmap2 attribute, by Stephane Eranian
        . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
        . Export u64 time_zero on the mmap header page to allow TSC
          calculation, by Adrian Hunter
        . Add dummy software event, by Adrian Hunter.
        . Add a new PERF_SAMPLE_IDENTIFIER to make samples always
          parseable, by Adrian Hunter.
        . Make Power7 events available via sysfs, by Runzhen Wang.

   - Code cleanups and refactorings:
        . for nohz-full, by Frederic Weisbecker
        . for group events, by Jiri Olsa

   - Documentation updates:
        . for perf_event_type, by Peter Zijlstra

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   - Lots of 'perf trace' enhancements:

        . Make 'perf trace' command line arguments consistent with
          'perf record', by David Ahern.

        . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.

        . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.

        . Support ! in -e expressions, to filter a list of syscalls,
          by Arnaldo Carvalho de Melo.

        . Arg formatting improvements to allow masking arguments in
          syscalls such as futex and open, where the some arguments are
          ignored and thus should not be printed depending on other args,
          by Arnaldo Carvalho de Melo.

        . Beautify futex open, openat, open_by_handle_at, lseek and futex
          syscalls, by Arnaldo Carvalho de Melo.

        . Add option to analyze events in a file versus live, so that
          one can do:

           [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
           [ perf record: Woken up 0 times to write data ]
           [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
           [root@zoo ~]# perf trace -i perf.data -e futex --duration 1
              17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
             113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
             133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
           [root@zoo ~]#

          By David Ahern.

        . Honor target pid / tid options when analyzing a file, by David Ahern.

        . Introduce better formatting of syscall arguments, including so
          far beautifiers for mmap, madvise, syscall return values,
          by Arnaldo Carvalho de Melo.

        . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.

   - 'perf report/top' enhancements:

        . Do annotation using /proc/kcore and /proc/kallsyms when
          available, removing the forced need for a vmlinux file kernel
          assembly annotation. This also improves this use case because
          vmlinux has just the initial kernel image, not what is actually
          in use after various code patchings by things like alternatives.
          By Adrian Hunter.

        . Add --ignore-callees=<regex> option to collapse undesired parts
          of call graphs, by Greg Price.

        . Simplify symbol filtering by doing it at machine class level,
          by Adrian Hunter.

        . Add support for callchains in the gtk UI, by Namhyung Kim.

        . Add --objdump option to 'perf top', by Sukadev Bhattiprolu.

   - 'perf kvm' enhancements:

        . Add option to print only events that exceed a specified time
          duration, by David Ahern.

        . Improve stack trace printing, by David Ahern.

        . Update documentation of the live command, by David Ahern

        . Add perf kvm stat live mode that combines aspects of 'perf kvm
          stat' record and report, by David Ahern.

        . Add option to analyze specific VM in perf kvm stat report, by
          David Ahern.

        . Do not require /lib/modules/* on a guest, by Jason Wessel.

   - 'perf script' enhancements:

        . Fix symbol offset computation for some dsos, by David Ahern.

        . Fix named threads support, by David Ahern.

        . Don't install scripting files files when perl/python support
          is disabled, by Arnaldo Carvalho de Melo.

   - 'perf test' enhancements:

        . Add various improvements and fixes to the "vmlinux matches
          kallsyms" 'perf test' entry, related to the /proc/kcore
          annotation feature. By Adrian Hunter.

        . Add sample parsing test, by Adrian Hunter.

        . Add test for reading object code, by Adrian Hunter.

        . Add attr record group sampling test, by Jiri Olsa.

        . Misc testing infrastructure improvements and other details,
          by Jiri Olsa.

   - 'perf list' enhancements:

        . Skip unsupported hardware events, by Namhyung Kim.

        . List pmu events, by Andi Kleen.

   - 'perf diff' enhancements:

        . Add support for more than two files comparison, by Jiri Olsa.

   - 'perf sched' enhancements:

        . Various improvements, including removing reliance on some
          scheduler tracepoints that provide the same information as the
          PERF_RECORD_{FORK,EXIT} events. By David Ahern.

        . Remove odd build stall by moving a large struct initialization
          from a local variable to a global one, by Namhyung Kim.

   - 'perf stat' enhancements:

        . Add --initial-delay option to skip measuring for a defined
          startup phase, by Andi Kleen.

   - Generic perf tooling infrastructure/plumbing changes:

        . Tidy up sample parsing validation, by Adrian Hunter.

        . Fix up jobserver setup in libtraceevent Makefile.
          by Arnaldo Carvalho de Melo.

        . Debug improvements, by Adrian Hunter.

        . Fix correlation of samples coming after PERF_RECORD_EXIT event,
          by David Ahern.

        . Improve robustness of the topology parsing code,
          by Stephane Eranian.

        . Add group leader sampling, that allows just one event in a group
          to sample while the other events have just its values read,
          by Jiri Olsa.

        . Add support for a new modifier "D", which requests that the
          event, or group of events, be pinned to the PMU.
          By Michael Ellerman.

        . Support callchain sorting based on addresses, by Andi Kleen

        . Prep work for multi perf data file storage, by Jiri Olsa.

        . libtraceevent cleanups, by Namhyung Kim.

  And lots and lots of other fixes and code reorganizations that did not
  make it into the list, see the shortlog, diffstat and the Git log for
  details!"

[ Also merge a leftover from the 3.11 cycle ]

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Prevent race in unthrottling code

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
  perf trace: Tell arg formatters the arg index
  perf trace: Add beautifier for open's flags arg
  perf trace: Add beautifier for lseek's whence arg
  perf tools: Fix symbol offset computation for some dsos
  perf list: Skip unsupported events
  perf tests: Add 'keep tracking' test
  perf tools: Add support for PERF_COUNT_SW_DUMMY
  perf: Add a dummy software event to keep tracking
  perf trace: Add beautifier for futex 'operation' parm
  perf trace: Allow syscall arg formatters to mask args
  perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
  perf: Export struct perf_branch_entry to userspace
  perf: Add attr->mmap2 attribute to an event
  perf/x86: Add Silvermont (22nm Atom) support
  perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
  perf trace: Handle missing HUGEPAGE defines
  perf trace: Honor target pid / tid options when analyzing a file
  perf trace: Add option to analyze events in a file versus live
  perf evlist: Add tracepoint lookup by name
  perf tests: Add a sample parsing test
  ...
2013-09-04 08:25:35 -07:00
Yanchuan Nian
7752572f18 x86/irq: Correct comment about i8259 initialization
0x30-0x3f have been used for ISA interrupts on i386 as well
since 5 years ago, but old comments about i8259 initialization
were still referring to the old i386 usage of this port range.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Cc: yinghai@kernel.org
Cc: pavel@suse.cz
Link: http://lkml.kernel.org/r/1378257924-29446-1-git-send-email-ycnian@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-04 07:46:04 +02:00
Jesse Barnes
814c5f1f52 x86: add early quirk for reserving Intel graphics stolen memory v5
Systems with Intel graphics controllers set aside memory exclusively for
gfx driver use.  This memory is not always marked in the E820 as
reserved or as RAM, and so is subject to overlap from E820 manipulation
later in the boot process.  On some systems, MMIO space is allocated on
top, despite the efforts of the "RAM buffer" approach, which simply
rounds memory boundaries up to 64M to try to catch space that may decode
as RAM and so is not suitable for MMIO.

v2: use read_pci_config for 32 bit reads instead of adding a new one
    (Chris)
    add gen6 stolen size function (Chris)
v3: use a function pointer (Chris)
    drop gen2 bits (Daniel)
v4: call e820_sanitize_map after adding the region
v5: fixup comments (Peter)
    simplify loop (Chris)

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66726
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66844
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:17:57 +02:00
Joe Perches
7bfb7e6bdd perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
Use the convenience function instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f58599ae1a8d7b32d37e9cf283e95fba6452f7f6.1377809875.git.joe@perches.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02 08:42:49 +02:00
Yan, Zheng
1fa64180fb perf/x86: Add Silvermont (22nm Atom) support
Compared to old atom, Silvermont has offcore and has more events
that support PEBS.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374138144-17278-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02 08:42:47 +02:00
Yan, Zheng
53ad044720 perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
Silvermont (22nm Atom) has two offcore response configuration MSRs,
unlike other Intel CPU, its event code for MSR_OFFCORE_RSP_1 is 0x02b7.

To avoid complicating intel_fixup_er(), use INTEL_UEVENT_EXTRA_REG to
define MSR_OFFCORE_RSP_X. So intel_fixup_er() can find the event code
for OFFCORE_RSP_N by x86_pmu.extra_regs[N].event.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374138144-17278-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02 08:42:47 +02:00
Al Viro
bd1c149aa9 Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP
For performance reasons, when SMAP is in use, SMAP is left open for an
entire put_user_try { ... } put_user_catch(); block, however, calling
__put_user() in the middle of that block will close SMAP as the
STAC..CLAC constructs intentionally do not nest.

Furthermore, using __put_user() rather than put_user_ex() here is bad
for performance.

Thus, introduce new [compat_]save_altstack_ex() helpers that replace
__[compat_]save_altstack() for x86, being currently the only
architecture which supports put_user_try { ... } put_user_catch().

Reported-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.8+
Link: http://lkml.kernel.org/n/tip-es5p6y64if71k8p5u08agv9n@git.kernel.org
2013-09-01 14:16:33 -07:00
Ingo Molnar
aee2bce3cf Merge branch 'linus' into perf/core
Pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-29 12:02:08 +02:00
Grant Likely
8be137f266 Linux 3.11-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSGqS5AAoJEHm+PkMAQRiGFxEH/3VrqF6WAkcviNiW/0DCdO8k
 v6Wi7Sp5LxVkwzmOCHCV1tTHwLRlH3cB9YmJlGQ0kHCREaAuEQAB0xJXIW7dnyYj
 Qq7KoRZEMe3wizmjEsj8qsrhfMLzHjBw67hBz2znwW/4P7YdgzwD7KRiEat+yRC9
 ON3nNL2zIqpfk92RXvVrSVl4KMEM+WNbOfiffgBiEP24Ja1MJMFH1d4i6hNOaB0x
 9Pb3Lw8let92x+8Ao5jnjKdKMgVsoZWbN/TgQR8zZOHM38AGGiDgk18vMz+L+hpS
 jqfjckxj1m30jGq0qZ9ZbMZx3IGif4KccVr30MqNHJpwi6Q24qXvT3YfA3HkstM=
 =nAab
 -----END PGP SIGNATURE-----

Merge tag 'v3.11-rc7' into devicetree/next

Linux 3.11-rc7
2013-08-28 20:18:13 +01:00
Rafael J. Wysocki
4b319f290d Merge branch 'acpi-sleep'
* acpi-sleep:
  x86 / tboot / ACPI: Fail extended mode reduced hardware sleep
  xen / ACPI: notify xen when reduced hardware sleep is available
  ACPI / sleep: Introduce acpi_os_prepare_extended_sleep() for extended sleep path
2013-08-27 01:28:38 +02:00
Liu Ping Fan
25aa295797 x86/ioapic: Check attr against the previous setting when programmed more than once
When programming ioapic pinX more than once, current code
does not check whether the later attr (trigger & polarity) is the
same as the former or not.

This causes broken semantics which can be observed in a qemu q35
machine, where ioapic's ioredtbl[x] can never be set as low-active,
even if the hpet driver registered it.

And hpet driver may share a high-level active IRQ line with other
devices. So in qemu, when hpet-dev asserts low-level as kernel
expects, the kernel has no response.

With this patch, we can observe an ioredtbl[x] set as low-active
for hpet.

Fix it by reporting -EBUSY to the caller, when attr is different.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1377248327-19633-1-git-send-email-pingfank@linux.vnet.ibm.com
[ Made small readability edits to both the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-26 12:58:00 +02:00
Radu Caragea
41aacc1eea x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member
This is the updated version of df54d6fa54 ("x86 get_unmapped_area():
use proper mmap base for bottom-up direction") that only randomizes the
mmap base address once.

Signed-off-by: Radu Caragea <sinaelgl@gmail.com>
Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-22 10:19:35 -07:00
Linus Torvalds
5ea80f76a5 Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction"
This reverts commit df54d6fa54.

The commit isn't necessarily wrong, but because it recalculates the
random mmap_base every time, it seems to confuse user memory allocators
that expect contiguous mmap allocations even when the mmap address isn't
specified.

In particular, the MATLAB Java runtime seems to be unhappy. See

  https://bugzilla.kernel.org/show_bug.cgi?id=60774

So we'll want to apply the random offset only once, and Radu has a patch
for that.  Revert this older commit in order to apply the other one.

Reported-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Radu Caragea <sinaelgl@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-22 10:18:44 -07:00
Kevin Hilman
bfa664f21b ARM: tegra: core SoC enhancements for 3.12
This branch includes a number of enhancements to core SoC support for
 Tegra devices. The major new features are:
 
 * Adds a new CPU-power-gated cpuidle state for Tegra114.
 * Adds initial system suspend support for Tegra114, initially supporting
   just CPU-power-gating during suspend.
 * Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode
   both gates CPU power, and places the DRAM into self-refresh mode.
 * A new DT-driven PCIe driver to Tegra20/30. The driver is also moved
   from arch/arm/mach-tegra/ to drivers/pci/host/.
 
 The PCIe driver work depends on the following tag from Thomas Petazzoni:
 git://git.infradead.org/linux-mvebu.git mis-3.12.2
 ... which is merged into the middle of this pull request.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSDlwwAAoJEMzrak5tbycxR68QAJZ/Izc9Izj0JH8hmCEvMNfi
 ub1DQfWAy3oXk0ttkk+BMvuyD8JTvBr8LSK8GqjZs//rFGlW81A4NHTvCwoKZjKe
 hgrRgI2B1wj3Um1sp8le9D0klKrTcfmpXrOxH8ALgz0BIpMge8AGZHkV0SrfQa1z
 bKiISFVAw12WJCVrQ2nbzpZGU51lbyJ/+RghttM1a8LuS2P03CZgt2kqiytk3UVK
 uiGEy3sCkjXLFO3EsUvM6ha623S6BumCAYjNfgDowTVKaoEe1r2TD4bFeU6lGcXJ
 mlVTv0Kywazf4Q2gKzkbDz8UQMArW4hok2iILHzz+sf/Rn0hie5XVqhFlbBlcae8
 vyWsHmqvmE9BJAK2G2RLs9cJCTzEpEyAjUWfE3sIIa3ztSguT5+PHndDLR/d76aS
 j8L3FYReICZ1NuNw1JSQPFs9g2EWJbNRiy+8o9O2elsJMpLDBj/FcV6TVpudbBTI
 z7hvN+XSVYUaCVD4e8ma9YoC3VGseiAZvd+Y8hPd2MFBECVPNpy2bOacieU6Bgxh
 zjSBXZ/URxN3rTkv9+F3BLWAOfVmJYN0rKV9YfM/rqpWjc9iQx30m1fRZDnXWhvd
 ps8eFIYsKqc6v9AAugl/RexFy4Laav9eREjb0k2LA8ClLhK/qLLuiisVmKWS/grh
 lX9tzPEG2nZcjxSYaEjz
 =ve9i
 -----END PGP SIGNATURE-----

Merge tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra into next/soc

From: Stephen Warren:
ARM: tegra: core SoC enhancements for 3.12

This branch includes a number of enhancements to core SoC support for
Tegra devices. The major new features are:

* Adds a new CPU-power-gated cpuidle state for Tegra114.
* Adds initial system suspend support for Tegra114, initially supporting
  just CPU-power-gating during suspend.
* Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode
  both gates CPU power, and places the DRAM into self-refresh mode.
* A new DT-driven PCIe driver to Tegra20/30. The driver is also moved
  from arch/arm/mach-tegra/ to drivers/pci/host/.

The PCIe driver work depends on the following tag from Thomas Petazzoni:
git://git.infradead.org/linux-mvebu.git mis-3.12.2
... which is merged into the middle of this pull request.

* tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra: (33 commits)
  ARM: tegra: disable LP2 cpuidle state if PCIe is enabled
  MAINTAINERS: Add myself as Tegra PCIe maintainer
  PCI: tegra: set up PADS_REFCLK_CFG1
  PCI: tegra: Add Tegra 30 PCIe support
  PCI: tegra: Move PCIe driver to drivers/pci/host
  PCI: msi: add default MSI operations for !HAVE_GENERIC_HARDIRQS platforms
  ARM: tegra: add LP1 suspend support for Tegra114
  ARM: tegra: add LP1 suspend support for Tegra20
  ARM: tegra: add LP1 suspend support for Tegra30
  ARM: tegra: add common LP1 suspend support
  clk: tegra114: add LP1 suspend/resume support
  ARM: tegra: config the polarity of the request of sys clock
  ARM: tegra: add common resume handling code for LP1 resuming
  ARM: pci: add ->add_bus() and ->remove_bus() hooks to hw_pci
  of: pci: add registry of MSI chips
  PCI: Introduce new MSI chip infrastructure
  PCI: remove ARCH_SUPPORTS_MSI kconfig option
  PCI: use weak functions for MSI arch-specific functions
  ARM: tegra: unify Tegra's Kconfig a bit more
  ARM: tegra: remove the limitation that Tegra114 can't support suspend
  ...

Signed-off-by: Kevin Hilman <khilman@linaro.org>
2013-08-21 10:17:18 -07:00
Yoshihiro YUNOMAE
17405453f4 x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock
Prevent crash_kexec() from deadlocking on ioapic_lock. When
crash_kexec() is executed on a CPU, the CPU will take ioapic_lock
in disable_IO_APIC(). So if the cpu gets an NMI while locking
ioapic_lock, a deadlock will happen.

In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC().

You can reproduce this deadlock the following way:

1. Add mdelay(1000) after raw_spin_lock_irqsave() in
   native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c

   Although the deadlock can occur without this modification, it will increase
   the potential of the deadlock problem.

2. Build and install the kernel

3. Set up the OS which will run panic() and kexec when NMI is injected
    # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
    # vim /etc/default/grub
      add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
    # grub2-mkconfig

4. Reboot the OS

5. Run following command for each vcpu on the guest
    # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done;
   By running this command, cpus will get ioapic_lock for setting affinity.

6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you
   use VM). After injecting NMI, panic() is called in an nmi-handler context.
   Then, kexec will normally run in panic(), but the operation will be stopped
   by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->
   native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->
   clear_IO_APIC_pin()->ioapic_read_entry().

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-20 09:26:33 +02:00
Linus Torvalds
7067552dfb Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two AMD microcode loader fixes and an OLPC firmware support fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix early microcode loading
  x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
  x86: Don't clear olpc_ofw_header when sentinel is detected
2013-08-19 09:18:29 -07:00
Raghavendra K T
36bd621337 x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static"
It was not declared as static since it was thought to be used by
pv-flushtlb earlier.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <gleb@redhat.com>
Cc: <pbonzini@redhat.com>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/1376645921-8056-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-19 11:49:50 +02:00
Yan, Zheng
77b339bce3 perf/x86/intel/uncore: Enable EV_SEL_EXT bit for PCU
This patch adds support for the SNB-EP PCU uncore PMU extra_sel_bit
(bit 21) which is missing from the documentation in Table-2.75 of
Intel Xeon Processor E5-2600 Product Family Uncore Performance
Monitoring Guide. It is referred to later in Table-2.81. Without
this selection bit explicitly enabled by the kernel, some events
such as COREx_TRANSITION_CYCLES do not count correctly.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1376375382-21350-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:50 +02:00
Yan, Zheng
fd1ec259ba perf/x86/intel/uncore: Add filter support for QPI boxes
The QPI uncore boxes have two pairs of MATCH/MASK registers that
user to filter packet traffic serviced by QPI link layer. These
registers are in auxiliary PCI devices.

This patch adds the auxiliary PCI devices to snbep_uncore_pci_ids
and adds field definitions for the MATCH/MASK registers.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375856245-10717-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:49 +02:00
Yan, Zheng
899396cf7b perf/x86/intel/uncore: Add auxiliary pci device support
The QPI uncore boxes have two pairs of MATCH/MASK registers that
user to filter packet traffic serviced by QPI link layer. These
registers are in auxiliary PCI devices.

This patch changes the meaning of (struct pci_device_id)->driver_data.
The first 8 bits are device index of the same uncore type, the second
8 bytes are uncore type index. Auxiliary PCI device's type is defined
as UNCORE_EXTRA_PCI_DEV(0xff)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375856245-10717-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:48 +02:00
Ingo Molnar
c9572f010d Linux 3.11-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSCDSjAAoJEHm+PkMAQRiGDXMIAI7Loae0Oqb1eoeJkvjyZsBS
 OJDeeEcn+k58VbxVHyRdc7hGo4yI4tUZm172SpnOaM8sZ/ehPU7zBrwJK2lzX334
 /jAM3uvVPfxA2nu0I4paNpkED/NQ8NRRsYE1iTE8dzHXOH6dA3mgp5qfco50rQvx
 rvseXpME4KIAJEq4jnyFZF5+nuHiPueM9JftPmSSmJJ3/KY9kY1LESovyWd7ttg1
 jYSVPFal9J0E+tl2UQY5g9H16GqhhjYn+39Iei6Q5P4bL4ZubQgTRQTN9nyDc06Z
 ezQtGoqZ8kEz/2SyRlkda6PzjSEhgXlc8mCL5J7AW+dMhTHHx2IrosjiCA80kG8=
 =c0rK
 -----END PGP SIGNATURE-----

Merge tag 'v3.11-rc5' into perf/core

Merge Linux 3.11-rc5, to sync up with the latest upstream fixes since -rc1.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:00:09 +02:00
Linus Torvalds
f1d6e17f54 Merge branch 'akpm' (patches from Andrew Morton)
Merge a bunch of fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
  arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
  ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
  x86 get_unmapped_area(): use proper mmap base for bottom-up direction
  ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
  ocfs2: Revert 40bd62e to avoid regression in extended allocation
  drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
  hugetlb: fix lockdep splat caused by pmd sharing
  aoe: adjust ref of head for compound page tails
  microblaze: fix clone syscall
  mm: save soft-dirty bits on file pages
  mm: save soft-dirty bits on swapped pages
  memcg: don't initialize kmem-cache destroying work for root caches
2013-08-14 10:04:43 -07:00
Srivatsa Vaddagiri
92b75202e5 kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor
During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130810193849.GA25260@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes,
addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-14 13:12:35 +02:00
Ingo Molnar
397f09977e AMD F15h, model 0x30 and later enablement stuff, more specifically EDAC
support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSCT7xAAoJEBLB8Bhh3lVKdpwP/1SANXLCyn9rj+aJdG1PzjQt
 b0CW8x03HIJSplQz1tKXKYHzT04y6WO44aCiODSPSC2hqKz5/1YCA+5z/qbeRwKV
 H61z7ubT9nQM7eIZyBl7NduQzQFKo9Hywqlli2/2i5e/HXWPjYGuTkNSmCQKIsj9
 iYzquv302Ac1h2rOJdoYcn6oC2aikS/ELL77KLacWSOfincr2nHKYCAG7tL6b9L7
 vAbWATeE0pQhWkT5KhIYSmbe5ep91tvAeLFGZvBlW6wlJZhcqM/N5OPhq6klmTZJ
 9FVSqaUzaAk1xDGBzpYxm303l4qCUukhtbgbMCCP4oKHmThnXP3tjpV6WRTstzXP
 JN6VCduW+abhiY5Fe1Nn7kODzgAv7TKkCKfd+TH8kKa7ILmzCML07hA3QVol6nQ1
 8fLskO7syjVcWFoIZDGx/LLfbdIAraBEvQlFFeeWVYe+oNhle+eN0YSujLzGlR/K
 xziJvcsU3cfLXeTzzhZYNB6XpQ5KHFn1DZ8/vxbyPJhy6KV6ykTRvnxcJUphyU8o
 //CalS9fuVabGFxi9qLi/9gr1g+/zFsIwmdID/D/oAQa6GaeX/rafaOTnynMdua6
 33/7YTSFMUvgemRs8dM2bc3waZ0qgz4Xi98on0udIIGCX76sM+L9sejkpQUrIiGi
 1NAhXZnHzsvQ9F/OszQ8
 =1VnW
 -----END PGP SIGNATURE-----

Merge tag 'amd_f15_m30' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull AMD F15h, model 0x30 and later enablement stuff, more specifically EDAC
support, from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-14 12:14:12 +02:00
Linn Crosetto
30e46b574a x86: avoid remapping data in parse_setup_data()
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size
larger than early_memremap() is able to handle, which is currently
limited to 256kB. If this occurs it leads to a NULL dereference in
parse_setup_data().

To avoid this, remap the setup_data header and allow parsing functions
for individual types to handle their own data remapping.

Signed-off-by: Linn Crosetto <linn@hp.com>
Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com
Acked-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-13 23:29:19 -07:00
Tang Chen
2449f343e4 x86: Use memblock_set_current_limit() to set limit for memblock.
In setup_arch() of x86, it set memblock.current_limit directly.
We should use memblock_set_current_limit(). If the implementation
is changed, it is easy to maintain.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/1376451844-15682-1-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-13 21:27:02 -07:00
Radu Caragea
df54d6fa54 x86 get_unmapped_area(): use proper mmap base for bottom-up direction
When the stack is set to unlimited, the bottomup direction is used for
mmap-ings but the mmap_base is not used and thus effectively renders
ASLR for mmapings along with PIE useless.

Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13 17:57:49 -07:00
Linus Torvalds
bfd3605087 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
  perf/x86: Fix intel QPI uncore event definitions
2013-08-13 16:57:40 -07:00
Ingo Molnar
6356bb0ad6 Bit 12 may or may not be set in MCi_STATUS.MCACOD when
an uncorrected error is reported. Ignore it when checking
 error signatures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR/912AAoJEKurIx+X31iBz/oP/javBL5Q098ir8qj2GdlkSV0
 sPZkJ0Qd4AmmcvYgiYfE3wVL0F8mL2R3gFLcppN7wBTNqAjFOEOJb2wofwByiv+T
 0eOM9OWxPZTsoxdiCHd5pvoNuw6lkegUfmxbb9vr9Xs0JaJuhjQY/bbbUbi2ue2V
 WoghLZvcSyZBGAaFJ3tWNdPYroHp4OulbvVdgbb4+iLgaLz73zzpv1HQvBnw6CLW
 x+P5guI5FawZnS6FjPwfjIs1mKqU5fdBxGl4vHB55IPyexWOVY6i+zc9VG0EuzDE
 za+FW2v8ohJzzxNP3/u4W3ousJoXkZSrwlZvDvuEkozrM8izibmDr2JglYqnQ2sK
 5WMXL1aXHyTISWpHYiH0/218hljUhxaC5+TfNVeiZYytFULSyZOES8Em3fENx0gN
 HohTZkyBH0jNd2OZytSqS69/dvPgDIFD7qGR6KB2CaBAU+c/qH9M6g8vfaXt5Y/6
 2a0oKrZ+WXiXYgEq3PynnTHTiaHYDo2rjBN+yQDrauomopZv0qpEMEicS66G0lE3
 7+zh3CQOiv6WL9pYQxYjeIiP46H7tb+BSpbsDYDQ23++nLj61by1YXrLBJ2MsGep
 2xEDkoVE7jW5+vskcSIUfavmp7pNNnIpRsU2cb7bem44iTc2DkuHYXZtHstvQIDN
 qIwoXyi3JE2siMTs4icc
 =LbVP
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-f-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE-uncorrected-error fix from Tony Luck:

 "Bit 12 may or may not be set in MCi_STATUS.MCACOD when
  an uncorrected error is reported. Ignore it when checking
  error signatures."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 19:51:43 +02:00
Torsten Kaiser
84516098b5 x86, microcode, AMD: Fix early microcode loading
load_microcode_amd() (and the helper it is using) should not have an
cpu parameter. The microcode loading does not depend on the CPU wrt the
patches loaded since they will end up in a global list for all CPUs
anyway.

The change from cpu to x86family in load_microcode_amd()
now allows to drop the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that
point the per-cpu cpu_info is not yet setup (These values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).

Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because its only used at one place and without the cpuinfo_x86 accesses
it was not much left.

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
[ Fengguang: build fix ]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Boris: adapt it to current tree. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 18:32:45 +02:00
Torsten Kaiser
8c6b79bb12 x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still X86_VENDOR_INTEL resulting in
no errata fixes getting applied and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
caller init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.

[ Boris: Drop WARN_ON() since we're called only from init_amd() ]

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 18:25:00 +02:00
Ingo Molnar
0237d7f355 Merge branch 'x86/mce' into x86/ras
Pursue a single RAS/MCE topic branch on x86.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 17:54:05 +02:00
Thomas Petazzoni
4287d824f2 PCI: use weak functions for MSI arch-specific functions
Until now, the MSI architecture-specific functions could be overloaded
using a fairly complex set of #define and compile-time
conditionals. In order to prepare for the introduction of the msi_chip
infrastructure, it is desirable to switch all those functions to use
the 'weak' mechanism. This commit converts all the architectures that
were overidding those MSI functions to use the new strategy.

Note that we keep two separate, non-weak, functions
default_teardown_msi_irqs() and default_restore_msi_irqs() for the
default behavior of the arch_teardown_msi_irqs() and
arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI
code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Daniel Price <daniel.price@gmail.com>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2013-08-12 15:26:39 +00:00
Aravind Gopalakrishnan
7d64ac6422 x86, amd_nb: Clarify F15h, model 30h GART and L3 support
F15h, models 0x30 and later don't have a GART. Note that. Also check
CPUID leaf 0x80000006 for L3 prescence because there are models which
don't sport an L3 cache.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: rewrite commit message, cleanup comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 15:30:08 +02:00
Andi Kleen
0499bd867b perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
This one was missed earlier.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1376007983-31616-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 12:19:58 +02:00
Jeremy Fitzhardinge
96f853eaa8 x86, ticketlock: Add slowpath logic
Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker			Locker
				test for lock pickup
					-> fail
unlock
test slowpath
	-> false
				set slowpath flags
				block

Whereas this works in any ordering:

Unlocker			Locker
				set slowpath flags
				test for lock pickup
					-> fail
				block
unlock
test slowpath
	-> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't the generated code isn't too bad, but its definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:54:00 -07:00
Jeremy Fitzhardinge
354714dd26 x86, pvticketlock: Use callee-save for lock_spinning
Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:53:44 -07:00
Jeremy Fitzhardinge
545ac13892 x86, spinlock: Replace pv spinlocks with pv ticketlocks
Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow patch (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.

Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

+-----------------+----------------+--------+
 dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|   base (stdev %)|patched(stdev%) | %gain  |
+-----------------+----------------+--------+
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommits results are flat

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:53:05 -07:00
Steven Rostedt
fb40d7a899 x86/jump-label: Show where and what was wrong on errors
When modifying text sections for jump labels, a paranoid check is
performed. If the check fails, the system "bugs". But why it failed
is not shown.

The BUG_ON()s in the jump label update code is replaced with bug_at(ip).
This is a function that will show what pointer failed, and what was
at the location of the failure that made jump label panic.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-06 21:54:33 -04:00
Steven Rostedt
9c85f3bdf4 x86/jump-label: Add safety checks to jump label conversions
As with all modifying of kernel text, we need to be very paranoid.

When converting the jump label locations to and from nops to jumps
a check has been added to make sure what we are replacing is what we
expect, otherwise we bug.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-06 21:43:20 -04:00
Steven Rostedt
11570da1c5 x86/jump-label: Do not bother updating nops if they are correct
On boot up, the jump label init function scans all the jump label locations
and converts them to the best nop for the machine. If the nop is already
the ideal nop, do not bother with changing it.

Cc: Jason Baron <jbaron@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-06 21:43:18 -04:00
Andi Kleen
9a55fdbe94 x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-13-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:20:56 -07:00
Andi Kleen
54c2f3fdb9 x86, asmlinkage, apm: Make APM data structure used from assembler visible
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-12-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:20:20 -07:00
Andi Kleen
e0e745e45d x86, asmlinkage: Make syscall tables visible
They are referenced from entry*.S.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-11-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:20:18 -07:00
Andi Kleen
277d5b40b7 x86, asmlinkage: Make several variables used from assembler/linker script visible
Plus one function, load_gs_index().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:20:13 -07:00
Andi Kleen
04bb591ca7 x86, asmlinkage: Make kprobes code visible and fix assembler code
- Make all the external assembler template symbols __visible
- Move the templates inline assembler code into a top level
  assembler statement, not inside a function. This avoids it being
  optimized away or cloned.

Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:19:48 -07:00
Andi Kleen
ff49103fdb x86, asmlinkage: Make various syscalls asmlinkage
FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use
standard SYSCALL wrappers.  But I didn't do that change in this
patch.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-7-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:18:33 -07:00
Andi Kleen
35ea7903b8 x86, asmlinkage: Make 32bit/64bit __switch_to visible
This function is called from inline assembler, so has to be visible.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-6-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:18:30 -07:00
Andi Kleen
a1ed4ddfb7 x86, asmlinkage: Make _*_start_kernel visible
Obviously these functions have to be visible, otherwise
the whole kernel could be optimized away.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-5-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:18:26 -07:00
Andi Kleen
1d9090e2fb x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
These handlers are all referenced from assembler stubs, so need
to be visible.

The handlers without arguments become asmlinkage, the others __visible
to not force regparms(0) on x86-32.

I put it all into a single patch, please let me know if you want
it it split up.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-4-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:18:23 -07:00
Andi Kleen
1599e8fc84 x86: Fix sys_call_table type in asm/syscall.h
Make the sys_call_table type defined in asm/syscall.h match
the definition in syscall_64.c

v2: include asm/syscall.h in syscall_64.c too. I left uml alone
because it doesn't have an syscall.h on its own and including
the native one leads to other errors.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-2-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Richard Weinberger <richard@nod.at>
2013-08-06 14:18:08 -07:00
Linus Torvalds
0fff106872 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Peter Anvin.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, amd, microcode: Fix error path in apply_microcode_amd()
  x86, fpu: correct the asm constraints for fxsave, unbreak mxcsr.daz
  x86, efi: correct call to free_pages
  x86/iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
2013-08-06 13:18:52 -07:00
Jason Wang
9df56f19a5 x86: Correctly detect hypervisor
We try to handle the hypervisor compatibility mode by detecting hypervisor
through a specific order. This is not robust, since hypervisors may implement
each others features.

This patch tries to handle this situation by always choosing the last one in the
CPUID leaves. This is done by letting .detect() return a priority instead of
true/false and just re-using the CPUID leaf where the signature were found as
the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
has the highest priority. Other sophisticated detection method could also be
implemented on top.

Suggested by H. Peter Anvin and Paolo Bonzini.

Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Doug Covelli <dcovelli@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-05 06:35:33 -07:00
Vince Weaver
c9601247f8 perf/x86: Fix intel QPI uncore event definitions
John McCalpin reports that the "drs_data" and "ncb_data" QPI
uncore events are missing the "extra bit" and always return zero
values unless the bit is properly set.

More details from him:

 According to the Xeon E5-2600 Product Family Uncore Performance
 Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
 (including the ones that "perf" calls "drs_data" and "ncb_data") require
 that the "extra bit" be set.

 This was confusing for a while -- a note at the bottom of page 94 says
 that the "extra bit" is bit 16 of the control register.
 Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
 be zero.  Looking around a bit, I found that bit 21 appears to be the
 correct "extra bit", and further investigation shows that "perf" actually
 agrees with me:
	[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
	config:0-7,21

 So the command
	# perf -e "uncore_qpi_0/event=drs_data/"
 Is the same as
	# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
 While it should be
	# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"

 I confirmed that this last version gives results that agree with the
 amount of data that I expected the STREAM benchmark to move across the QPI
 link in the second (cross-chip) test of the original script.

Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: zheng.z.yan@intel.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-05 11:32:08 +02:00
David Herrmann
2995e50627 x86: sysfb: move EFI quirks from efifb to sysfb
The EFI FB quirks from efifb.c are useful for simple-framebuffer devices
as well. Apply them by default so we can convert efifb.c to use
efi-framebuffer platform devices.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-5-git-send-email-dh.herrmann@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-02 16:17:47 -07:00
David Herrmann
e3263ab389 x86: provide platform-devices for boot-framebuffers
The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on
x86 causes troubles when loading multiple fbdev drivers. The global
"struct screen_info" does not provide any state-tracking about which
drivers use the FBs. request_mem_region() theoretically works, but
unfortunately vesafb/efifb ignore it due to quirks for broken boards.

Avoid this by creating a platform framebuffer devices with a pointer
to the "struct screen_info" as platform-data. Drivers can now create
platform-drivers and the driver-core will refuse multiple drivers being
active simultaneously.

We keep the screen_info available for backwards-compatibility. Drivers
can be converted in follow-up patches.

Different devices are created for VGA/VESA/EFI FBs to allow multiple
drivers to be loaded on distro kernels. We create:
 - "vesa-framebuffer" for VBE/VESA graphics FBs
 - "efi-framebuffer" for EFI FBs
 - "platform-framebuffer" for everything else
This allows to load vesafb, efifb and others simultaneously and each
picks up only the supported FB types.

Apart from platform-framebuffer devices, this also introduces a
compatibility option for "simple-framebuffer" drivers which recently got
introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we
try to match the screen_info against a simple-framebuffer supported
format. If we succeed, we create a "simple-framebuffer" device instead
of a platform-framebuffer.
This allows to reuse the simplefb.c driver across architectures and also
to introduce a SimpleDRM driver. There is no need to have vesafb.c,
efifb.c, simplefb.c and more just to have architecture specific quirks
in their setup-routines.

Instead, we now move the architecture specific quirks into x86-setup and
provide a generic simple-framebuffer. For backwards-compatibility (if
strange formats are used), we still allow vesafb/efifb to be loaded
simultaneously and pick up all remaining devices.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.com
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-02 16:17:46 -07:00
Torsten Kaiser
d982057f63 x86, amd, microcode: Fix error path in apply_microcode_amd()
Return -1 (like Intels apply_microcode) when the loading fails, also
do not set the active microcode level on failure.

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Link: http://lkml.kernel.org/r/20130723225823.2e4e7588@googlemail.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-31 08:37:14 -07:00
Ben Guthro
01c6a6afd5 x86 / tboot / ACPI: Fail extended mode reduced hardware sleep
Register for the extended sleep callback from ACPI.

As tboot currently does not support the reduced hardware sleep
interface, fail this extended sleep call.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com>
Cc: tboot-devel@lists.sourceforge.net
Cc: Gang Wei <gang.wei@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-31 14:25:51 +02:00
Tony Luck
1a7f0e3c4f x86/mce: Fix mce regression from recent cleanup
In commit 33d7885b59
   x86/mce: Update MCE severity condition check

We simplified the rules to recognise each classification of recoverable
machine check combining the instruction and data fetch rules into a
single entry based on clarifications in the June 2013 SDM that all
recoverable events would be reported on the unaffected processor with
MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1.  Unfortunately the simplified
rule has a couple of bugs.  Fix them here.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-29 11:23:27 -07:00
Zhang Yanfei
45f1330af6 x86/acpi: Correct out-of-date comment of __acpi_map_table()
The implementation of function __acpi_map_table() has been
changed long time ago, and now it directly invokes
early_ioremap() to setup the temporarily acpi table mappings.

So correct its out-of-date comment.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: len.brown@intel.com
Cc: pavel@ucw.cz
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/51EE7F1C.9020506@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-26 22:12:36 +02:00
H.J. Lu
eaa5a99019 x86, fpu: correct the asm constraints for fxsave, unbreak mxcsr.daz
GCC will optimize mxcsr_feature_mask_init in arch/x86/kernel/i387.c:

		memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
		asm volatile("fxsave %0" : : "m" (fx_scratch));
		mask = fx_scratch.mxcsr_mask;
		if (mask == 0)
			mask = 0x0000ffbf;

to

		memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
		asm volatile("fxsave %0" : : "m" (fx_scratch));
		mask = 0x0000ffbf;

since asm statement doesn’t say it will update fx_scratch.  As the
result, the DAZ bit will be cleared.  This patch fixes it. This bug
dates back to at least kernel 2.6.12.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2013-07-26 09:11:56 -07:00
Santosh Shilimkar
374d5c9964 of: Specify initrd location using 64-bit
On some PAE architectures, the entire range of physical memory could reside
outside the 32-bit limit.  These systems need the ability to specify the
initrd location using 64-bit numbers.

This patch globally modifies the early_init_dt_setup_initrd_arch() function to
use 64-bit numbers instead of the current unsigned long.

There has been quite a bit of debate about whether to use u64 or phys_addr_t.
It was concluded to stick to u64 to be consistent with rest of the device
tree code. As summarized by Geert, "The address to load the initrd is decided
by the bootloader/user and set at that point later in time. The dtb should not
be tied to the kernel you are booting"

More details on the discussion can be found here:
https://lkml.org/lkml/2013/6/20/690
https://lkml.org/lkml/2012/9/13/544

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-07-24 11:10:01 +01:00
Adrian Hunter
c73deb6aec perf/x86: Add ability to calculate TSC from perf sample timestamps
For modern CPUs, perf clock is directly related to TSC.  TSC
can be calculated from perf clock and vice versa using a simple
calculation.  Two of the three componenets of that calculation
are already exported in struct perf_event_mmap_page.  This patch
exports the third.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23 12:17:45 +02:00
Neil Horman
803075dba3 x86/iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
Recently we added an early quirk to detect 5500/5520 chipsets
with early revisions that had problems with irq draining with
interrupt remapping enabled:

  commit 03bbcb2e7e
  Author: Neil Horman <nhorman@tuxdriver.com>
  Date:   Tue Apr 16 16:38:32 2013 -0400

      iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets

It turns out this same problem is present in the intel X58
chipset as well. See errata 69 here:

  http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html

This patch extends the pci early quirk so that the chip
devices/revisions specified in the above update are also covered
in the same way:

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Donald Dutile <ddutile@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Malcolm Crossley <malcolm.crossley@citrix.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1374059639-8631-1-git-send-email-nhorman@tuxdriver.com
[ Small edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23 11:29:30 +02:00
Jiri Kosina
17f41571bb kprobes/x86: Call out into INT3 handler directly instead of using notifier
In fd4363fff3 ("x86: Introduce int3 (breakpoint)-based
instruction patching"), the mechanism that was introduced for
notifying alternatives code from int3 exception handler that and
exception occured was die_notifier.

This is however problematic, as early code might be using jump
labels even before the notifier registration has been performed,
which will then lead to an oops due to unhandled exception. One
of such occurences has been encountered by Fengguang:

 int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
 task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
 RIP: 0010:[<ffffffff811098cc>]  [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
 RSP: 0000:ffff88000dd03f10  EFLAGS: 00000006
 RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
 RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
 R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
 R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
 FS:  0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
 Stack:
  ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
  ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
  ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
 Call Trace:
  <IRQ>
  [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
  [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
  [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
  [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
  [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
  <EOI>
  [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
  [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
  [<ffffffff81015f10>] default_idle+0x147/0x282
  [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
  [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
  [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
  [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
  [...]

Fix this by directly calling poke_int3_handler() from the int3
exception handler (analogically to what ftrace has been doing
already), instead of relying on notifier, registration of which
might not have yet been finalized by the time of the first trap.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23 10:12:57 +02:00
Tang Chen
82982d7293 x86/acpi: Fix incorrect sanity check in acpi_register_lapic()
We wanted to check if the APIC ID is out of range. It should be:

	if (id >= MAX_LOCAL_APIC)

There's no known bad effect of this bug.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: pavel@ucw.cz
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/1374566419-21120-1-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23 10:08:16 +02:00
Masami Hiramatsu
ea8596bb2d kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
Since introducing the text_poke_bp() for all text_poke_smp*()
callers, text_poke_smp*() are now unused. This patch basically
reverts:

  3d55cc8a05 ("x86: Add text_poke_smp for SMP cross modifying code")
  7deb18dcf0 ("x86: Introduce text_poke_smp_batch() for batch-code modifying")

and related commits.

This patch also fixes a Kconfig dependency issue on STOP_MACHINE
in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19 09:57:04 +02:00
Masami Hiramatsu
a7b0133ea9 kprobes/x86: Use text_poke_bp() instead of text_poke_smp*()
Use text_poke_bp() for optimizing kprobes instead of
text_poke_smp*(). Since the number of kprobes is usually not so
large (<100) and text_poke_bp() is much lighter than
text_poke_smp() [which uses stop_machine()], this just stops
using batch processing.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114750.26675.9174.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19 09:57:04 +02:00
Masami Hiramatsu
c7e85c42be kprobes/x86: Remove an incorrect comment about int3 in NMI/MCE
Remove a comment about an int3 issue in NMI/MCE, since
commit:

  3f3c8b8c4b ("x86: Add workaround to NMI iret woes")

already fixed that. Keeping this incorrect comment can mislead developers.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114747.26675.84110.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19 09:57:03 +02:00
Ingo Molnar
9bb15425c3 Merge branch 'x86/jumplabel' into perf/core
Upcoming kprobes patches rely on the int3 code-patching machinery introduced by:

   fd4363fff3 x86: Introduce int3 (breakpoint)-based instruction patching

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19 09:55:00 +02:00
Linus Torvalds
ee114b97e6 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Trying again to get the fixes queue, including the fixed IDT alignment
  patch.

  The UEFI patch is by far the biggest issue at hand: it is currently
  causing quite a few machines to boot.  Which is sad, because the only
  reason they would is because their BIOSes touch memory that has
  already been freed.  The other major issue is that we finally have
  tracked down the root cause of a significant number of machines
  failing to suspend/resume"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Make sure IDT is page aligned
  x86, suspend: Handle CPUs which fail to #GP on RDMSR
  x86/platform/ce4100: Add header file for reboot type
  Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
  efivars: check for EFI_RUNTIME_SERVICES
2013-07-18 17:39:05 -07:00
Marcelo Tosatti
e04c5d76b0 remove sched notifier for cross-cpu migrations
Linux as a guest on KVM hypervisor, the only user of the pvclock
vsyscall interface, does not require notification on task migration
because:

1. cpu ID number maps 1:1 to per-CPU pvclock time info.
2. per-CPU pvclock time info is updated if the
   underlying CPU changes.
3. that version is increased whenever underlying CPU
   changes.

Which is sufficient to guarantee nanoseconds counter
is calculated properly.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-18 12:29:30 +02:00
Jiri Kosina
51b2c07b22 x86: Make jump_label use int3-based patching
Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121120250.29788@pobox.suse.cz
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-16 17:55:37 -07:00
Jiri Kosina
fd4363fff3 x86: Introduce int3 (breakpoint)-based instruction patching
Introduce a method for run-time instruction patching on a live SMP kernel
based on int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

	- add a int3 trap to the address that will be patched
	- sync cores
	- update all but the first byte of the patched range
	- sync cores
	- replace the first byte (int3) by the first byte of
	  replacing opcode
	- sync cores

According to

	http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html

synchronization after replacing "all but first" instructions should not
be necessary (on Intel hardware), as the syncing after the subsequent
patching of the first byte provides enough safety.
But there's not only Intel HW out there, and we'd rather be on a safe
side.

If any CPU instruction execution would collide with the patching,
it'd be trapped by the int3 breakpoint and redirected to the provided
"handler" (which would typically mean just skipping over the patched
region, acting as "nop" has been there, in case we are doing nop -> jump
and jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
a few years ago.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121102440.29788@pobox.suse.cz
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-16 17:55:29 -07:00
Kees Cook
4df05f3619 x86: Make sure IDT is page aligned
Since the IDT is referenced from a fixmap, make sure it is page aligned.
Merge with 32-bit one, since it was already aligned to deal with F00F
bug. Since bss is cleared before IDT setup, it can live there. This also
moves the other *_idt_table variables into common locations.

This avoids the risk of the IDT ever being moved in the bss and having
the mapping be offset, resulting in calling incorrect handlers. In the
current upstream kernel this is not a manifested bug, but heavily patched
kernels (such as those using the PaX patch series) did encounter this bug.

The tables other than idt_table technically do not need to be page
aligned, at least not at the current time, but using a common
declaration avoids mistakes.  On 64 bits the table is exactly one page
long, anyway.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
Reported-by: PaX Team <pageexec@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-16 15:14:48 -07:00
H. Peter Anvin
5ff560fd48 x86, suspend: Handle CPUs which fail to #GP on RDMSR
There are CPUs which have errata causing RDMSR of a nonexistent MSR to
not fault.  We would then try to WRMSR to restore the value of that
MSR, causing a crash.  Specifically, some Pentium M variants would
have this problem trying to save and restore the non-existent EFER,
causing a crash on resume.

Work around this by making sure we can write back the result at
suspend time.

Huge thanks to Christian Sünkenberg for finding the offending erratum
that finally deciphered the mystery.

Reported-and-tested-by: Johan Heinrich <onny@project-insanity.org>
Debugged-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/51DDC972.3010005@student.kit.edu
Cc: <stable@vger.kernel.org> # v3.7+
2013-07-15 13:50:54 -07:00
Paul Gortmaker
148f9bb877 x86: delete __cpuinit usage from all x86 files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files.  x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:56 -04:00
Linus Torvalds
560ae37178 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 - fix for do_div() abuse on x86
 - locking fix in perf core
 - a pile of (build) fixes and cleanups in perf tools

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86: Fix incorrect use of do_div() in NMI warning
  perf: Fix perf_lock_task_context() vs RCU
  perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario
  perf: Clone child context from parent context pmu
  perf script: Fix broken include in Context.xs
  perf tools: Fix -ldw/-lelf link test when static linking
  perf tools: Revert regression in configuration of Python support
  perf tools: Fix perf version generation
  perf stat: Fix per-socket output bug for uncore events
  perf symbols: Fix vdso list searching
  perf evsel: Fix missing increment in sample parsing
  perf tools: Update symbol_conf.nr_events when processing attribute events
  perf tools: Fix new_term() missing free on error path
  perf tools: Fix parse_events_terms() segfault on error path
  perf evsel: Fix count parameter to read call in event_format__new
  perf tools: fix a typo of a Power7 event name
  perf tools: Fix -x/--exclude-other option for report command
  perf evlist: Enhance perf_evlist__start_workload()
  perf record: Remove -f/--force option
  perf record: Remove -A/--append option
  ...
2013-07-13 15:35:47 -07:00
Dave Hansen
baf64b8544 perf/x86: Fix incorrect use of do_div() in NMI warning
I completely botched understanding the calling conventions of
do_div().  I assumed that do_div() returned the result instead
of realizing that it modifies its argument and returns a
remainder.  The side-effect from this would be bogus numbers
for the "msecs" value in the warning messages:

	INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.114 msecs

Note, there was a second fix posted by Stephane Eranian for
a separate patch which I also botched:

	http://lkml.kernel.org/r/20130704223010.GA30625@quad

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20130708214404.B0B6EA66@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-12 14:13:04 +02:00
Linus Torvalds
8cbd0eefca Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
 "There are not too many changes this time, except two new platform
  thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal
  driver, and a couple of small fixes.

  Highlights:

   - move the ti-soc-thermal driver out of the staging tree to the
     thermal tree.

   - introduce the x86_pkg_temp_thermal driver.  This driver registers
     CPU digital temperature package level sensor as a thermal zone.

   - small fixes/cleanups including removing redundant use of
     platform_set_drvdata() and of_match_ptr for all platform thermal
     drivers"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits)
  thermal: cpu_cooling: fix stub function
  thermal: ti-soc-thermal: use standard GPIO DT bindings
  thermal: MAINTAINERS: Add git tree path for SoC specific updates
  thermal: fix x86_pkg_temp_thermal.c build and Kconfig
  Thermal: Documentation for x86 package temperature thermal driver
  Thermal: CPU Package temperature thermal
  thermal: consider emul_temperature while computing trend
  thermal: ti-soc-thermal: add DT example for DRA752 chip
  thermal: ti-soc-thermal: add dra752 chip to device table
  thermal: ti-soc-thermal: add thermal data for DRA752 chips
  thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL
  thermal: ti-soc-thermal: freeze FSM while computing trend
  thermal: ti-soc-thermal: remove external heat while extrapolating hotspot
  thermal: ti-soc-thermal: update DT reference for OMAP5430
  x86, mcheck, therm_throt: Process package thresholds
  thermal: cpu_cooling: fix 'descend' check in get_property()
  Thermal: spear: Remove redundant use of of_match_ptr
  Thermal: kirkwood: Remove redundant use of of_match_ptr
  Thermal: dove: Remove redundant use of of_match_ptr
  Thermal: armada: Remove redundant use of of_match_ptr
  ...
2013-07-11 12:26:08 -07:00
Linus Torvalds
a9642fa351 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix interrupt handler timing harness
  perf/x86/amd: Do not print an error when the device is not present
2013-07-10 16:04:38 -07:00
Linus Torvalds
23e3a1d971 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "irq-tracing fixlet"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()
2013-07-10 10:17:01 -07:00
Linus Torvalds
2e17c5a97e Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Okay this is the big one, I was stalled on the fbdev pull req as I
  stupidly let fbdev guys merge a patch I required to fix a warning with
  some patches I had, they ended up merging the patch from the wrong
  place, but the warning should be fixed.  In future I'll just take the
  patch myself!

  Outside drm:

  There are some snd changes for the HDMI audio interactions on haswell,
  they've been acked for inclusion via my tree.  This relies on the
  wound/wait tree from Ingo which is already merged.

  Major changes:

  AMD finally released the dynamic power management code for all their
  GPUs from r600->present day, this is great, off by default for now but
  also a huge amount of code, in fact it is most of this pull request.

  Since it landed there has been a lot of community testing and Alex has
  sent a lot of fixes for any bugs found so far.  I suspect radeon might
  now be the biggest kernel driver ever :-P p.s.  radeon.dpm=1 to enable
  dynamic powermanagement for anyone.

  New drivers:

  Renesas r-car display unit.

  Other highlights:

   - core: GEM CMA prime support, use new w/w mutexs for TTM
     reservations, cursor hotspot, doc updates
   - dvo chips: chrontel 7010B support
   - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell),
     Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp
     support (this time for sure)
   - nouveau: async buffer object deletion, context/register init
     updates, kernel vp2 engine support, GF117 support, GK110 accel
     support (with external nvidia ucode), context cleanups.
   - exynos: memory leak fixes, Add S3C64XX SoC series support, device
     tree updates, common clock framework support,
   - qxl: cursor hotspot support, multi-monitor support, suspend/resume
     support
   - mgag200: hw cursor support, g200 mode limiting
   - shmobile: prime support
   - tegra: fixes mostly

  I've been banging on this quite a lot due to the size of it, and it
  seems to okay on everything I've tested it on."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
  drm/radeon/dpm: implement vblank_too_short callback for si
  drm/radeon/dpm: implement vblank_too_short callback for cayman
  drm/radeon/dpm: implement vblank_too_short callback for btc
  drm/radeon/dpm: implement vblank_too_short callback for evergreen
  drm/radeon/dpm: implement vblank_too_short callback for 7xx
  drm/radeon/dpm: add checks against vblank time
  drm/radeon/dpm: add helper to calculate vblank time
  drm/radeon: remove stray line in old pm code
  drm/radeon/dpm: fix display_gap programming on rv7xx
  drm/nvc0/gr: fix gpc firmware regression
  drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
  drm/radeon/dpm: implement force performance level for TN
  drm/radeon/dpm: implement force performance level for ON/LN
  drm/radeon/dpm: implement force performance level for SI
  drm/radeon/dpm: implement force performance level for cayman
  drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
  drm/radeon/dpm: add infrastructure to force performance levels
  drm/radeon: fix surface setup on r1xx
  drm/radeon: add support for 3d perf states on older asics
  drm/radeon: set default clocks for SI when DPM is disabled
  ...
2013-07-09 16:04:31 -07:00
Robin Holt
1b3a5d02ee reboot: move arch/x86 reboot= handling to generic kernel
Merge together the unicore32, arm, and x86 reboot= command line
parameter handling.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:29 -07:00
Robin Holt
edf2b13946 reboot: x86: prepare reboot_mode for moving to generic kernel code
Prepare for the moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Miguel Boton <mboton.lkml@gmail.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:29 -07:00
Oleg Nesterov
f7da04c9e3 ptrace/x86: flush_ptrace_hw_breakpoint() shoule clear the virtual debug registers
flush_ptrace_hw_breakpoint() destroys the counters set by ptrace, but
"leaks" ->debugreg6 and ->ptrace_dr7.

The problem is minor, but still it doesn't look right and flush_thread()
did this until commit 66cb591729 ("hw-breakpoints: use the new wrapper
routines to access debug registers in process/thread code").  Now that
PTRACE_DETACH does flush_ too this makes even more sense.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Oleg Nesterov
61e305c716 ptrace/x86: cleanup ptrace_set_debugreg()
ptrace_set_debugreg() is trivial but looks horrible.  Kill the unnecessary
goto's and return's to cleanup the code.

This matches ptrace_get_debugreg() which also needs the trivial whitespace
cleanups.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Oleg Nesterov
b87a95ad60 ptrace/x86: ptrace_write_dr7() should create bp if !disabled
Commit 24f1e32c60 ("hw-breakpoints: Rewrite the hw-breakpoints layer
on top of perf events") introduced the minor regression.  Before this
commit

	PTRACE_POKEUSER DR7, enableDR0
	PTRACE_POKEUSER DR0, address

was perfectly valid, now PTRACE_POKEUSER(DR7) fails if DR0 was not
previously initialized by PTRACE_POKEUSER(DR0).

Change ptrace_write_dr7() to do ptrace_register_breakpoint(addr => 0) if
!bp && !disabled.

This fixes watchpoint-zeroaddr from ptrace-tests, see

    https://bugzilla.redhat.com/show_bug.cgi?id=660204.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Oleg Nesterov
9afe33ada2 ptrace/x86: introduce ptrace_register_breakpoint()
No functional changes, preparation.

Extract the "register breakpoint" code from ptrace_get_debugreg() into
the new/generic helper, ptrace_register_breakpoint().  It will have more
users.

The patch also adds another simple helper, ptrace_fill_bp_fields(), to
factor out the arch_bp_generic_fields() logic in register/modify.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Oleg Nesterov
29a5551341 ptrace/x86: dont delay "disable" till second pass in ptrace_write_dr7()
ptrace_write_dr7() skips ptrace_modify_breakpoint(disabled => true)
unless second_pass, this buys nothing but complicates the code and means
that we always do the main loop twice even if "disabled" was never true.

The comment says:

	Don't unregister the breakpoints right-away,
	unless all register_user_hw_breakpoint()
	requests have succeeded.

Firstly, we do not do register_user_hw_breakpoint(), it was removed by
commit 24f1e32c60 ("hw-breakpoints: Rewrite the hw-breakpoints layer
on top of perf events").

We are going to restore register_user_hw_breakpoint() (see the next
patch) but this doesn't matter: after commit 44234adcdc
("hw-breakpoints: Modify breakpoints without unregistering them")
perf_event_disable() can not hurt, hw_breakpoint_del() does not free the
slot.

Remove the "second_pass" check from the main loop and simplify the code.
Since we have to check "bp != NULL" anyway, the patch also removes the
same check in ptrace_modify_breakpoint() and moves the comment into
ptrace_write_dr7().

With this patch the second pass is only needed to restore the saved
old_dr7.  This should never fail, so the patch adds WARN_ON() to catch
the potential problems as Frederic suggested.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Oleg Nesterov
e6a7d60771 ptrace/x86: simplify the "disable" logic in ptrace_write_dr7()
ptrace_write_dr7() looks unnecessarily overcomplicated.  We can factor
out ptrace_modify_breakpoint() and do not do "continue" twice, just we
need to pass the proper "disabled" argument to
ptrace_modify_breakpoint().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Oleg Nesterov
02be46fba4 ptrace/x86: revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit 87dc669ba2 ("hw_breakpoints: Fix racy access to
ptrace breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f65 ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

The patch only removes ptrace_get_breakpoints/ptrace_put_breakpoints and
does a couple of "while at it" cleanups, it doesn't remove other changes
from the reverted commit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:25 -07:00
Naveen N. Rao
9ad95879cd mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
Add a boot option to disable firmware first mode for corrected errors.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-08 11:54:28 -07:00
Naveen N. Rao
c3d1fb567a mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first. In this scenario, the OS is expected to not
monitor for corrected errors (through CMCI/polling). Instead, the firmware
notifies the OS on corrected error events through GHES.

Linux already has support for GHES. This patch adds support for parsing CMC
structure and to disable CMCI/polling if the firmware first flag is set.

Further, the list of machine check bank structures at the end of CMC is used
to determine which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-08 11:53:01 -07:00
Linus Torvalds
21884a83b2 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
 "The timer changes contain:

   - posix timer code consolidation and fixes for odd corner cases

   - sched_clock implementation moved from ARM to core code to avoid
     duplication by other architectures

   - alarm timer updates

   - clocksource and clockevents unregistration facilities

   - clocksource/events support for new hardware

   - precise nanoseconds RTC readout (Xen feature)

   - generic support for Xen suspend/resume oddities

   - the usual lot of fixes and cleanups all over the place

  The parts which touch other areas (ARM/XEN) have been coordinated with
  the relevant maintainers.  Though this results in an handful of
  trivial to solve merge conflicts, which we preferred over nasty cross
  tree merge dependencies.

  The patches which have been committed in the last few days are bug
  fixes plus the posix timer lot.  The latter was in akpms queue and
  next for quite some time; they just got forgotten and Frederic
  collected them last minute."

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  hrtimer: Remove unused variable
  hrtimers: Move SMP function call to thread context
  clocksource: Reselect clocksource when watchdog validated high-res capability
  posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  posix_timers: fix racy timer delta caching on task exit
  posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
  selftests: add basic posix timers selftests
  posix_cpu_timers: consolidate expired timers check
  posix_cpu_timers: consolidate timer list cleanups
  posix_cpu_timer: consolidate expiry time type
  tick: Sanitize broadcast control logic
  tick: Prevent uncontrolled switch to oneshot mode
  tick: Make oneshot broadcast robust vs. CPU offlining
  x86: xen: Sync the CMOS RTC as well as the Xen wallclock
  x86: xen: Sync the wallclock when the system time is set
  timekeeping: Indicate that clock was set in the pvclock gtod notifier
  timekeeping: Pass flags instead of multiple bools to timekeeping_update()
  xen: Remove clock_was_set() call in the resume path
  hrtimers: Support resuming with two or more CPUs online (but stopped)
  timer: Fix jiffies wrap behavior of round_jiffies_common()
  ...
2013-07-06 14:09:38 -07:00
Linus Torvalds
2cb7b5a38c irqdomain refactoring for v3.11
This is the long awaited simplification of irqdomain. It gets rid of the
 different types of irq domains and instead both linear and tree mappings
 can be supported in a single domain. Doing this removes a lot of special
 case code and makes irq domains simpler to understand overall.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR1mnQAAoJEEFnBt12D9kBW7gP/jQpLqFU6PTdsCMX459Ib7fE
 JGLQWYjAsqVbVT3H8LZYOZl+ItIbXSQVk+wf0mislnOzxtGxNCxZeJdniFK2RWyz
 sGNm/eEZfp9bvm9jlrPE615QE7U7c9fQ5dqETNS4UMMFFmjgrsY4Z82zYH2/Y/jK
 YbxFUT6+fep5QaF7lYM7phfrs7FrvnpCoxCio6LO+GgxnMxHSsGJ8L9JcojZ5Qoa
 MwlSwsMaQrWJ6PNVbB2YrtlMxjHkNpQ7EV7bNF693PjACVpXf3ItpvQ1k6W/ns06
 j6Nysj4nbjsTdWOVh5mY7tPfknZUDs38rLT5s+RbfHqySMIGQXp6CzieYaJ+lxV9
 YsItyfZE7h3anebjwRv/9XqLNsDM4KmSIJGcnq+QxNSGxO3rXkItWtpCKaqYdWnN
 wI359rSZ//x2ltFzvB14mZJm2j7r9Hh7gH8M5h8W8YTqCbe3oSES3iRhJqnyPxMX
 of4PsQaTAicuLwberGdVkKafg9XIsHPoebBzDxssB4nxPrEKIILL4g54x1rV17TZ
 kBXurd79FhP4gOQxJHf5uEpdxrQ3Cm/PQpD5CG2rZJrrcV7hfP4QMMFHqqbFSIOu
 2h4M4fZX2gsDh3RLboGRGn0GckSPTHiO05yeeBriLzMCVGcZ1JMqAxLUSJpH/UYl
 EzFy9RpwnwLoUPi8jUXE
 =yJRw
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux

Pull irqdomain refactoring from Grant Likely:
 "This is the long awaited simplification of irqdomain.  It gets rid of
  the different types of irq domains and instead both linear and tree
  mappings can be supported in a single domain.  Doing this removes a
  lot of special case code and makes irq domains simpler to understand
  overall"

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux:
  irq: fix checkpatch error
  irqdomain: Include hwirq number in /proc/interrupts
  irqdomain: make irq_linear_revmap() a fast path again
  irqdomain: remove irq_domain_generate_simple()
  irqdomain: Refactor irq_domain_associate_many()
  irqdomain: Beef up debugfs output
  irqdomain: Clean up aftermath of irq_domain refactoring
  irqdomain: Eliminate revmap type
  irqdomain: merge linear and tree reverse mappings.
  irqdomain: Add a name field
  irqdomain: Replace LEGACY mapping with LINEAR
  irqdomain: Relax failure path on setting up mappings
2013-07-06 12:37:04 -07:00
Peter Zijlstra
100ac53315 perf/x86/amd: Do not print an error when the device is not present
As Linus said its not an error to not have an AMD IOMMU; esp.
when you're not even running on an AMD platform.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Link: http://lkml.kernel.org/r/20130703075542.GF23916@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-05 08:27:15 +02:00
Thomas Gleixner
2b0f89317e Merge branch 'timers/posix-cpu-timers-for-tglx' of
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-04 23:11:22 +02:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Linus Torvalds
7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Jiang Liu
46a841329a mm/x86: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:38 -07:00
Linus Torvalds
f991fae5c6 Power management and ACPI updates for 3.11-rc1
- Hotplug changes allowing device hot-removal operations to fail
   gracefully (instead of crashing the kernel) if they cannot be
   carried out completely.  From Rafael J Wysocki and Toshi Kani.
 
 - Freezer update from Colin Cross and Mandeep Singh Baines targeted
   at making the freezing of tasks a bit less heavy weight operation.
 
 - cpufreq resume fix from Srivatsa S Bhat for a regression introduced
   during the 3.10 cycle causing some cpufreq sysfs attributes to
   return wrong values to user space after resume.
 
 - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
   provide information previously available via related_cpus from
   Lan Tianyu.
 
 - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
   Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
   Tang Yuantian.
 
 - Fix for an ACPICA regression causing suspend/resume issues to
   appear on some systems introduced during the 3.4 development cycle
   from Lv Zheng.
 
 - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
   Chao Guan, and Zhang Rui.
 
 - New cupidle driver for Xilinx Zynq processors from Michal Simek.
 
 - cpuidle fixes and cleanups from Daniel Lezcano.
 
 - Changes to make suspend/resume work correctly in Xen guests from
   Konrad Rzeszutek Wilk.
 
 - ACPI device power management fixes and cleanups from Fengguang Wu
   and Rafael J Wysocki.
 
 - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
 
 - Fix for the IA-64 issue that was the reason for reverting commit
   9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
 
 - Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
   (to allow some EC-related breakage to be fixed on some systems).
 
 - Spec-compliant implementation of acpi_os_get_timer() from
   Mika Westerberg.
 
 - Modification of do_acpi_find_child() to execute _STA in order to
   to avoid situations in which a pointer to a disabled device object
   is returned instead of an enabled one with the same _ADR value.
   From Jeff Wu.
 
 - Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
   Intel Low-Power Subsystems (LPSS) driver and modificaions of that
   driver to work around a couple of known BIOS issues from
   Mika Westerberg and Heikki Krogerus.
 
 - EC driver fix from Vasiliy Kulikov to make it use get_user() and
   put_user() instead of dereferencing user space pointers blindly.
 
 - Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
   Toshi Kani.
 
 - Modification of the "runtime idle" helper routine to take the return
   values of the callbacks executed by it into account and to call
   rpm_suspend() if they return 0, which allows some code bloat
   reduction to be done, from Rafael J Wysocki and Alan Stern.
 
 - New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
 
 - PM QoS documentation update from Lan Tianyu.
 
 - Assorted core PM code cleanups and changes from Bernie Thompson,
   Bjorn Helgaas, Julius Werner, and Shuah Khan.
 
 - New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
 
 - Minor devfreq cleanups, fixes and MAINTAINERS update from
   MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
   Wei Yongjun.
 
 - OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
   driver updates from Andrii Tseglytskyi and Nishanth Menon.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
 9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
 g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
 wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
 VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
 CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
 fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
 8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
 R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
 9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
 zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
 rHBZfYGkJBrbGRu+Q1gs
 =VBBq
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Linus Torvalds
a9f4a7005f Thermal limit warnings are too scary and cause unnecessary concern
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ
 w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG
 Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9
 PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5
 bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k
 oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD
 74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev
 nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I
 EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi
 bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL
 C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D
 z+z25N5hi4yoUF/+1vc/
 =Dm06
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull thermal power-limit update from Tony Luck:
 "Thermal limit warnings are too scary and cause unnecessary concern"

* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  x86 thermal: Disable power limit notification interrupt by default
  x86 thermal: Delete power-limit-notification console messages
2013-07-03 11:16:09 -07:00
Linus Torvalds
7c6809ff2b Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV update from Ingo Molnar:
 "There's a single commit in this tree, which adds support for a new SGI
  UV GRU (Global Reference Unit - fast NUMA messaging ASIC) hardware
  feature to scale up and beyond: an optional distributed mode that will
  allow per-node address mapping of local GRU space, as opposed to
  mapping all GRU hardware to the same contiguous high space"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/UV: Add GRU distributed mode mappings
2013-07-02 16:33:05 -07:00
Linus Torvalds
96a3d998fb Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tracing updates from Ingo Molnar:
 "This tree adds IRQ vector tracepoints that are named after the handler
  and which output the vector #, based on a zero-overhead approach that
  relies on changing the IDT entries, by Seiji Aguchi.

  The new tracepoints look like this:

   # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry                      [Tracepoint event]
    irq_vectors:local_timer_exit                       [Tracepoint event]
    irq_vectors:reschedule_entry                       [Tracepoint event]
    irq_vectors:reschedule_exit                        [Tracepoint event]
    irq_vectors:spurious_apic_entry                    [Tracepoint event]
    irq_vectors:spurious_apic_exit                     [Tracepoint event]
    irq_vectors:error_apic_entry                       [Tracepoint event]
    irq_vectors:error_apic_exit                        [Tracepoint event]
   [...]"

* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing: Add config option checking to the definitions of mce handlers
  trace,x86: Do not call local_irq_save() in load_current_idt()
  trace,x86: Move creation of irq tracepoints from apic.c to irq.c
  x86, trace: Add irq vector tracepoints
  x86: Rename variables for debugging
  x86, trace: Introduce entering/exiting_irq()
  tracing: Add DEFINE_EVENT_FN() macro
2013-07-02 16:31:49 -07:00
Linus Torvalds
3045f94a20 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this tree are:

   - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen
     Gong
   - misc MCE fixes/cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Update MCE severity condition check
  mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
  ACPI/APEI: Update einj documentation for param1/param2
  ACPI/APEI: Add parameter check before error injection
  ACPI, APEI, EINJ: Fix error return code in einj_init()
  x86, mce: Fix "braodcast" typo
2013-07-02 16:30:46 -07:00
Linus Torvalds
1982269a5c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc improvements:

   - Fix /proc/mtrr reporting
   - Fix ioremap printout
   - Remove the unused pvclock fixmap entry on 32-bit
   - misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Correct function name output
  x86: Fix /proc/mtrr with base/size more than 44bits
  ix86: Don't waste fixmap entries
  x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
  x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02 16:29:05 -07:00
Linus Torvalds
fdd78889aa Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading update from Ingo Molnar:
 "Two main changes that improve microcode loading on AMD CPUs:

   - Add support for all-in-one binary microcode files that concatenate
     the microcode images of multiple processor families, by Jacob Shin

   - Add early microcode loading (embedded in the initrd) support, also
     by Jacob Shin"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, amd: Another early loading fixup
  x86, microcode, amd: Allow multiple families' bin files appended together
  x86, microcode, amd: Make find_ucode_in_initrd() __init
  x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
  x86, microcode, amd: Early microcode patch loading support for AMD
  x86, microcode, amd: Refactor functions to prepare for early loading
  x86, microcode: Vendor abstract out save_microcode_in_initrd()
  x86, microcode, intel: Correct typo in printk
2013-07-02 16:28:10 -07:00
Linus Torvalds
d652df0b2f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU changes from Ingo Molnar:
 "There are two bigger changes in this tree:

   - Add an [early-use-]safe static_cpu_has() variant and other
     robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
     configurable debugging facility, motivated by recent obscure FPU
     code bugs, by Borislav Petkov

   - Reimplement FPU detection code in C and drop the old asm code, by
     Peter Anvin."

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Use static_cpu_has_safe before alternatives
  x86: Add a static_cpu_has_safe variant
  x86: Sanity-check static_cpu_has usage
  x86, cpu: Add a synthetic, always true, cpu feature
  x86: Get rid of ->hard_math and all the FPU asm fu
2013-07-02 16:26:44 -07:00
Linus Torvalds
55a0d3ff60 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug update from Ingo Molnar:
 "Misc debuggability improvements:

   - Optimize the x86 CPU register printout a bit
   - Expose the tboot TXT log via debugfs
   - Small do_debug() cleanup"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tboot: Provide debugfs interfaces to access TXT log
  x86: Remove weird PTR_ERR() in do_debug
  x86/debug: Only print out DR registers if they are not power-on defaults
2013-07-02 16:25:06 -07:00
Linus Torvalds
35c23d5d79 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes:

   - Extend 32-bit double fault debugging aid to 64-bit
   - Fix a build warning"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up last long-standing warning
  x86: Extend #DF debugging aid to 64-bit
2013-07-02 16:24:24 -07:00
Linus Torvalds
57935b262c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc x86 cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
  x86, cleanups: Remove extra tab in __flush_tlb_one()
  x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
2013-07-02 16:23:50 -07:00
Linus Torvalds
002e44bfb5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull asm/x86 changes from Ingo Molnar:
 "Misc changes, with a bigger processor-flags cleanup/reorganization by
  Peter Anvin"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, asm, cleanup: Replace open-coded control register values with symbolic
  x86, processor-flags: Fix the datatypes and add bit number defines
  x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  linux/const.h: Add _BITUL() and _BITULL()
  x86/vdso: Convert use of typedef ctl_table to struct ctl_table
  x86: __force_order doesn't need to be an actual variable
2013-07-02 16:21:45 -07:00
Linus Torvalds
f0bb4c0ab0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel improvements:

   - watchdog driver improvements by Li Zefan
   - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
   - event multiplexing via hrtimers and other improvements by Stephane
     Eranian
   - kernel stack use optimization by Andrew Hunter
   - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
   - NMI handling rate-limits by Dave Hansen
   - various hw_breakpoint fixes by Oleg Nesterov
   - hw_breakpoint overflow period sampling and related signal handling
     fixes by Jiri Olsa
   - Intel Haswell PMU support by Andi Kleen

  Tooling improvements:

   - Reset SIGTERM handler in workload child process, fix from David
     Ahern.
   - Makefile reorganization, prep work for Kconfig patches, from Jiri
     Olsa.
   - Add automated make test suite, from Jiri Olsa.
   - Add --percent-limit option to 'top' and 'report', from Namhyung
     Kim.
   - Sorting improvements, from Namhyung Kim.
   - Expand definition of sysfs format attribute, from Michael Ellerman.

  Tooling fixes:

   - 'perf tests' fixes from Jiri Olsa.
   - Make Power7 CPI stack events available in sysfs, from Sukadev
     Bhattiprolu.
   - Handle death by SIGTERM in 'perf record', fix from David Ahern.
   - Fix printing of perf_event_paranoid message, from David Ahern.
   - Handle realloc failures in 'perf kvm', from David Ahern.
   - Fix divide by 0 in variance, from David Ahern.
   - Save parent pid in thread struct, from David Ahern.
   - Handle JITed code in shared memory, from Andi Kleen.
   - Fixes for 'perf diff', from Jiri Olsa.
   - Remove some unused struct members, from Jiri Olsa.
   - Add missing liblk.a dependency for python/perf.so, fix from Jiri
     Olsa.
   - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
   - No need to do locking when adding hists in perf report, only 'top'
     needs that, from Namhyung Kim.
   - Fix alignment of symbol column in in the hists browser (top,
     report) when -v is given, from NAmhyung Kim.
   - Fix 'perf top' -E option behavior, from Namhyung Kim.
   - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
   - Fix compile errors in bp_signal 'perf test', from Sukadev
     Bhattiprolu.

  ... and more things"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
  perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
  perf/x86: Fix shared register mutual exclusion enforcement
  perf/x86/intel: Support full width counting
  x86: Add NMI duration tracepoints
  perf: Drop sample rate when sampling is too slow
  x86: Warn when NMI handlers take large amounts of time
  hw_breakpoint: Introduce "struct bp_cpuinfo"
  hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  hw_breakpoint: Introduce cpumask_of_bp()
  hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  perf/x86/intel: Add mem-loads/stores support for Haswell
  perf/x86/intel: Support Haswell/v4 LBR format
  perf/x86/intel: Move NMI clearing to end of PMI handler
  perf/x86/intel: Add Haswell PEBS support
  perf/x86/intel: Add simple Haswell PMU support
  perf/x86/intel: Add Haswell PEBS record support
  perf/x86/intel: Fix sparse warning
  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
  perf/x86/amd: Add IOMMU Performance Counter resource management
  ...
2013-07-02 16:15:23 -07:00
Seiji Aguchi
4787c368a9 x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()
Reschedule vector tracepoints may be called in cpu idle state.
This causes lockdep check warning below.

The tracepoint requires rcu but for accuracy it also
requires irq_enter() (tracepoints record the irq context), thus,
the tracepoint interrupt handler should be calling irq_enter()
and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).

So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
with common pre/post processing functions, smp_entering_irq()
and exiting_irq() (exiting_irq() calls just irq_exit()
 in arch/x86/include/asm/apic.h),
because these can be shared among reschedule, call_function,
and call_function_single vectors.

[   50.720557] Testing event reschedule_exit:
[   50.721349]
[   50.721502] ===============================
[   50.721835] [ INFO: suspicious RCU usage. ]
[   50.722169] 3.10.0-rc6-00004-gcf910e8 #190 Not tainted
[   50.722582] -------------------------------
[   50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage!
[   50.723770]
[   50.723770] other info that might help us debug this:
[   50.723770]
[   50.724385]
[   50.724385] RCU used illegally from idle CPU!
[   50.724385] rcu_scheduler_active = 1, debug_locks = 0
[   50.725232] RCU used illegally from extended quiescent state!
[   50.725690] no locks held by swapper/0/0.
[   50.726010]
[   50.726010] stack backtrace:
[...]

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/51CDCFA3.9080101@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-02 09:52:31 +02:00
Wang YanQing
237d154854 x86: Fix override new_cpu_data.x86 with 486
We should set X86 to 486 before use cpuid to detect the cpu type, if
we set X86 to 486 after cpuid, then we will get 486 until cpu_detect
runs.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Link: http://lkml.kernel.org/r/20130628144516.GA2177@udknight
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:27:29 -07:00
H. Peter Anvin
9f84b6267c Merge remote-tracking branch 'origin/x86/fpu' into queue/x86/cpu
Use the union of 3.10 x86/cpu and x86/fpu as baseline.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:26:17 -07:00
Rafael J. Wysocki
3b4550e0e0 Merge branch 'acpi-pm'
* acpi-pm:
  ACPI / PM: Rework and clean up acpi_dev_pm_get_state()
  ACPI / PM: Replace ACPI_STATE_D3 with ACPI_STATE_D3_COLD in device_pm.c
  ACPI / PM: Rename function acpi_device_power_state() and make it static
  ACPI / PM: acpi_processor_suspend() can be static
  xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
  x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
2013-06-28 12:58:30 +02:00
Ingo Molnar
fb476cffd5 Changes to simplify the SDM mean that we can also simplify
the code for SRAR (software recoverable action required) errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzKq7AAoJEKurIx+X31iBviQP+gJ6b9BemXj8/TLjWN3P3mAB
 bQHU9imlSXmggfauC1C4/b60k4G/D8d1hN1Lboak7TbFvY69ONcSodY9w6O9Lknb
 r83xT0oUIe5IYWqC42+godt8P59ed10lUa8F5eaA9DhhzcsCWO2GG/ITqCmyjEom
 zJGKlLytq6ViJf+6ig3gTmWgHU4tvlJvOPh/O0WNWXH9kbOtnsLT8ImY9GMO/R1B
 LdItg2nxbDZmhjLOsKamaj2YpPbZ9Vtvam9CnQwirbUAkdGZfmRsq+K9y8O5MjHJ
 GRqOKwQ3BdKBUIEu36BxgIGHSg8hWYSefGD8nz8IP+QHwy5fx7SREdgx5M+AMSFt
 8DewSUij2wMwH/vuP4KrdYWva6LNMBtR+NItCkzpdY5ykcdoSzo06nCotGcUrxVN
 iEmYywZzO3CTYJj0lWqdrhhiFln/9ZrhXaWUbRoX5m3HAQK7XeTPT+3O7Vs5fWDN
 3Fb/0EzyJjmv+Mysl8/PvR9umJs3ELK5MiVLIAIkuozh8+L33byk/WX+VOfqTsY4
 KXYDm6fck3tRfn+PJGrAMSwIEyhgVJArRJC39PydEUIwjlel6DRU0Rs26wQXtFh8
 DHxTIiEubmvctvi3DnHOztOI3d3EkAayU6Dk4cNSy46FwujJC0v+07EQav6BLA3e
 x1HK+3PVVSQYuqFIeyWa
 =PJXS
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE cleanup from Tony Luck:

 "Changes to simplify the SDM means that we can also simplify
  the code for SRAR (software recoverable action required) errors."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:09:26 +02:00
Qiaowei Ren
13bfd47a0e x86/tboot: Provide debugfs interfaces to access TXT log
These logs come from tboot (Trusted Boot, an open source,
pre-kernel/VMM module that uses Intel TXT to perform a
measured and verified launch of an OS kernel/VMM.).

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gang Wei <gang.wei@intel.com>
Link: http://lkml.kernel.org/r/1372053333-21788-1-git-send-email-qiaowei.ren@intel.com
[ Beautified the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:05:16 +02:00
Chen Gong
33d7885b59 x86/mce: Update MCE severity condition check
Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27 13:45:52 -07:00
Dave Airlie
4300a0f8bd Linux 3.10-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm
 ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn
 DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP
 tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O
 HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI
 pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA==
 =uUH5
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc7' into drm-next

Linux 3.10-rc7

The sdvo lvds fix in this -fixes pull

commit c3456fb3e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jun 10 09:47:58 2013 +0200

    drm/i915: prefer VBT modes for SVDO-LVDS over EDID

has a silent functional conflict with

commit 990256aec2
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri May 31 12:17:07 2013 +0000

    drm: Add probed modes in probe order

in drm-next. W simply need to add the vbt modes before edid modes, i.e. the
other way round than now.

Conflicts:
	drivers/gpu/drm/drm_prime.c
	drivers/gpu/drm/i915/intel_sdvo.c
2013-06-27 20:40:44 +10:00
Jacob Shin
9608d33b82 x86, microcode, amd: Another early loading fixup
commit cd1c32ca96 is an early premature
rendition of the patch. Augment it with this delta patch to:
  * correctly mark offset and size of the matching bin file
  * use __pa instead of __pa_nodebug during AP load
  * check for !initrd_start before using it

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130620152414.GA6676@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-26 14:55:37 -07:00
Stephane Eranian
983433b581 perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable()
are symmetrical w.r.t. PEBS-LL and precise store.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:51 +02:00
Stephane Eranian
2f7f73a520 perf/x86: Fix shared register mutual exclusion enforcement
This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.

There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:

group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE

The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.

This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:49 +02:00
Linus Torvalds
54faf77d06 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
  hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
  kprobes: Fix arch_prepare_kprobe to handle copy insn failures
2013-06-26 08:51:44 -10:00
Andi Kleen
069e0c3c40 perf/x86/intel: Support full width counting
Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.

Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)

When the new capability is set use the alternative range which
do not have these restrictions.

This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.

( See the patch "perf/x86/intel: Avoid checkpointed counters
  causing excessive TSX aborts" for more details. )

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 11:59:25 +02:00
Ingo Molnar
ca02c21674 Better comments so we understand our existing machine check
bank bitmaps - prelude to adding another bitmap soon.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2
 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F
 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w
 fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y
 +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN
 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo
 W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y
 A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC
 yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi
 m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6
 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I
 zXjnZDNQThAgBswtNY1s
 =U/Kg
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE updates from Tony Luck:

 "Better comments so we understand our existing machine check
  bank bitmaps - prelude to adding another bitmap soon."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 10:53:45 +02:00
H. Peter Anvin
a3d7b7dddc x86, asm, cleanup: Replace open-coded control register values with symbolic
Clean up an unnecessary open-coded control register values.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
2013-06-25 16:26:06 -07:00
H. Peter Anvin
1adfa76a95 x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
actually tries to explain what it is all about, rather than being a
tautology.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
2013-06-25 16:25:32 -07:00
Naveen N. Rao
0644414e62 mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25 13:53:27 -07:00
Yinghai Lu
d5c78673b1 x86: Fix /proc/mtrr with base/size more than 44bits
On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-25 13:08:10 -07:00
H. Peter Anvin
5236eb968e Merge remote-tracking branch 'trace/tip/x86/trace' into x86/trace
Fix from Steven Rostedt.
2013-06-24 11:01:09 -07:00
Grant Likely
ddaf144c61 irqdomain: Refactor irq_domain_associate_many()
Originally, irq_domain_associate_many() was designed to unwind the
mapped irqs on a failure of any individual association. However, that
proved to be a problem with certain IRQ controllers. Some of them only
support a subset of irqs, and will fail when attempting to map a
reserved IRQ. In those cases we want to map as many IRQs as possible, so
instead it is better for irq_domain_associate_many() to make a
best-effort attempt to map irqs, but not fail if any or all of them
don't succeed. If a caller really cares about how many irqs got
associated, then it should instead go back and check that all of the
irqs is cares about were mapped.

The original design open-coded the individual association code into the
body of irq_domain_associate_many(), but with no longer needing to
unwind associations, the code becomes simpler to split out
irq_domain_associate() to contain the bulk of the logic, and
irq_domain_associate_many() to be a simple loop wrapper.

This patch also adds a new error check to the associate path to make
sure it isn't called for an irq larger than the controller can handle,
and adds locking so that the irq_domain_mutex is held while setting up a
new association.

v3: Fixup missing change to irq_domain_add_tree()
v2: Fixup x86 warning. irq_domain_associate_many() no longer returns an
    error code, but reports errors to the printk log directly. In the
    majority of cases we don't actually want to fail if there is a
    problem, but rather log it and still try to boot the system.

Signed-off-by: Grant Likely <grant.likely@linaro.org>

irqdomain: Fix flubbed irq_domain_associate_many refactoring

commit d39046ec72, "irqdomain: Refactor irq_domain_associate_many()" was
missing the following hunk which causes a boot failure on anything using
irq_domain_add_tree() to allocate an irq domain.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2013-06-24 14:01:42 +01:00
Dave Hansen
0c4df02d73 x86: Add NMI duration tracepoints
This patch has been invaluable in my adventures finding
issues in the perf NMI handler.  I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.

Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:58 +02:00
Dave Hansen
14c63f17b1 perf: Drop sample rate when sampling is too slow
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:57 +02:00
Dave Hansen
2ab00456ea x86: Warn when NMI handlers take large amounts of time
I have a system which is causing all kinds of problems.  It has
8 NUMA nodes, and lots of cores that can fight over cachelines.
If things are not working _perfectly_, then NMIs can take longer
than expected.

If we get too many of them backed up to each other, we can
easily end up in a situation where we are doing nothing *but*
running NMIs.  The biggest problem, though, is that this happens
_silently_.  You might be lucky to get an hrtimer warning, but
most of the time system simply hangs.

This patch should at least give us some warning before we fall
off the cliff.  the warnings look like this:

	nmi_handle: perf_event_nmi_handler() took: 26095071 ns

The message is triggered whenever we notice the longest NMI
we've seen to date.  You can always view and reset this value
via the debugfs interface if you like.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:56 +02:00
Seiji Aguchi
33e5ff634f x86/tracing: Add config option checking to the definitions of mce handlers
In case CONFIG_X86_MCE_THRESHOLD and CONFIG_X86_THERMAL_VECTOR
are disabled, kernel build fails as follows.

   arch/x86/built-in.o: In function `trace_threshold_interrupt':
   (.entry.text+0x122b): undefined reference to `smp_trace_threshold_interrupt'
   arch/x86/built-in.o: In function `trace_thermal_interrupt':
   (.entry.text+0x132b): undefined reference to `smp_trace_thermal_interrupt'

In this case, trace_threshold_interrupt/trace_thermal_interrupt
are not needed to define.

So, add config option checking to their definitions in entry_64.S.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/51C58B8A.2080808@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:41:36 +02:00
Steven Rostedt (Red Hat)
2b4bc78956 trace,x86: Do not call local_irq_save() in load_current_idt()
As load_current_idt() is now what is used to update the IDT for the
switches needed for NMI, lockdep debug, and for tracing, it must not
call local_irq_save(). This is because one of the users of this is
lockdep, which does tracing of local_irq_save() and when the debug
trap is hit, we need to update the IDT before tracing interrupts
being disabled. As load_current_idt() is used to do this, calling
local_irq_save() which lockdep traces, defeats the point of calling
load_current_idt().

As interrupts are already disabled when used by lockdep and NMI, the
only other user is tracing that can disable interrupts itself. Simply
have the tracing update disable interrupts before calling load_current_idt()
instead of breaking the other users.

Here's the dump that happened:

------------[ cut here ]------------
WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398()
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
Modules linked in:
CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90
 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011
 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0
Call Trace:
 [<ffffffff8192822b>] dump_stack+0x19/0x1b
 [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80
 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff810341f7>] copy_process+0x2c3/0x1398
 [<ffffffff8103539d>] do_fork+0xa8/0x260
 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f
 [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff810355cf>] SyS_clone+0x16/0x18
 [<ffffffff81938369>] stub_clone+0x69/0x90
 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 8b157a9d20ca1aa2 ]---

in fork.c:

 #ifdef CONFIG_PROVE_LOCKING
	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here
	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 13:16:19 -04:00
Linus Torvalds
f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Linus Torvalds
9d0be540d7 KVM fixes for 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRwc4AAAoJEBvWZb6bTYbyT/wQAIAvvjM+jlqOsZCUjnutdWXf
 VGvJB1jXPEa6cuUkaE3RYO8rLLJz/PC4Geg8I/awtjeSkPvldeu0FihSkdbrNkpJ
 3dSAJ0cR8+ScUZB74Lt1pfwwEAdZk61MU1opD1qy3wWTy5Rk2xp4qzE2aCVJoLkh
 K8gscLrCHR47DELcm+AWCtHt/0VOZSAVe9z/7Qf9eAMCNmH/efHgrZZTxF/1aBkI
 7XaZdqT+d84D/Bc2isdRNvFYt9rkuII2GCeRciw2t3QsgVoBXn7FVK2OXbufIaYW
 /XX0YiNpSnUpcpOtGw2MwzpKgDSRE9QbnOUVsWThTDeG7Q5d72ioCFuRDba8p3Lc
 u146nQM495bAdJQz6IW3JCeiZlQ86/QjFquk9hJuCwQpuHmlRpVEqFbLVnZEnYvk
 lc9dnK7BcaftbMgo/j6DMv6Bpq/EDp1JrPXzNT4BwG7ptArEXzmdoktwpONjrZzI
 1oZV5xCcNhTjyqlndiZKxneKXoqz/3NOdR+EV0yh3+0l4PK6C52jd6PaKRv7qN3l
 Lt35uFrmKqzOR12KVStYREyh1Sy43ADE4tIIPnOBbcXz23Df4/vVSb1Bqv0nPq4i
 JhsRsGoR8ZdDBd7c40XlvpxOiMmubFLSklX0WaKn2yIZH9FMj63Ak0wet7qhaqb8
 lbJRUH8gnK0RCVEoUFpB
 =yGPS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Three one-line fixes for my first pull request; one for x86 host, one
  for x86 guest, one for PPC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: kvmclock: zero initialize pvclock shared memory area
  kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
  KVM: x86: remove vcpu's CPL check in host-invoked XCR set
2013-06-21 06:29:22 -10:00
Steven Rostedt (Red Hat)
83ab85140b trace,x86: Move creation of irq tracepoints from apic.c to irq.c
Compiling without CONFIG_X86_LOCAL_APIC set, apic.c will not be
compiled, and the irq tracepoints will not be created via the
CREATE_TRACE_POINTS macro. When CONFIG_X86_LOCAL_APIC is not set,
we get the following build error:

  LD      init/built-in.o
arch/x86/built-in.o: In function `trace_x86_platform_ipi_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o: In function `trace_x86_platform_ipi_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o: In function `trace_irq_work_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o: In function `trace_irq_work_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_exit'
arch/x86/built-in.o:(__jump_table+0x8): undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o:(__jump_table+0x14): undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o:(__jump_table+0x20): undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o:(__jump_table+0x2c): undefined reference to `__tracepoint_irq_work_exit'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

As irq.c is always compiled for x86, it is a more appropriate location
to create the irq tracepoints.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-21 10:33:28 -04:00
Seiji Aguchi
cf910e83ae x86, trace: Add irq vector tracepoints
[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.

I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding a logic to
   _set_gate(). It is just a copy of original idt table.
 - Register the new handlers for tracpoints to the new IDT by introducing
   macros to alloc_intr_gate() called at registering time of irq_vector handlers.
 - Add checking, whether irq vector tracing is on/off, into load_current_idt().
   This has to be done below debug checking for these reasons.
   - Switching to debug IDT may be kicked while tracing is enabled.
   - On the other hands, switching to trace IDT is kicked only when debugging
     is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:34 -07:00
Seiji Aguchi
629f4f9d59 x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.

Also, introduce a generic way to switch IDT by checking a current state,
debug on/off.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:13 -07:00
Seiji Aguchi
eddc0e922a x86, trace: Introduce entering/exiting_irq()
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.

To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.

So, let's introduce entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.

A way to use them is as follows.

Non-trace irq handler:
smp_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	exiting_irq();		/* post-processing of this handler */

}

Trace irq_handler:
smp_trace_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	trace_irq_entry();	/* tracepoint for irq entry */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	trace_irq_exit();	/* tracepoint for irq exit */
	exiting_irq();		/* post-processing of this handler */

}

If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.

smp_trace_irq_handler()
{
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
}

But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(),  because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use  rcu to synchronize.

As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.

smp_trace_irq_hander()
{
	irq_entry();
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
	irq_exit();
}

But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:01 -07:00
H. Peter Anvin
f037e416af x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
There is no point in using "xorq" to clear a register... use "xorl" to
clear the bottom 32 bits, and the upper 32 bits get cleared by virtue
of zero extension.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/n/tip-b76zi1gep39c0zs8fbvkhie9@git.kernel.org
2013-06-20 21:30:04 -07:00
H. Peter Anvin
e6bca5a6a8 Linux 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRvOHmAAoJEHm+PkMAQRiGpOYIAI/OMDjCMro8S1FjDPGr+wsz
 r/u4h4DiHQTdAUSBYhksgXVBSm02hGhDF3tz7u9oAXuv+4zlGUMZNjmEmLOFfdyd
 Ve9vqMrUkdwA0jt0d7AYVjtCy/j/yVe5szXZCksHXOZiyb78SEZPQgHc13CJ4lKI
 K2jPk0sMko7d5ptALPIX+ome48M+3w3k1HuQ62Znm4pFSADcFGpGdf40HP/5LL+I
 Llp3XHJ5OZrcHRal0t39oHJOCWW61bnYXTWel9J7XoUztebF2cRgFydAhtrto42z
 x7ELOKVY197WH8l56cnVHrISneQd4kgWrooxLYy6QsJl73/qvQBX+7nmkoes7so=
 =qeV0
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc6' into x86/cleanups

Linux 3.10-rc6

We need a change that is the mainline tree for further work.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 21:13:55 -07:00
Borislav Petkov
4a90a99c4f x86: Add a static_cpu_has_safe variant
We want to use this in early code where alternatives might not have run
yet and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:14 -07:00
Borislav Petkov
5700f743b5 x86: Sanity-check static_cpu_has usage
static_cpu_has may be used only after alternatives have run. Before that
it always returns false if constant folding with __builtin_constant_p()
doesn't happen. And you don't want that.

This patch is the result of me debugging an issue where I overzealously
put static_cpu_has in code which executed before alternatives have run
and had to spend some time with scratching head and cursing at the
monitor.

So add a jump to a warning which screams loudly when we use this
function too early. The alternatives patch that check away in
conjunction with patching the rest of the kernel image.

[ hpa: factored this into its own configuration option.  If we want to
  have an overarching option, it should be an option which selects
  other options, not as a group option in the source code. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:37:19 -07:00
Borislav Petkov
c3b83598c1 x86, cpu: Add a synthetic, always true, cpu feature
This will be used in alternatives later as an always-replace flag.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:06:07 -07:00
Linus Torvalds
a3d5c3460a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a bit
  bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20 08:18:35 -10:00
Linus Torvalds
86c76676cf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Four fixes.  The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20 08:17:36 -10:00
Linus Torvalds
805e318548 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
 - Add a missing irq enable. Fallout of the idle conversion
 - Fix stackprotector wreckage caused by the idle conversion

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  idle: Enable interrupts in the weak arch_cpu_idle() implementation
  idle: Add the stack canary init to cpu_startup_entry()
2013-06-20 08:16:07 -10:00
Ingo Molnar
f070a4dba9 Merge branch 'perf/urgent' into perf/core
Merge in two hw_breakpoint fixes, before applying another 5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:40 +02:00
Masami Hiramatsu
003002e04e kprobes: Fix arch_prepare_kprobe to handle copy insn failures
Fix arch_prepare_kprobe() to handle failures in copy instruction
correctly. This fix is related to the previous fix: 8101376
which made __copy_instruction return an error result if failed,
but caller site was not updated to handle it. Thus, this is the
other half of the bugfix.

This fix is also related to the following bug-report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Jonathan Lebon <jlebon@redhat.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: systemtap@sourceware.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:25:48 +02:00
Michel Lespinasse
b52e0a7c4e x86: Fix trigger_all_cpu_backtrace() implementation
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.

trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.

x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.

I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.

Tested via: echo l > /proc/sysrq-trigger

Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )

After the change, this shows NMI backtraces on all CPUs

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:00:21 +02:00
Borislav Petkov
719038de98 x86/intel/cacheinfo: Shut up last long-standing warning
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This keeps on happening during randbuilds and the compiler is
wrong here:

In the case where cpuid4_cache_lookup_regs() returns 0, both
this_leaf.size and this_leaf.eax get initialized. In the case
where the CPUID leaf doesn't contain valid cache info, we error
out which init_intel_cacheinfo() handles correctly without
touching the abovementioned fields.

So shut up the warning by clearing out the struct which we hand
down.

While at it, reverse error handling and gain one indentation
level.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 12:27:41 +02:00
Konrad Rzeszutek Wilk
d6a77ead21 x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
Which by default will be x86_acpi_suspend_lowlevel.
This registration allows us to register another callback
if there is a need to use another platform specific callback.

Signed-off-by: Liang Tang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Ben Guthro <benjamin.guthro@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-19 23:36:30 +02:00
Rusty Russell
5a802e1530 x86: Remove weird PTR_ERR() in do_debug
62edab905 changed the argument to notify_die() from dr6 to &dr6,
but weirdly, used PTR_ERR() to cast it to a long.  Since dr6 is
on the stack, this is an abuse of PTR_ERR().  Cast to long, as
per kernel standard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1371357768-4968-8-git-send-email-rusty@rustcorp.com.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 15:01:36 +02:00
Andi Kleen
f9134f36ae perf/x86/intel: Add mem-loads/stores support for Haswell
mem-loads is basically the same as Sandy Bridge,
but we use a separate string for changes later.

Haswell doesn't support the full precise store mode,
so we emulate it using the "DataLA" facility.
This allows to do everything, but for data sources we
can only detect L1 hit or not.

There is no explicit enable bit anymore, so we have
to tie it to a perf internal only flag.

The address is supported for all memory related PEBS
events with DataLA. Instead of only logging for the
load and store events we allow logging it for all
(it will be simply 0 if the current event does not
support it)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen
135c5612c4 perf/x86/intel: Support Haswell/v4 LBR format
Haswell has two additional LBR from flags for TSX: in_tx and
abort_tx, implemented as a new "v4" version of the LBR format.

Handle those in and adjust the sign extension code to still
correctly extend. The flags are exported similarly in the LBR
record to the existing misprediction flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen
72db559646 perf/x86/intel: Move NMI clearing to end of PMI handler
This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler by unmasking
the NMI only at the end. Shouldn't make any difference
for earlier family 6 cores.

(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:34 +02:00
Andi Kleen
3044318f1f perf/x86/intel: Add Haswell PEBS support
Add simple PEBS support for Haswell.

The constraints are similar to SandyBridge with a few new
events.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen
3a632cb229 perf/x86/intel: Add simple Haswell PMU support
Similar to SandyBridge, but has a few new events and two
new counter bits.

There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.

Also we add support for the counter 2 constraint to handle
all raw events.

(Contains fixes from Stephane Eranian.)

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen
130768b8c9 perf/x86/intel: Add Haswell PEBS record support
Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but has a
longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" support which
directly gives the instruction, not off-by-one instruction. So
with precise == 2 we use that directly and don't try to use LBRs
and walking basic blocks. This lowers the overhead of using
precise significantly.

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:32 +02:00
Dave Jones
4338774cd4 x86/debug: Only print out DR registers if they are not power-on defaults
The DR registers are rarely useful when decoding oopses.
With screen real estate during oopses at a premium, we can save
two lines by only printing out these registers when they are set
to something other than they power-on state.

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130618160911.GA24487@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:33:59 +02:00
Ingo Molnar
2e7e98b85d Fix typo in define
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsuoUAAoJEBLB8Bhh3lVKpFUP/j6763r5WPtrzUx0sAZBt7SN
 T9VAdqWd3SvxP+lDnRoU1luGQc0j+05njUnc74e2k2rGu2+WDonkxJIv4xMHsuG/
 cItdK+ThmxOQa4ijm3uJ0N8I16k15cZkW0H5hyHSVqbawPaAjCt25DEXexTSgBQM
 wnQSsDNXRjtU/LSBPAZX4t9ePdawy/EwLGO+YYJ1ZKck4oesCwp0j/PVqQG+IGTM
 QUbB3mh0GvdHW5pPgVH6n+yqQ7puN92WHzJicYsYCtJDHa1Yf5D6/C+ZBq8GJ1V0
 rCdxQ8uI5tcn9WOJ1gZrUGPmjEMotnLrZX26b/cTEPzBP0mjf+Kd/Ew+gNmdmvX8
 5jv1Agzb1GQVXesPpv+N3hw5HNqLEQCBdlLQ5kiX63wYwF4vck+JaAGDWUILO6qp
 TtdEiotBHDKpk9YuWeepmCUDB2u/uyMrHSA0HGLBh6ACQXsl5/RnQj3KnGFlbvo6
 FbMvoOEXbNIOB92OEnC9kZteXGlIBtsK0sHmRXfOsckbH/gyYyBcFwqmeHFpzuKV
 N5f3NV8GTrUyrNdn4S28S/wlnx+KOqpqFhllVkgp/OcW2/Ry2eEZYM7ypU8bXMpo
 +HOFwQZimmQonVqRzGrQQI9w5+D5xGJ3gI/0lcjkX2r2Njowx8q13wfKKmqE8EDB
 7a+BsNdTBUt01dskFlSe
 =8FzZ
 -----END PGP SIGNATURE-----

Merge tag 'ras_fixlet_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull "Fix typo in define" change from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:51:54 +02:00
Yan, Zheng
b2fa344d0c perf/x86/intel: Fix sparse warning
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370421025-10986-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:55 +02:00
Suravee Suthikulpanit
7be6296fdd perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

  ./perf stat -a -e amd_iommu/<events>/ <command>

or:

  ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

  ./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:53 +02:00
Dave Hansen
ae0def05ed perf/x86: Only print PMU state when also WARN()'ing
intel_pmu_handle_irq() has a warning in it if it does too many
loops.  It is a WARN_ONCE(), but the perf_event_print_debug()
call beneath it is unconditional. For the first warning, you get
a nice backtrace and message, but subsequent ones just dump the
PMU state with no leading messages.  I doubt this is what was
intended.

This patch will only print the PMU state when paired with the
WARN_ON() text.  It effectively open-codes WARN_ONCE()'s
one-time-only logic.

My suspicion is that the code really just wants to make sure we
do not sit in the loop and spit out a warning for every loop
iteration after the 100th.  From what I've seen, this is very
unlikely to happen since we also clear the PMU state.

After this patch, instead of seeing the PMU state dumped each
time, you will just see:

	[57494.894540] perf_event_intel: clearing PMU state on CPU#129
	[57579.539668] perf_event_intel: clearing PMU state on CPU#10
	[57587.137762] perf_event_intel: clearing PMU state on CPU#134
	[57623.039912] perf_event_intel: clearing PMU state on CPU#114
	[57644.559943] perf_event_intel: clearing PMU state on CPU#118
	...

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:47 +02:00
Andrew Hunter
43b4578071 perf/x86: Reduce stack usage of x86_schedule_events()
x86_schedule_events() caches event constraints on the stack during
scheduling.  Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.

Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event.  For 8 bytes per event (1% of
its size) we can save the giant stack frame.

This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.

Tested: `perf stat whatever` does not blow up and produces
results that aren't hugely obviously wrong.  I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.

Signed-off-by: Andrew Hunter <ahh@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:44 +02:00
Ingo Molnar
eff2108f02 Merge branch 'perf/urgent' into perf/core
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:41 +02:00
Stephane Eranian
f1a527899e perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.

This patch moves the definition of LDLAT back into the
snb_ep table.

Thanks to Don Zickus for tracking this one down.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:16 +02:00
Igor Mammedov
07868fc6aa x86: kvmclock: zero initialize pvclock shared memory area
kernel might hung in pvclock_clocksource_read() due to
uninitialized memory might contain odd version value in
following cycle:

        do {
                version = __pvclock_read_cycles(src, &ret, &flags);
        } while ((src->version & 1) || version != src->version);

if secondary kvmclock is accessed before it's registered with kvm.

Clear garbage in pvclock shared memory area right after it's
allocated to avoid this issue.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[See BZ for analysis.  We may want a different fix for 3.11, but
 this is the safest for now - Paolo]
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19 12:25:28 +02:00
Yinghai Lu
d8d386c106 x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.

  *BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G

https://bugzilla.kernel.org/show_bug.cgi?id=59491

So it rejects new var mtrr layout.

It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
	x86_get_mtrr_mem_range
		==> bunchs of add_range_with_merge
		==> bunchs of subract_range
		==> clean_sort_range
	add_range_with_merge for [0,1M)
	sort_range()

add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.

So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.

Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 11:32:02 -05:00
Fenghua Yu
6bb2ff846f x86 thermal: Disable power limit notification interrupt by default
The package power limit notification interrupt is primarily for
system diagnosis, and should not be blindly enabled on every
system by default -- particuarly since Linux does nothing in the
handler except count how many times it has been called...

Add a new kernel cmdline parameter "int_pln_enable" for situations where
users want to oberve these events via existing system counters:

$ grep TRM /proc/interrupts

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14 14:49:00 -07:00
Fenghua Yu
c81147483e x86 thermal: Delete power-limit-notification console messages
Package power limits are common on some systems under some conditions --
so printing console messages when limits are reached
causes unnecessary customer concern and support calls.

Note that even with these console messages gone,
the events can still be observed via system counters:

$ grep TRM /proc/interrupts

Shows total thermal interrupts, which includes both power
limit notifications and thermal throttling interrupts.

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

Will show what caused those interrupts, core and package
throttling and power limit notifications.

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14 14:48:37 -07:00
Srinivas Pandruvada
25cdce170d x86, mcheck, therm_throt: Process package thresholds
Added callback registration for package threshold reports. Also added
a callback to check the rate control implemented in callback or not.
If there is no rate control implemented, then there is a default rate
control similar to core threshold notification by delaying for
CHECK_INTERVAL (5 minutes) between reports.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-06-13 09:59:14 +08:00
Kees Cook
c8a22d19dd x86: Fix typo in kexec register clearing
Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12 15:16:18 -07:00
Thomas Gleixner
d7880812b3 idle: Add the stack canary init to cpu_startup_entry()
Moving x86 to the generic idle implementation (commit 7d1a9417 "x86:
Use generic idle loop") wreckaged the stack protector.

I stupidly missed that boot_init_stack_canary() must be inlined from a
function which never returns, but I put that call into
arch_cpu_idle_prepare() which of course returns.

I pondered to play tricks with arch_cpu_idle_prepare() first, but then
I noticed, that the other archs which have implemented the
stackprotector (ARM and SH) do not initialize the canary for the
non-boot cpus.

So I decided to move the boot_init_stack_canary() call into
cpu_startup_entry() ifdeffed with an CONFIG_X86 for now. This #ifdef
is just a temporary measure as I don't want to inflict the
boot_init_stack_canary() call on ARM and SH that late in the cycle.

I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH
maintainers have no objection.

Reported-by: Wouter van Kesteren <woutershep@gmail.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-11 22:04:47 +02:00
H. Peter Anvin
60e019eb37 x86: Get rid of ->hard_math and all the FPU asm fu
Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch
  and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-06 14:32:04 -07:00
Jacob Shin
cd1c32ca96 x86, microcode, amd: Allow multiple families' bin files appended together
Add support for parsing through multiple families' microcode patch
container binary files appended together when early loading. This is
already supported on Intel.

Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:55 -07:00
Jacob Shin
275bbe2e29 x86, microcode, amd: Make find_ucode_in_initrd() __init
Change find_ucode_in_initrd() to __init and only let BSP call it
during cold boot. This is the right thing to do because only BSP will
see initrd loaded by the boot loader. APs will offset into
initrd_start to find the microcode patch binary.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-2-git-send-email-jacob.shin@amd.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:47 -07:00
Mathias Krause
a90936845d x86, mce: Fix "braodcast" typo
Fix the typo in MCJ_IRQ_BRAODCAST.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-05 11:59:17 +02:00
Jacob Shin
6b3389ac21 x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
Fix section mismatch warnings on microcode_amd_early.
Compile error occurs when CONFIG_MICROCODE=m, change so that early
loading depends on microcode_core.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31 13:56:58 -07:00
Paul Bolle
71c69f7f4b x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32.
Remove a useless check for its macro, as it will now always
evaluate to false.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:12:35 +02:00
Andrew Jones
b0bc225d0e sched/x86: Construct all sibling maps if smt
Commit 316ad24830 ("sched/x86: Rewrite
set_cpu_sibling_map()") broke the construction of sibling maps,
which also broke the booted_cores accounting.

Before the rewrite, if smt was present, then each map was
updated for each smt sibling. After the rewrite only
cpu_sibling_mask gets updated, as the llc and core maps depend
on 'has_mc = x86_max_cores > 1' instead. This leads to problems
with topologies like the following

(qemu -smp sockets=2,cores=1,threads=2)

  processor       : 0
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 1
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

  processor       : 2
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 3
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

This patch restores the former construction by defining has_mc
as (has_smt || x86_max_cores > 1). This should be fine as there
were no (has_smt && !has_mc) conditions in the context.

Aso rename has_mc to has_mp now that it's not just for cores.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: a.p.zijlstra@chello.nl
Cc: fenghua.yu@intel.com
Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:10:38 +02:00
Jacob Shin
757885e94a x86, microcode, amd: Early microcode patch loading support for AMD
Add early microcode patch loading support for AMD.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin
a76096a657 x86, microcode, amd: Refactor functions to prepare for early loading
In preparation work for early loading, refactor some common functions
that will be shared, and move some struct defines to a common header file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin
f2b3ee820a x86, microcode: Vendor abstract out save_microcode_in_initrd()
Currently save_microcode_in_initrd() is declared in vendor neutural
microcode.h file, but defined in vendor specific
microcode_intel_early.c file. Vendor abstract it out to
microcode_core_early.c with a wrapper function.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Borislav Petkov
83b325f1b4 x86, microcode, intel: Correct typo in printk
User-visible so correct it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1369940959-2077-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Andy Lutomirski
d0d98eedee Add arch_phys_wc_{add, del} to manipulate WC MTRRs if needed
Several drivers currently use mtrr_add through various #ifdef guards
and/or drm wrappers.  The vast majority of them want to add WC MTRRs
on x86 systems and don't actually need the MTRR if PAT (i.e.
ioremap_wc, etc) are working.

arch_phys_wc_add and arch_phys_wc_del are new functions, available
on all architectures and configurations, that add WC MTRRs on x86 if
needed (and handle errors) and do nothing at all otherwise.  They're
also easier to use than mtrr_add and mtrr_del, so the call sites can
be simplified.

As an added benefit, this will avoid wasting MTRRs and possibly
warning pointlessly on PAT-supporting systems.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-31 13:02:52 +10:00
Linus Torvalds
484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Pekka Riikonen
5187b28ff0 x86: Allow FPU to be used at interrupt time even with eagerfpu
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto.  With
eagerfpu=off FPU use is possible in those contexts.  This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():

...
  * For now, with eagerfpu we will return interrupted kernel FPU
  * state as not-idle. TBD: Ideally we can change the return value
  * to something like __thread_has_fpu(current). But we need to
  * be careful of doing __thread_clear_has_fpu() before saving
  * the FPU etc for supporting nested uses etc. For now, take
  * the simple route!
...
 	if (use_eager_fpu())
 		return 0;

As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet.  Once it has been the
__thread_has_fpu() will start returning 0.

Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().

[ hpa: this is a performance regression, not a correctness regression,
  but since it can be quite serious on CPUs which need encryption at
  interrupt time I am marking this for urgent/stable. ]

Signed-off-by: Pekka Riikonen <priikone@iki.fi>
Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org
Cc: <stable@vger.kernel.org> v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:42 -07:00
Dimitri Sivanich
879d5ad0dc x86/UV: Add GRU distributed mode mappings
GRU hardware will support an optional distributed mode that will
allow  per-node address mapping of local GRU space, as opposed
to mapping all GRU hardware to the same contiguous high space.

If GRU distributed mode is selected, setup per-node page table
mappings.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130529155609.GB22917@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-30 09:46:09 +02:00
Zhang Yanfei
e9d0626ed4 x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
In head_64.S, a switchover has been used to handle kernel crossing
1G, 512G boundaries.

And commit 8170e6bed4
    x86, 64bit: Use a #PF handler to materialize early mappings on demand
said:
    During the switchover in head_64.S, before #PF handler is available,
    we use three pages to handle kernel crossing 1G, 512G boundaries with
    sharing page by playing games with page aliasing: the same page is
    mapped twice in the higher-level tables with appropriate wraparound.

But from the switchover code, when we set up the PUD table:
114         addq    $4096, %rdx
115         movq    %rdi, %rax
116         shrq    $PUD_SHIFT, %rax
117         andl    $(PTRS_PER_PUD-1), %eax
118         movq    %rdx, (4096+0)(%rbx,%rax,8)
119         movq    %rdx, (4096+8)(%rbx,%rax,8)

It seems line 119 has a potential bug there. For example,
if the kernel is loaded at physical address 511G+1008M, that is
    000000000 111111111 111111000 000000000000000000000
and the kernel _end is 512G+2M, that is
    000000001 000000000 000000001 000000000000000000000
So in this example, when using the 2nd page to setup PUD (line 114~119),
rax is 511.
In line 118, we put rdx which is the address of the PMD page (the 3rd page)
into entry 511 of the PUD table. But in line 119, the entry we calculate from
(4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line
119 should be wraparound into entry 0 of the PUD table.

The patch fixes the bug.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-28 15:41:59 -07:00
David Vrabel
3565184ed0 x86: Increase precision of x86_platform.get/set_wallclock()
All the virtualized platforms (KVM, lguest and Xen) have persistent
wallclocks that have more than one second of precision.

read_persistent_wallclock() and update_persistent_wallclock() allow
for nanosecond precision but their implementation on x86 with
x86_platform.get/set_wallclock() only allows for one second precision.
This means guests may see a wallclock time that is off by up to 1
second.

Make set_wallclock() and get_wallclock() take a struct timespec
parameter (which allows for nanosecond precision) so KVM and Xen
guests may start with a more accurate wallclock time and a Xen dom0
can maintain a more accurate wallclock for guests.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-28 14:00:59 -07:00
Thorsten Glaser
8ef726cb23 update AMD powerflags comments
from http://www.flounder.com/cpuid80000007amd.gif
and http://support.amd.com/us/Embedded_TechDocs/25481.pdf

Signed-off-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:10 +02:00
Peter Zijlstra
1b45adcd9a perf/x86/amd: Rework AMD PMU init code
Josh reported that his QEMU is a bad hardware emulator and trips a
WARN in the AMD PMU init code. He requested the WARN be turned into a
pr_err() or similar.

While there, rework the code a little.

Reported-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Jacob Shin <jacob.shin@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:55 +02:00
Stephane Eranian
2b923c8f5d perf/x86: Check branch sampling priv level in generic code
This patch moves commit 7cc23cd to the generic code:

 perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL

The check is now implemented in generic code instead of x86 specific
code. That way we do not have to repeat the test in each arch
supporting branch sampling.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:54 +02:00
Dan Carpenter
13acac3075 perf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driver
We're trying to use 64 bit masks but the shifts wrap so we can't use the
high 32 bits. I've fixed this by changing several types to unsigned
long long.

This is a static checker fix.  The one change which is clearly needed is
"mask = 0xff << (idx * 8);" where the author obviously intended to use
all 64 bits.  The other changes are mostly to silence my static checker.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130518183452.GA14587@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:52 +02:00
Jiri Olsa
ddd40da4cc x86/signals: Merge EFLAGS bit clearing into a single statement
Merging EFLAGS bit clearing into a single statement, to
ensure EFLAGS bits are being cleared in a single instruction.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-4-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:53 +02:00
Jiri Olsa
24cda10996 x86/signals: Clear RF EFLAGS bit for signal handler
Clearing RF EFLAGS bit for signal handler. The reason is
that this flag is set by debug exception code to prevent
the recursive exception entry.

Leaving it set for signal handler might prevent debug
exception of the signal handler itself.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:52 +02:00
Jiri Olsa
5e219b3c67 x86/signals: Propagate RF EFLAGS bit through the signal restore call
While porting Vince's perf overflow tests I found perf event
breakpoint overflow does not work properly.

I found the x86 RF EFLAG bit not being set when returning
from debug exception after triggering signal handler. Which
is exactly what you get when you set perf breakpoint overflow
SIGIO handler.

This patch and the next two patches fix the underlying bugs.

This patch adds the RF EFLAGS bit to be restored on return from
signal from the original register context before the signal was
entered.

This will prevent the RF flag to disappear when returning
from exception due to the signal handler being executed.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:50 +02:00
Linus Torvalds
5e427ec2d0 x86: Fix bit corruption at CPU resume time
In commit 78d77df715 ("x86-64, init: Do not set NX bits on non-NX
capable hardware") we added the early_pmd_flags that gets the NX bit set
when a CPU supports NX. However, the new variable was marked __initdata,
because the main _use_ of this is in an __init routine.

However, the bit setting happens from secondary_startup_64(), which is
called not only at bootup, but on every secondary CPU start.  Including
resuming from STR and at CPU hotplug time.  So the value cannot be
__initdata.

Reported-bisected-and-tested-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # v3.9
Acked-by: Peter Anvin <hpa@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-20 11:36:03 -07:00
Linus Torvalds
ae3b29e67c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Fix for a CPU hot-add deadlock in microcode update code

 - Fix for idle consolidation fallout

 - Documentation update for initial kernel direct mapping

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Add missing comments for initial kernel direct mapping
  x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
  x86: Fix idle consolidation fallout
2013-05-15 14:07:53 -07:00
Borislav Petkov
4d067d8e05 x86: Extend #DF debugging aid to 64-bit
It is sometimes very helpful to be able to pinpoint the location which
causes a double fault before it turns into a triple fault and the
machine reboots. We have this for 32-bit already so extend it to 64-bit.
On 64-bit we get the register snapshot at #DF time and not from the
first exception which actually causes the #DF. It should be close
enough, though.

[ hpa: and definitely better than nothing, which is what we have now. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-13 13:42:44 -07:00
Linus Torvalds
3644bc2ec7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
 "Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
  unicore32: just use mmap_pgoff()...
  unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
  x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10 09:21:05 -07:00
Konrad Rzeszutek Wilk
074d72ff57 x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
This can easily be triggered if a new CPU is added (via
ACPI hotplug mechanism) and from user-space you do:

   echo 1 > /sys/devices/system/cpu/cpu3/online

(or wait for UDEV to do it) on a newly appeared physical CPU.

The deadlock is that the "store_online" in drivers/base/cpu.c
takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
"cpu_up" eventually ends up calling "save_mc_for_early"
which also takes the cpu_hotplug_driver_lock() lock.

And here is that lockdep thinks of it:

 smpboot: Stack at about ffff880075c39f44
 smpboot: CPU3: has booted.
 microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

 =============================================
 [ INFO: possible recursive locking detected ]
 3.9.0upstream-10129-g167af0e #1 Not tainted
 ---------------------------------------------
 sh/2487 is trying to acquire lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 but task is already holding lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(x86_cpu_hotplug_driver_mutex);
   lock(x86_cpu_hotplug_driver_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/2487:
  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
  #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60

Suggested-and-Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: fenghua.yu@intel.com
Cc: xen-devel@lists.xensource.com
Cc: stable@vger.kernel.org # for v3.9
Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-09 11:10:00 +02:00
Thomas Gleixner
97a5b81fa4 x86: Fix idle consolidation fallout
The core code expects the arch idle code to return with interrupts
enabled. The conversion missed two x86 cases which fail to do that.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305021557030.3972@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-07 16:24:03 +02:00
Linus Torvalds
99737982ca IOMMU Updates for Linux v3.10
The updates are mostly about the x86 IOMMUs this time. Exceptions are
 the groundwork for the PAMU IOMMU from Freescale (for a PPC platform)
 and an extension to the IOMMU group interface. On the x86 side this
 includes a workaround for VT-d to disable interrupt remapping on broken
 chipsets. On the AMD-Vi side the most important new feature is a kernel
 command-line interface to override broken information in IVRS ACPI
 tables and get interrupt remapping working this way. Besides that there
 are small fixes all over the place.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRh2vAAAoJECvwRC2XARrjbjkP/jzKzeffybUpQsIJF8rs/IEt
 hSwqpGLr6WR5FdneEH9fiBIp4pyMDXmuAb/2ZNgB+DgPN3xgqmWVo4WLk7pMo3BS
 /xIz/lu7hIX3AtKt807pL9+rPdhGYEJ43Vmr4bW9x0l1kuNXy6fmMLcN5FaPKjV4
 p4hY4jOstEgtYQw4wi39/9b4FsYoipZizkOUSdtCzWwTv7jOHH7/Wra8iZyzL6Je
 1VlF/efp0ytTcwLdHOfGwPCIlZrQRtQCM4SqdAUG9bOL3ARR9Yu/0iW1295nbLzo
 CQX5CfKePvo/fGxki1jcBi+UCyxYKPosB5kCxmh4MAxCg/VzzMsaME/A73tLJa6W
 Y29bbjwPoBPMq03HX8S9R5QWY8HpFujUUp+J4TXcKuTgYEV28WfLu1uaeKD716nM
 LoXUojov7Cj8ZQZnhyu5l+XNaephBZLfw/8bM6bAxhlKXwAjmLiS5Z+srPl1GJee
 5GCV+L94JifHLZaREWh3JFsh9O3W7Wno2++c4JU32uCWJHXH7tMgs2P8n5AY9rnT
 Km1a9y6w2MF3Gg9j4y6u75m0XnFTNzYjeJMUtqVlwVhNHhgaXfuIWY63xOQCLJs1
 ThTHOjoh0VqONGobR/ywn+0ouo9X07DnWpluyFd+zY3XK0UE0NOu9XMr4i6TWxOf
 mlzWoEKxtw36XGHB/FtQ
 =MVc/
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "The updates are mostly about the x86 IOMMUs this time.

  Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a
  PPC platform) and an extension to the IOMMU group interface.

  On the x86 side this includes a workaround for VT-d to disable
  interrupt remapping on broken chipsets.  On the AMD-Vi side the most
  important new feature is a kernel command-line interface to override
  broken information in IVRS ACPI tables and get interrupt remapping
  working this way.

  Besides that there are small fixes all over the place."

* tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits)
  iommu/tegra: Fix printk formats for dma_addr_t
  iommu: Add a function to find an iommu group by id
  iommu/vt-d: Remove warning for HPET scope type
  iommu: Move swap_pci_ref function to drivers/iommu/pci.h.
  iommu/vt-d: Disable translation if already enabled
  iommu/amd: fix error return code in early_amd_iommu_init()
  iommu/AMD: Per-thread IOMMU Interrupt Handling
  iommu: Include linux/err.h
  iommu/amd: Workaround for ERBT1312
  iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters
  iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
  iommu/amd: Add ioapic and hpet ivrs override
  iommu/amd: Add early maps for ioapic and hpet
  iommu/amd: Extend IVRS special device data structure
  iommu/amd: Move add_special_device() to __init
  iommu: Fix compile warnings with forward declarations
  iommu/amd: Properly initialize irq-table lock
  iommu/amd: Use AMD specific data structure for irq remapping
  iommu/amd: Remove map_sg_no_iommu()
  iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
  ...
2013-05-06 14:59:13 -07:00
Linus Torvalds
01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Linus Torvalds
64049d1973 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes plus a small hw-enablement patch for Intel IB model 58
  uncore events"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
  perf/x86/intel/lbr: Fix LBR filter
  perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
  perf: Fix vmalloc ring buffer pages handling
  perf/x86/intel: Fix unintended variable name reuse
  perf/x86/intel: Add support for IvyBridge model 58 Uncore
  perf/x86/intel: Fix typo in perf_event_intel_uncore.c
  x86: Eliminate irq_mis_count counted in arch_irq_stat
2013-05-05 11:37:16 -07:00
Peter Zijlstra
7cc23cd6c0 perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
We should always have proper privileges when requesting kernel
data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl
[ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org
2013-05-05 10:58:11 +02:00
Peter Zijlstra
6e15eb3ba6 perf/x86/intel/lbr: Fix LBR filter
The LBR 'from' adddress is under full userspace control; ensure
we validate it before reading from it.

Note: is_module_text_address() can potentially be quite
expensive; for those running into that with high overhead
in modules optimize it using an RCU backed rb-tree.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.158211806@chello.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-mk8i82ffzax01cnqo829iy1q@git.kernel.org
2013-05-04 08:37:47 +02:00
Peter Zijlstra
741a698f42 perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
Errata BV98 states that all MEM_*_RETIRED events corrupt the
counter value of the SMT sibling's counters. Blacklist these
events

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.083340271@chello.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-jwra43mujrv1oq9xk6mfe57v@git.kernel.org
2013-05-04 08:37:46 +02:00
Alexander van Heukelum
5522ddb3fc x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
Commit 49cb25e929 x86: 'get rid of pt_regs argument in vm86/vm86old'
got rid of the pt_regs stub for sys_vm86old and sys_vm86. The functions
were, however, not changed to use the calling convention for syscalls.

[AV: killed asmlinkage_protect() - it's done automatically now]

Reported-and-tested-by: Hans de Bruin <jmdebruin@xmsnet.nl>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-02 20:36:32 -04:00
H. Peter Anvin
78d77df715 x86-64, init: Do not set NX bits on non-NX capable hardware
During early init, we would incorrectly set the NX bit even if the NX
feature was not supported.  Instead, only set this bit if NX is
actually available and enabled.  We already do very early detection of
the NX bit to enable it in EFER, this simply extends this detection to
the early page table mask.

Reported-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1367476850.5660.2.camel@nexus
Cc: <stable@vger.kernel.org> v3.9
2013-05-02 11:27:35 -07:00
Konrad Rzeszutek Wilk
cc456c4e7c x86, gdt, hibernate: Store/load GDT for hibernate path.
The git commite7a5cd063c7b4c58417f674821d63f5eb6747e37
("x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path
is not needed.") assumes that for the hibernate path the booting
kernel and the resuming kernel MUST be the same. That is certainly
the case for a 32-bit kernel (see check_image_kernel and
CONFIG_ARCH_HIBERNATION_HEADER config option).

However for 64-bit kernels it is OK to have a different kernel
version (and size of the image) of the booting and resuming kernels.
Hence the above mentioned git commit introduces an regression.

This patch fixes it by introducing a 'struct desc_ptr gdt_desc'
back in the 'struct saved_context'. However instead of having in the
'save_processor_state' and 'restore_processor_state' the
store/load_gdt calls, we are only saving the GDT in the
save_processor_state.

For the restore path the lgdt operation is done in
hibernate_asm_[32|64].S in the 'restore_registers' path.

The apt reader of this description will recognize that only 64-bit
kernels need this treatment, not 32-bit. This patch adds the logic
in the 32-bit path to be more similar to 64-bit so that in the future
the unification process can take advantage of this.

[ hpa: this also reverts an inadvertent on-disk format change ]

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-02 11:27:35 -07:00
Joerg Roedel
0c4513be3d Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and 'arm/tegra' into next 2013-05-02 12:10:19 +02:00
Linus Torvalds
08d7676083 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull compat cleanup from Al Viro:
 "Mostly about syscall wrappers this time; there will be another pile
  with patches in the same general area from various people, but I'd
  rather push those after both that and vfs.git pile are in."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  syscalls.h: slightly reduce the jungles of macros
  get rid of union semop in sys_semctl(2) arguments
  make do_mremap() static
  sparc: no need to sign-extend in sync_file_range() wrapper
  ppc compat wrappers for add_key(2) and request_key(2) are pointless
  x86: trim sys_ia32.h
  x86: sys32_kill and sys32_mprotect are pointless
  get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
  merge compat sys_ipc instances
  consolidate compat lookup_dcookie()
  convert vmsplice to COMPAT_SYSCALL_DEFINE
  switch getrusage() to COMPAT_SYSCALL_DEFINE
  switch epoll_pwait to COMPAT_SYSCALL_DEFINE
  convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
  switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
  make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect
  make HAVE_SYSCALL_WRAPPERS unconditional
  consolidate cond_syscall and SYSCALL_ALIAS declarations
  teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long
  get rid of duplicate logics in __SC_....[1-6] definitions
2013-05-01 07:21:43 -07:00
Linus Torvalds
5f56886521 Merge branch 'akpm' (incoming from Andrew)
Merge third batch of fixes from Andrew Morton:
 "Most of the rest.  I still have two large patchsets against AIO and
  IPC, but they're a bit stuck behind other trees and I'm about to
  vanish for six days.

   - random fixlets
   - inotify
   - more of the MM queue
   - show_stack() cleanups
   - DMI update
   - kthread/workqueue things
   - compat cleanups
   - epoll udpates
   - binfmt updates
   - nilfs2
   - hfs
   - hfsplus
   - ptrace
   - kmod
   - coredump
   - kexec
   - rbtree
   - pids
   - pidns
   - pps
   - semaphore tweaks
   - some w1 patches
   - relay updates
   - core Kconfig changes
   - sysrq tweaks"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
  Documentation/sysrq: fix inconstistent help message of sysrq key
  ethernet/emac/sysrq: fix inconstistent help message of sysrq key
  sparc/sysrq: fix inconstistent help message of sysrq key
  powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
  ARM/etm/sysrq: fix inconstistent help message of sysrq key
  power/sysrq: fix inconstistent help message of sysrq key
  kgdb/sysrq: fix inconstistent help message of sysrq key
  lib/decompress.c: fix initconst
  notifier-error-inject: fix module names in Kconfig
  kernel/sys.c: make prctl(PR_SET_MM) generally available
  UAPI: remove empty Kbuild files
  menuconfig: print more info for symbol without prompts
  init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
  kconfig menu: move Virtualization drivers near other virtualization options
  Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  relay: use macro PAGE_ALIGN instead of FIX_SIZE
  kernel/relay.c: move FIX_SIZE macro into relay.c
  kernel/relay.c: remove unused function argument actor
  drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
  drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
  ...
2013-04-30 17:37:43 -07:00
Tejun Heo
a43cb95d54 dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms.  This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.

show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.

* Archs which didn't print debug info now do.

  alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
  metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
  um, xtensa

* Already prints debug info.  Replaced with show_regs_print_info().
  The printed information is superset of what used to be there.

  arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86

* s390 is special in that it used to print arch-specific information
  along with generic debug info.  Heiko and Martin think that the
  arch-specific extra isn't worth keeping s390 specfic implementation.
  Converted to use the generic version.

Note that now all archs print the debug info before actual register
dumps.

An example BUG() dump follows.

 kernel BUG at /work/os/work/kernel/workqueue.c:4841!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
 RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
 RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
 RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
  0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
  ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
 Call Trace:
  [<ffffffff81000312>] do_one_initcall+0x122/0x170
  [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  [<ffffffff81c4776e>] kernel_init+0xe/0xf0
  [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  ...

v2: Typo fix in x86-32.

v3: CPU number dropped from show_regs_print_info() as
    dump_stack_print_info() has been updated to print it.  s390
    specific implementation dropped as requested by s390 maintainers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Tejun Heo
98e5e1bf72 dump_stack: implement arch-specific hardware description in task dumps
x86 and ia64 can acquire extra hardware identification information
from DMI and print it along with task dumps; however, the usage isn't
consistent.

* x86 show_regs() collects vendor, product and board strings and print
  them out with PID, comm and utsname.  Some of the information is
  printed again later in the same dump.

* warn_slowpath_common() explicitly accesses the DMI board and prints
  it out with "Hardware name:" label.  This applies to both x86 and
  ia64 but is irrelevant on all other archs.

* ia64 doesn't show DMI information on other non-WARN dumps.

This patch introduces arch-specific hardware description used by
dump_stack().  It can be set by calling dump_stack_set_arch_desc()
during boot and, if exists, printed out in a separate line with
"Hardware name:" label.

dmi_set_dump_stack_arch_desc() is added which sets arch-specific
description from DMI data.  It uses dmi_ids_string[] which is set from
dmi_present() used for DMI debug message.  It is superset of the
information x86 show_regs() is using.  The function is called from x86
and ia64 boot code right after dmi_scan_machine().

This makes the explicit DMI handling in warn_slowpath_common()
unnecessary.  Removed.

show_regs() isn't yet converted to use generic debug information
printing and this patch doesn't remove the duplicate DMI handling in
x86 show_regs().  The next patch will unify show_regs() handling and
remove the duplication.

An example WARN dump follows.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a0c3>] init_workqueues+0x35/0x505
  ...

v2: Use the same string as the debug message from dmi_present() which
    also contains BIOS information.  Move hardware name into its own
    line as warn_slowpath_common() did.  This change was suggested by
    Bjorn Helgaas.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Tejun Heo
196779b9b4 dump_stack: consolidate dump_stack() implementations and unify their behaviors
Both dump_stack() and show_stack() are currently implemented by each
architecture.  show_stack(NULL, NULL) dumps the backtrace for the
current task as does dump_stack().  On some archs, dump_stack() prints
extra information - pid, utsname and so on - in addition to the
backtrace while the two are identical on other archs.

The usages in arch-independent code of the two functions indicate
show_stack(NULL, NULL) should print out bare backtrace while
dump_stack() is used for debugging purposes when something went wrong,
so it does make sense to print additional information on the task which
triggered dump_stack().

There's no reason to require archs to implement two separate but mostly
identical functions.  It leads to unnecessary subtle information.

This patch expands the dummy fallback dump_stack() implementation in
lib/dump_stack.c such that it prints out debug information (taken from
x86) and invokes show_stack(NULL, NULL) and drops arch-specific
dump_stack() implementations in all archs except blackfin.  Blackfin's
dump_stack() does something wonky that I don't understand.

Debug information can be printed separately by calling
dump_stack_print_info() so that arch-specific dump_stack()
implementation can still emit the same debug information.  This is used
in blackfin.

This patch brings the following behavior changes.

* On some archs, an extra level in backtrace for show_stack() could be
  printed.  This is because the top frame was determined in
  dump_stack() on those archs while generic dump_stack() can't do that
  reliably.  It can be compensated by inlining dump_stack() but not
  sure whether that'd be necessary.

* Most archs didn't use to print debug info on dump_stack().  They do
  now.

An example WARN dump follows.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Hardware name: empty
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a071>] init_workqueues+0x35/0x505
  ...

v2: CPU number added to the generic debug info as requested by s390
    folks and dropped the s390 specific dump_stack().  This loses %ksp
    from the debug message which the maintainers think isn't important
    enough to keep the s390-specific dump_stack() implementation.

    dump_stack_print_info() is moved to kernel/printk.c from
    lib/dump_stack.c.  Because linkage is per objecct file,
    dump_stack_print_info() living in the same lib file as generic
    dump_stack() means that archs which implement custom dump_stack()
    - at this point, only blackfin - can't use dump_stack_print_info()
    as that will bring in the generic version of dump_stack() too.  v1
    The v1 patch broke build on blackfin due to this issue.  The build
    breakage was reported by Fengguang Wu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00