Commit Graph

15252 Commits

Author SHA1 Message Date
Shuah Khan
74bc491795 x86/pci-calgary_64.c: Remove obsoleted simple_strtoul() usage
Change calgary_parse_options() to call kstrtoul() instead of
calling obsoleted simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Acked-by: Muli Ben-Yehuda <muli@cs.technion.ac.il>
Cc: jdmason@kudzu.us
Link: http://lkml.kernel.org/r/1337556268.3126.5.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-21 10:29:40 +02:00
Ingo Molnar
bb27f55eb9 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Fixes for perf/core:

 - Rename some perf_target methods to avoid double negation, from Namhyung Kim.
 - Revert change to use per task events with inheritance, from Namhyung Kim.
 - Events should start disabled till children starts running, from David Ahern.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-21 09:17:50 +02:00
Linus Torvalds
5d1204582e Merge branch 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 linker bug workarounds from Peter Anvin.

GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly
turns certain relocation entries absolute.  Section-relative symbols
that are part of otherwise empty sections are silently changed them to
absolute.  We rely on section-relative symbols staying section-relative,
and actually have several sections in the linker script solely for this
purpose.

See for example

   http://sourceware.org/bugzilla/show_bug.cgi?id=14052

We could just black-list the buggy linker, but it appears that it got
shipped in at least F17, and possibly other distros too, so it's sadly
not some rare unusual case.

This backports the workaround from the x86/trampoline branch, and as
Peter says: "This is not a minimal fix, not at all, but it is a tested
code base."

* 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: When printing an error, say relative or absolute
  x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
  x86, realmode: 16-bit real-mode code support for relocs tool

(*) That's a manly release numbering system. Stupid, sure. But manly.
2012-05-19 15:28:22 -07:00
H. Peter Anvin
24ab82bd9b x86, relocs: When printing an error, say relative or absolute
When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol.  This should make it a lot more
clear what action the programmer needs to take and should help us find
the reason if additional symbol bugs show up.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:50:02 -07:00
H. Peter Anvin
a3e854d95a x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols.  Let the relocs program know that those should be treated as
relative symbols.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
2012-05-18 19:50:00 -07:00
H. Peter Anvin
6520fe5564 x86, realmode: 16-bit real-mode code support for relocs tool
A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.

In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.

16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.

The relocs tool is moved to arch/x86/tools/relocs.c, and added new
target archscripts that can be used to build scripts needed building
an architecture.  be compiled before building the arch/x86 tree.

[ hpa: accelerating this because it detects invalid absolute
  relocations, a serious bug in binutils 2.22.52.0.x which currently
  produces bad kernels. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:49:40 -07:00
Linus Torvalds
3d9944978e Fix machine check recovery
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPtS5qAAoJEKurIx+X31iB+bAP/1FjUa2Nd53X89HFc6DoLktA
 4AshM/JENhAfSbpTfGhg10ZOuwUa8sQ85Sf6yz1CsW6mEiJK/bDFrR1g2KrmejyL
 owgQvV6ukPABzfB27tWyXSVVBPmLkJviedLDautVpgPEPVuqauntmpe7fW51b5pf
 92MxvYZ6AzgbDIjVaXP7e+kPomvgyM1C/UEvCgoyEcw81h5dchU9NSdXNBS67JS/
 uOsMiMJyoNI46haYYbyFgMq3RmpYuxTLFj7qFDlUltyjP+vIvyLs38Ae/vkRMNfV
 sYXWRUQlRpvqg4MDIFVZx8FWTufzTm0BMS+Be7JkXKWdF3DAksq6FprOWIxfYi+d
 PMxwTFeSJzTINb9n9MiLt3TmuRy3mu37QWd28qJaJciNMkYWbclPqyJmjwsuAMKg
 hKSy2FvewIDHTAGOkwaVjS+L8O7j3TNRIAbweFA1d1K4rt6oSfwdn6GZrAb6MTvx
 oV0Fe1nAyY9mucyjknBTim3RYZ9qQ7H9SjL8JoaGihSzi988MgBun+iEQOQiwjNl
 YJNGIv0wb31qCKFtxU0A8rA0sFdhQRoLgigJfETb5a+2ghtG323oH6ZVWTnB8bp/
 g1XOpus222/iJdL2AizMiJeUNV1IZ0SeLn2GoTFQIy9Uf11Zdgu5wLJumUva8EC6
 JAACbpBlYufftIn278OH
 =tk2M
 -----END PGP SIGNATURE-----

Merge tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull a machine check recovery fix from Tony Luck.

I really don't like how the MCE code does some of the things it does,
but this does seem to be an improvement.

* tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Only restart instruction after machine check recovery if it is safe
2012-05-18 09:42:20 -07:00
Arnaldo Carvalho de Melo
16ee6576e2 Merge remote-tracking branch 'tip/perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch:

"perf tools: Move parse event automated tests to separated object"

That depends on:

commit e7c72d8
perf tools: Add 'G' and 'H' modifiers to event parsing

Conflicts:
	tools/perf/builtin-stat.c

Conflicted with the recent 'perf_target' patches when checking the
result of perf_evsel open routines to see if a retry is needed to cope
with older kernels where the exclude guest/host perf_event_attr bits
were not used.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-18 13:13:33 -03:00
Robert Richter
5bcdf5e4fe perf/x86: Update event scheduling constraints for AMD family 15h models
This update is for newer family 15h cpu models from 0x02 to 0x1f.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org # v2.6.39+
Link: http://lkml.kernel.org/r/1337337642-1621-1-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 14:07:46 +02:00
Jan Beulich
20167d3421 x86-64: Fix accounting in kernel_physical_mapping_init()
When finding a present and acceptable 2M/1G mapping, the number
of pages mapped this way shouldn't be incremented (as it was
already incremented when the earlier part of the mapping was
established). Instead, last_map_addr needs to be updated in this
case.

Further, address increments were wrong in one place each in both
phys_pmd_init() and phys_pud_init() (lacking the aligning down
to the respective page boundary).

As we're now doing the same calculation several times, fold it
into a single instance using a local variable (matching how
kernel_physical_mapping_init() itself does it at the PGD level).

Observed during code inspection, not because of an actual
problem.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FB3C27202000078000841A0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 10:13:37 +02:00
Alex Shi
3e7f3db001 x86/tlb: Clean up and unify TLB_FLUSH_ALL definition
Since sizeof(long) is 4 in x86_32 mode, and it's 8 in x86_64
mode, sizeof(long long) is also 8 byte in x86_64 mode.
use long mode can fit TLB_FLUSH_ALL defination here both in 32
or 64 bits mode.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/n/tip-evv5bekiipi2pmyzdsy8lkkw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 10:13:37 +02:00
Michael S. Tsirkin
0ab711ae6a x86/apic: Implement EIO micro-optimization
We know both register and value for eoi beforehand,
so there's no need to check it and no need to do math
to calculate the msr. Saves instructions/branches
on each EOI when using x2apic.

I looked at the objdump output to verify that the
generated code looks right and actually is shorter.

The real improvemements will be on the KVM guest side
though, those come in a later patch.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/e019d1a125316f10d3e3a4b2f6bda41473f4fb72.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 09:46:09 +02:00
Michael S. Tsirkin
2a43195d83 x86/apic: Add apic->eoi_write() callback
Add eoi_write callback so that kvm can override
eoi accesses without touching the rest of the apic.
As a side-effect, this will enable a micro-optimization
for apics using msr.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0df425d746c49ac2ecc405174df87752869629d2.1337184153.git.mst@redhat.com
[ tidied it up a bit ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 09:46:08 +02:00
Michael S. Tsirkin
4ebcc24390 x86/apic: Use symbolic APIC_EOI_ACK
Use the symbol instead of hard-coded numbers,
now that the reason for the value is documented
where the constant is defined we don't need to
duplicate this explanation in code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/ecbe4c79d69c172378e47e5a587ff5cd10293c9f.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 09:46:08 +02:00
Michael S. Tsirkin
c8f64bf7df x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it
Fix typo in the macro name and document the
reason it has this value. Update users.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/37867b31b9330690af2e60a2a7c4cb4b1b070caf.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 09:46:07 +02:00
Ingo Molnar
87e4baacae x86/xen/apic: Add missing #include <xen/xen.h>
This file depends on <xen/xen.h>, but the dependency was hidden due
to: <asm/acpi.h> -> <asm/trampoline.h> -> <asm/io.h> -> <xen/xen.h>

With the removal of <asm/trampoline.h>, this exposed the missing

Cc: Len Brown <lenb@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-7ccybvue6mw6wje3uxzzcglj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18 09:34:45 +02:00
Paul Gortmaker
bb8187d35f MCA: delete all remaining traces of microchannel bus support.
Hardware with MCA bus is limited to 386 and 486 class machines
that are now 20+ years old and typically with less than 32MB
of memory.  A quick search on the internet, and you see that
even the MCA hobbyist/enthusiast community has lost interest
in the early 2000 era and never really even moved ahead from
the 2.4 kernels to the 2.6 series.

This deletes anything remaining related to CONFIG_MCA from core
kernel code and from the x86 architecture.  There is no point in
carrying this any further into the future.

One complication to watch for is inadvertently scooping up
stuff relating to machine check, since there is overlap in
the TLA name space (e.g. arch/x86/boot/mca.c).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17 19:06:13 -04:00
Alan Cox
80b3e55737 x86: Fix boot on Twinhead H12Y
Despite lots of investigation into why this is needed we don't
know or have an elegant cure. The only answer found on this
laptop is to mark a problem region as used so that Linux doesn't
put anything there.

Currently all the users add reserve= command lines and anyone
not knowing this needs to find the magic page that documents it.
Automate it instead.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Tested-and-bugfixed-by: Arne Fitzenreiter <arne@fitzenreiter.de>
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=10231
Link: http://lkml.kernel.org/r/20120515174347.5109.94551.stgit@bluebook
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-17 20:04:00 +02:00
Linus Torvalds
31ae98359d Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf, x86 and scheduler updates from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing: Do not enable function event with enable
  perf stat: handle ENXIO error for perf_event_open
  perf: Turn off compiler warnings for flex and bison generated files
  perf stat: Fix case where guest/host monitoring is not supported by kernel
  perf build-id: Fix filename size calculation

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
  x86: Fix section annotation of acpi_map_cpu2node()
  x86/microcode: Ensure that module is only loaded on supported Intel CPUs

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
2012-05-17 09:35:17 -07:00
Peter Zijlstra
8e7fbcbc22 sched: Remove stale power aware scheduling remnants and dysfunctional knobs
It's been broken forever (i.e. it's not scheduling in a power
aware fashion), as reported by Suresh and others sending
patches, and nobody cares enough to fix it properly ...
so remove it to make space free for something better.

There's various problems with the code as it stands today, first
and foremost the user interface which is bound to topology
levels and has multiple values per level. This results in a
state explosion which the administrator or distro needs to
master and almost nobody does.

Furthermore large configuration state spaces aren't good, it
means the thing doesn't just work right because it's either
under so many impossibe to meet constraints, or even if
there's an achievable state workloads have to be aware of
it precisely and can never meet it for dynamic workloads.

So pushing this kind of decision to user-space was a bad idea
even with a single knob - it's exponentially worse with knobs
on every node of the topology.

There is a proposal to replace the user interface with a single
3 state knob:

 sched_balance_policy := { performance, power, auto }

where 'auto' would be the preferred default which looks at things
like Battery/AC mode and possible cpufreq state or whatever the hw
exposes to show us power use expectations - but there's been no
progress on it in the past many months.

Aside from that, the actual implementation of the various knobs
is known to be broken. There have been sporadic attempts at
fixing things but these always stop short of reaching a mergable
state.

Therefore this wholesale removal with the hopes of spurring
people who care to come forward once again and work on a
coherent replacement.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-17 13:48:56 +02:00
Dave Airlie
db2e034d2c x86/vga: fix build with efi disabled.
Reported by sfr on -next merge.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-05-17 08:32:50 +01:00
Steven Rostedt
e4f5d5440b ftrace/x86: Have x86 ftrace use the ftrace_modify_all_code()
To remove duplicate code, have the ftrace arch_ftrace_update_code()
use the generic ftrace_modify_all_code(). This requires that the
default ftrace_replace_code() becomes a weak function so that an
arch may override it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-16 20:00:27 -04:00
Suresh Siddha
1dcc8d7ba2 x86, fpu: drop the fpu state during thread exit
There is no need to save any active fpu state to the task structure
memory if the task is dead. Just drop the state instead.

For example, this saved some 1770 xsave's during the system boot
of a two socket Xeon system.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-4-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:20:59 -07:00
Suresh Siddha
d75f1b391f x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
Code paths like fork(), exit() and signal handling flush the fpu
state explicitly to the structures in memory.

BUG_ON() in __sanitize_i387_state() is checking that the fpu state
is not live any more. But for preempt kernels, task can be scheduled
out and in at any place and the preload_fpu logic during context switch
can make the fpu registers live again.

For example, consider a 64-bit Task which uses fpu frequently and as such
you will find its fpu_counter mostly non-zero. During its time slice, kernel
used fpu by doing kernel_fpu_begin/kernel_fpu_end(). After this, in the same
scheduling slice, task-A got a signal to handle. Then during the signal
setup path we got preempted when we are just before the sanitize_i387_state()
in arch/x86/kernel/xsave.c:save_i387_xstate(). And when we come back we
will have the fpu registers live that can hit the bug_on.

Similarly during core dump, other threads can context-switch in and out
(because of spurious wakeups while waiting for the coredump to finish in
 kernel/exit.c:exit_mm()) and the main thread dumping core can run into this
bug when it finds some other thread with its fpu registers live on some other cpu.

So remove the paranoid check for now, even though it caught a bug in the
multi-threaded core dump case (fixed in the previous patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-3-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:17:17 -07:00
Suresh Siddha
55ccf3fe3f fork: move the real prepare_to_copy() users to arch_dup_task_struct()
Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
the architectures and the rest following the x86 model of flushing the extended
register state like fpu there.

Remove it and use the arch_dup_task_struct() instead.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:16:26 -07:00
Peter Jones
ab7b64e9ee x86: Don't continue booting if we can't load the specified initrd
If we've determined we can't do what the user asked, trying to do
something else isn't going to make the user's life better.

Without this the screen scrolls a bit and then you get a panic
anyway, and it's nice not to have so much scroll after the real
problem in bug reports.

Link: http://lkml.kernel.org/r/1337190206-12121-1-git-send-email-pjones@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 13:59:52 -07:00
Dave Airlie
6cf20beec4 x86/vga: set the default device from the fixup.
Since Matthew's efi/vga changes on non-EFI machines we were failing
to tell the vgaarb/switcheroo what the default device was, this
sets the default device in the quirk if none has been set before.

This fixes the switcheroo on my T410s.

Cc: Matthew Garrett <mjg@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-05-16 10:22:16 +01:00
Bjorn Helgaas
867aae6ebe x86/PCI: only check for spinlock being held in SMP kernels
spin_is_locked() is always false on UP kernels: spin_lock_irqsave() does no
locking, so we can't tell whether the lock is held or not.  Therefore,
this warning is only valid for SMP kernels.

CC: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-05-15 17:01:09 -06:00
Shuah Khan
363f7ce325 x86: kernel/dumpstack.c simple_strtoul cleanup
Change kstack_setup() and code_bytes_setup() in kernel/dumpstack.c
to call kstrtoul() instead of calling obsoleted simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Link: http://lkml.kernel.org/r/1336327084.2897.15.camel@lorien2
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-15 15:36:42 -07:00
Shuah Khan
5abe68e493 x86: kernel/check.c simple_strtoul cleanup
Change set_corruption_check() and set_corruption_check_period()
in kernel/check.c to call kstrtoul() instead of calling
obsoleted simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Link: http://lkml.kernel.org/r/1336326908.2897.12.camel@lorien2
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-15 15:36:41 -07:00
Eric W. Biederman
a7c1938e22 userns: Convert stat to return values mapped from kuids and kgids
- Store uids and gids with kuid_t and kgid_t in struct kstat
- Convert uid and gids to userspace usable values with
  from_kuid and from_kgid

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:08:35 -07:00
Jussi Kivilinna
ef45b83431 crypto: aesni-intel - move more common code to ablk_init_common
ablk_*_init functions share more common code than what is currently in
ablk_init_common. Move all of the common code to ablk_init_common.

Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-05-15 17:25:33 +10:00
Jussi Kivilinna
fa46ccb8eb crypto: aesni-intel - use crypto_[un]register_algs
Combine all crypto_alg to be registered and use new crypto_[un]register_algs
functions. Simplifies init/exit code and reduce object size.

Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-05-15 17:25:33 +10:00
Tony Luck
dad1743e59 x86/mce: Only restart instruction after machine check recovery if it is safe
Section 15.3.1.2 of the software developer manual has this to say about the
RIPV bit in the IA32_MCG_STATUS register:

  RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that program
  execution can be restarted reliably at the instruction pointed to by the
  instruction pointer pushed on the stack when the machine-check exception
  is generated.  When clear, the program cannot be reliably restarted at
  the pushed instruction pointer.

We need to save the state of this bit in do_machine_check() and use it
in mce_notify_process() to force a signal; even if memory_failure() says
it made a complete recovery ... e.g. replaced a clean LRU page.

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-05-14 15:07:48 -07:00
Alex Shi
641b695c2f percpu: remove percpu_xxx() functions
Remove percpu_xxx serial functions, all of them were replaced by
this_cpu_xxx or __this_cpu_xxx serial functions

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-05-14 14:15:32 -07:00
Alex Shi
c6ae41e7d4 x86: replace percpu_xxx funcs with this_cpu_xxx
Since percpu_xxx() serial functions are duplicated with this_cpu_xxx().
Removing percpu_xxx() definition and replacing them by this_cpu_xxx()
in code. There is no function change in this patch, just preparation for
later percpu_xxx serial function removing.

On x86 machine the this_cpu_xxx() serial functions are same as
__this_cpu_xxx() without no unnecessary premmpt enable/disable.

Thanks for Stephen Rothwell, he found and fixed a i386 build error in
the patch.

Also thanks for Andrew Morton, he kept updating the patchset in Linus'
tree.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-05-14 14:15:31 -07:00
Alan Cox
c3709e6734 x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
We set cpuid_level to -1 if there is no CPUID instruction (only
possible on i386).

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20120514174059.30236.1064.stgit@bluebook
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=12122
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-14 10:49:32 -07:00
Peter Zijlstra
316ad24830 sched/x86: Rewrite set_cpu_sibling_map()
Commit ad7687dde ("x86/numa: Check for nonsensical topologies on real
hw as well") is broken in that the condition can trigger for valid
setups but only changes the end result for invalid setups with no real
means of discerning between those.

Rewrite set_cpu_sibling_map() to make the code clearer and make sure
to only warn when the check changes the end result.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-klcwahu3gx467uhfiqjyhdcs@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14 15:05:25 +02:00
Ingo Molnar
9cba26e66d Merge branch 'perf/uprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/uprobes 2012-05-14 14:43:40 +02:00
Shai Fultheim
ead91d4b8c x86/vsmp: Fix number of CPUs when vsmp is disabled
In case CONFIG_X86_VSMP is not set, limit the number of CPUs to
the number of CPUs of the first board.

Also make CONFIG_X86_VSMP depend on CONFIG_SMP, as there's
little point in having a vsmp machine with a single CPU.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ido@wizery.com: rebased, fixed minor coding-style issues]
Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14 14:42:33 +02:00
Robert Richter
978da300c7 perf/x86/ibs: Fix undefined reference to `get_ibs_caps'
Fixing i386 allnoconfig built errors:

 arch/x86/built-in.o: In function `amd_pmu_hw_config':
 perf_event_amd.c:(.text+0xc3e1): undefined reference to `get_ibs_caps'

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14 14:31:35 +02:00
Don Zickus
3aac27aba7 x86/reboot: Update nonmi_ipi parameter
Update the nonmi_ipi parameter to reflect the simple change
instead of the previous complicated one.  There should be less
of a need to use it but there may still be corner cases on older
hardware that stumble into NMI issues.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1336761675-24296-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14 11:49:38 +02:00
Don Zickus
7d007d21e5 x86/reboot: Use NMI to assist in shutting down if IRQ fails
For v3.3, I added code to use the NMI to stop other cpus in the
panic case.  The idea was to make sure all cpus on the system
were definitely halted to help serialize the panic path to
execute the rest of the code on a single cpu.

The main problem it was trying to solve was how to stop a cpu
that was spinning with its irqs disabled.  A IPI irq would be
stuck and couldn't get in there, but an NMI could.

Things were great until we had another conversation about some
pstore changes.  Because some of the backend pstore still uses
spinlocks to protect the device access, things could get ugly if
a panic happened and we were stuck spinning on a lock.

Now with the NMI shutting down cpus, we could assume no other
cpus were running and just bust the spin lock and proceed.

The counter argument was, well if you do that the backend could
be in a screwed up state and you might not be able to save
anything as a result. If we could have just given the cpu a
little more time to finish things, we could have grabbed the
spin lock cleanly and everything would have been fine.

Well, how do give a cpu a 'little more time' in the panic case?
For the most part you can't without spinning on the lock and
even in that case, how long do you spin for?

So instead of making it ugly in the pstore code, just mimic the
idea that stop_machine had, which is block on an IRQ IPI until
the remote cpu has re-enabled interrupts and left the critical
region.  Which is what happens now using REBOOT_IRQ.

Then leave the NMI case for those cpus that are truly stuck
after a short time.  This leaves the current behaviour alone and
just handle a corner case.  Most systems should never have to
enter the NMI code and if they do, print out a message in case
the NMI itself causes another issue.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1336761675-24296-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14 11:49:37 +02:00
Don Zickus
5d2b86d90f Revert "x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus"
This reverts commit 3603a2512f.

Originally I wanted a better hammer to shutdown cpus during
panic. However, this really steps on the toes of various
spinlocks in the panic path.  Sometimes it is easier to wait for
the IRQ to become re-enabled to indictate the cpu left the
critical region and then shutdown the cpu.

The next patch moves the NMI addition after the IRQ part.  To
make it easier to see the logic of everything, revert this patch
and apply the next simpler patch.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1336761675-24296-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14 11:49:37 +02:00
Mark Brown
7563bbf89d gpiolib/arches: Centralise bolierplate asm/gpio.h
Rather than requiring architectures that use gpiolib but don't have any
need to define anything custom to copy an asm/gpio.h provide a Kconfig
symbol which architectures must select in order to include gpio.h and
for other architectures just provide the trivial implementation directly.

This makes it much easier to do gpiolib updates and is also a step towards
making gpiolib APIs available on every architecture.

For architectures with existing boilerplate code leave a stub header in
place which warns on direct inclusion of asm/gpio.h and includes
linux/gpio.h to catch code that's doing this.  Direct inclusion of
asm/gpio.h has long been deprecated.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-05-11 18:00:14 -06:00
Linus Torvalds
63f4711aec Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Avi Kivity:
 "Two asynchronous page fault fixes (one guest, one host), a powerpc
  page refcount fix, and an ia64 build fix."

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: ia64: fix build due to typo
  KVM: PPC: Book3S HV: Fix refcounting of hugepages
  KVM: Do not take reference to mm during async #PF
  KVM: ensure async PF event wakes up vcpu from halt
2012-05-09 11:14:13 -07:00
Robert Richter
8b1e13638d perf/x86-ibs: Fix usage of IBS op current count
The value of IbsOpCurCnt rolls over when it reaches IbsOpMaxCnt. Thus,
it is reset to zero by hardware. To get the correct count we need to
add the max count to it in case we received an ibs sample (valid bit
set).

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-13-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:17 +02:00
Robert Richter
fc5fb2b5e1 perf/x86-ibs: Catch spurious interrupts after stopping IBS
After disabling IBS there could be still incomming NMIs with samples
that even have the valid bit cleared. Mark all this NMIs as handled to
avoid spurious interrupt messages.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-12-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:16 +02:00
Robert Richter
c9574fe0bd perf/x86-ibs: Implement workaround for IBS erratum #420
When disabling ibs there might be the case where hardware continuously
generates interrupts. This is described in erratum #420 (Instruction-
Based Sampling Engine May Generate Interrupt that Cannot Be Cleared).
To avoid this we must clear the counter mask first and then clear the
enable bit. This patch implements this.

See Revision Guide for AMD Family 10h Processors, Publication #41322.

Note: We now keep track of the last read ibs config value which is
then used to disable ibs. To update the config value we pass now a
pointer to the functions reading it.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-11-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:16 +02:00
Robert Richter
7caaf4d824 perf/x86-ibs: Extend hw period that triggers overflow
If the last hw period is too short we might hit the irq handler which
biases the results. Thus try to have a max last period that triggers
the sw overflow.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-10-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:15 +02:00
Robert Richter
fc006cf7cc perf/x86-ibs: Trigger overflow if remaining period is too small
There are cases where the remaining period is smaller than the minimal
possible value. In this case the counter is restarted with the minimal
period. This is of no use as the interrupt handler will trigger
immediately again and most likely hits itself. This biases the
results.

So, if the remaining period is within the min range, we better do not
restart the counter and instead trigger the overflow.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-9-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:15 +02:00
Robert Richter
98112d2e95 perf/x86-ibs: Rename some variables
Simple patch that just renames some variables for better
understanding.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-8-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:14 +02:00
Robert Richter
450bbd493d perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
This patch adds support for precise event sampling with IBS. There are
two counting modes to count either cycles or micro-ops. If the
corresponding performance counter events (hw events) are setup with
the precise flag set, the request is redirected to the ibs pmu:

 perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
 perf record -a -e r076:p ...          # same as -e cpu-cycles:p
 perf record -a -e r0C1:p ...          # use ibs op counting micro-ops

Each ibs sample contains a linear address that points to the
instruction that was causing the sample to trigger. With ibs we have
skid 0. Thus, ibs supports precise levels 1 and 2. Samples are marked
with the PERF_EFLAGS_EXACT flag set. In rare cases the rip is invalid
when IBS was not able to record the rip correctly. Then the
PERF_EFLAGS_EXACT flag is cleared and the rip is taken from pt_regs.

V2:
* don't drop samples in precise level 2 if rip is invalid, instead
  support the PERF_EFLAGS_EXACT flag

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120502103309.GP18810@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:14 +02:00
Robert Richter
d47e8238cd perf/x86-ibs: Take instruction pointer from ibs sample
Each IBS sample contains a linear address of the instruction that
caused the sample to trigger. This address is more precise than the
rip that was taken from the interrupt handler's stack. Update the rip
with that address. We use this in the next patch to implement
precise-event sampling on AMD systems using IBS.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-6-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:13 +02:00
Robert Richter
6accb9cf76 perf/x86-ibs: Fix frequency profiling
Fixing profiling at a fixed frequency, in this case the freq value and
sample period was setup incorrectly. Since sampling periods are
adjusted we also allow periods that have lower 4 bits set.

Another fix is the setup of the hw counter: If we modify
hwc->sample_period, we also need to update hwc->last_period and
hwc->period_left.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-5-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:13 +02:00
Robert Richter
7bf352384f perf/x86-ibs: Enable ibs op micro-ops counting mode
Allow enabling ibs op micro-ops counting mode.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:12 +02:00
Robert Richter
fd0d000b2c perf: Pass last sampling period to perf_sample_data_init()
We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modifiyng the function interface with the required period
as argument. So basically a pattern like this:

        perf_sample_data_init(&data, ~0ULL);
        data.period = event->hw.last_period;

will now be like that:

        perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

Avoids unininitialized data.period and simplifies code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:12 +02:00
Robert Richter
c75841a398 perf/x86-ibs: Fix update of period
The last sw period was not correctly updated on overflow and thus led
to wrong distribution of events. We always need to properly initialize
data.period in struct perf_sample_data.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:11 +02:00
Ingo Molnar
ad8537cda6 Merge branch 'perf/x86-ibs' into perf/core 2012-05-09 15:22:23 +02:00
Peter Zijlstra
cb83b629ba sched/numa: Rewrite the CONFIG_NUMA sched domain support
The current code groups up to 16 nodes in a level and then puts an
ALLNODES domain spanning the entire tree on top of that. This doesn't
reflect the numa topology and esp for the smaller not-fully-connected
machines out there today this might make a difference.

Therefore, build a proper numa topology based on node_distance().

Since there's no fixed numa layers anymore, the static SD_NODE_INIT
and SD_ALLNODES_INIT aren't usable anymore, the new code tries to
construct something similar and scales some values either on the
number of cpus in the domain and/or the node_distance() ratio.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: sparclinux@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Greg Pearson <greg.pearson@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: bob.picco@oracle.com
Cc: chris.mason@oracle.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-r74n3n8hhuc2ynbrnp3vt954@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:00:55 +02:00
Ingo Molnar
ad7687dde8 x86/numa: Check for nonsensical topologies on real hw as well
Instead of only checking nonsensical topologies on numa-emu, do it
on real hardware as well, and print a warning.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-re15l0jqjtpz709oxozt2zoh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 13:32:35 +02:00
Peter Zijlstra
0acbb440f0 x86/numa: Hard partition cpu topology masks on node boundaries
When using numa=fake= you can get weird topologies where LLCs can span
nodes and other such nonsense. Cure this by hard partitioning these
masks on node boundaries.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-di5vwjm96q5vrb76opwuflwx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 13:28:59 +02:00
Peter Zijlstra
94c0dd3278 x86/numa: Allow specifying node_distance() for numa=fake
Allows emulating more interesting NUMA configurations like a quad
socket AMD Magny-Cour:

 "numa=fake=8:10,16,16,22,16,22,16,22,
              16,10,22,16,22,16,22,16,
              16,22,10,16,16,22,16,22,
              22,16,16,10,22,16,22,16,
              16,22,16,22,10,16,16,22,
              22,16,22,16,16,10,22,16,
              16,22,16,22,16,22,10,16,
              22,16,22,16,22,16,16,10"

Which has a non-fully-connected topology.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-e1136ef7kdffj7yf9tjhydln@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 13:28:59 +02:00
Jan Beulich
57da8b960b x86: Avoid double stack traces with show_regs()
What was called show_registers() so far already showed a stack
trace for kernel faults, and kernel_stack_pointer() isn't even
valid to be used for faults from user mode, hence it was
pointless for show_regs() to call show_trace() after
show_registers().

Simply rename show_registers() to show_regs() and eliminate
the old definition.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/4FAA3D3902000078000826E1@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 11:44:42 +02:00
Philipp Hahn
1f0459780c atomic64_32.h: fix parameter naming mismatch
The doc string doesn't match the parameter name, fix
@p -> @v
@ptr -> @v
@n -> @i

Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-05-09 11:38:20 +02:00
Joshua Cov
b2d0b7a061 keyboard: Use BIOS Keyboard variable to set Numlock
The PC BIOS does provide a NUMLOCK flag containing the desired state
of this LED. This patch sets the current state according to the data
in the bios.

[ hpa: fixed __weak declaration without definition, changed "inline"
  to "static inline" ]

Signed-Off-By: Joshua Cov <joshuacov@googlemail.com>
Link: http://lkml.kernel.org/r/CAKL7Q7rvq87TNS1T_Km8fW_5OzS%2BSbYazLXKxW-6ztOxo3zorg@mail.gmail.com
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-08 14:19:41 -07:00
Linus Torvalds
4ed6cedeef Innocuous fixes:
* fix to Kconfig to make it fit within 80 line characters,
  * two bootup fixes (AMD 8-core and with PCI BIOS),
  * cleanup code in a Xen PV fb driver,
  * and a crash fix when trying to see non-existent PTE's
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPp/iaAAoJEFjIrFwIi8fJAFsH/1NVxvOAHyzyczU49U/1vi8T
 d67kIb8fFHni6HO7BiBkuM8DricGQnDhP7uC1n9waWf8jRiYsPTAbesyedTLbQos
 SLfzpsLWKilJOxWCf17cBnm6i9ScQn1ioJ6h3jFzHgNCXnvvAVYqfKHW0V6HTErH
 JL0eb4+asiZgXNSeac1gabQlai6LuBzMWaFgzRGY+hDnCQhkdQfDkD7X5zEhUUmH
 jUmtSxRx+5LkfelwRb2kHhI5j58ilOEa7YLZFQc3C+2+bUvgsG9vJDsQ3jwFaGDn
 cryfRY9WJXxgcXqk1ClOnk9U9PGzRc48gdLVLhYuLsIvUWN7RzgRlBMsH33Gq9M=
 =kjvX
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen fixes from Konrad Rzeszutek Wilk:
 - fix to Kconfig to make it fit within 80 line characters,
 - two bootup fixes (AMD 8-core and with PCI BIOS),
 - cleanup code in a Xen PV fb driver,
 - and a crash fix when trying to see non-existent PTE's

* tag 'stable/for-linus-3.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/Kconfig: fix Kconfig layout
  xen/pci: don't use PCI BIOS service for configuration space accesses
  xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs
  xen/apic: Return the APIC ID (and version) for CPU 0.
  drivers/video/xen-fbfront.c: add missing cleanup code
2012-05-08 11:07:29 -07:00
Linus Torvalds
e9b19cd43f Merge branch 'for-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull two percpu fixes from Tejun Heo:
 "One adds missing KERN_CONT on split printk()s and the other makes
  the percpu allocator avoid using PMD_SIZE as atom_size on x86_32.

  Using PMD_SIZE led to vmalloc area exhaustion on certain
  configurations (x86_32 android) and the only cost of using PAGE_SIZE
  instead is static percpu area not being aligned to large page
  mapping."

* 'for-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu, x86: don't use PMD_SIZE as embedded atom_size on 32bit
  percpu: use KERN_CONT in pcpu_dump_alloc_info()
2012-05-08 11:06:07 -07:00
Tejun Heo
d5e28005a1 percpu, x86: don't use PMD_SIZE as embedded atom_size on 32bit
With the embed percpu first chunk allocator, x86 uses either PAGE_SIZE
or PMD_SIZE for atom_size.  PMD_SIZE is used when CPU supports PSE so
that percpu areas are aligned to PMD mappings and possibly allow using
PMD mappings in vmalloc areas in the future.  Using larger atom_size
doesn't waste actual memory; however, it does require larger vmalloc
space allocation later on for !first chunks.

With reasonably sized vmalloc area, PMD_SIZE shouldn't be a problem
but x86_32 at this point is anything but reasonable in terms of
address space and using larger atom_size reportedly leads to frequent
percpu allocation failures on certain setups.

As there is no reason to not use PMD_SIZE on x86_64 as vmalloc space
is aplenty and most x86_64 configurations support PSE, fix the issue
by always using PMD_SIZE on x86_64 and PAGE_SIZE on x86_32.

v2: drop cpu_has_pse test and make x86_64 always use PMD_SIZE and
    x86_32 PAGE_SIZE as suggested by hpa.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yanmin Zhang <yanmin.zhang@intel.com>
Reported-by: ShuoX Liu <shuox.liu@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4F97BA98.6010001@intel.com>
Cc: stable@vger.kernel.org
2012-05-08 09:42:18 -07:00
Jan Beulich
fba60c620a x86-64: Eliminate dead ia32 syscall handlers
None of the three routines being removed here was actually
hooked up anywhere, so they all represented dead code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FA947FE020000780008247F@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-08 16:43:01 +02:00
Jan Beulich
b2a3477727 x86: Fix section annotation of acpi_map_cpu2node()
Commit 943bc7e110 ("x86: Fix section warnings") added
__cpuinitdata here, while for functions __cpuinit should be
used.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <sp@numascale.com>
Link: http://lkml.kernel.org/r/4FA947910200007800082470@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Steffen Persvold <sp@numascale.com>
2012-05-08 16:40:26 +02:00
Thomas Gleixner
38e7c572ce x86: Use common threadinfo allocator
The only difference is the free_thread_info function, which frees
xstate.

Use the new arch_release_task_struct() function instead and switch
over to the core allocator.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.559556763@linutronix.de
Cc: x86@kernel.org
2012-05-08 14:08:44 +02:00
Thomas Gleixner
67ba5293f7 Merge branch 'smp/threadalloc' into smp/hotplug
Reason: Pull in the separate branch which was created so arch/tile can
base further work on it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-08 14:07:48 +02:00
Thomas Gleixner
6c0a9fa62f fork: Remove the weak insanity
We error out when compiling with gcc4.1.[01] as it miscompiles
__weak. The workaround with magic defines is not longer
necessary. Make it __weak again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.306358267@linutronix.de
2012-05-08 13:55:20 +02:00
Thomas Gleixner
85f7f65627 x86: Use kick_all_cpus_sync()
Use kick_all_cpus_sync() and remove cpu_idle_wait().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.190382227@linutronix.de
Cc: x86@kernel.org
2012-05-08 12:35:06 +02:00
Márton Németh
d1ecad6eee x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQ
The local function io_apic_level_ack_pending() is only called
from io_apic_level_ack_pending(). The later function is only
compiled if CONFIG_GENERIC_PENDING_IRQ is defined. Move the
io_apic_level_ack_pending() to the existing #ifdef
CONFIG_GENERIC_PENDING_IRQ code block.

This will remove the following warning message during compiling
without CONFIG_GENERIC_PENDING_IRQ defined:

 * arch/x86/kernel/apic/io_apic.c:382: warning: ‘io_apic_level_ack_pending’ defined but not used

Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/1336461860.2296.3.camel@sbsiddha-mobl2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-08 11:23:15 +02:00
Suresh Siddha
399988eea1 irq_remap: Fix compiler warning with CONFIG_IRQ_REMAP=y
Fix the below compiler warning:

arch/x86/include/asm/irq_remapping.h:72:19: warning: ‘struct IO_APIC_route_entry’ declared inside parameter list [enabled by default]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: joro@8bytes.org
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1336460934-23592-1-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-08 11:17:29 +02:00
Shuah Khan
e826abd523 x86, microcode: microcode_core.c simple_strtoul cleanup
Change reload_for_cpu() in kernel/microcode_core.c to call kstrtoul()
instead of calling obsoleted simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1336324264.2897.9.camel@lorien2
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-07 11:36:49 -07:00
Ingo Molnar
9438ef7f4e x86/apic: Fix UP boot crash
Commit 31b3c9d723 ("xen/x86: Implement x86_apic_ops") implemented
this:

... without considering that on UP the function pointer might be NULL.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/n/tip-3pfty0ml4yp62phbkchichh0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 19:19:56 +02:00
Wei Yang
74d24b219b resources: add resource_overlaps()
Add resource_overlaps(), which returns true if two resources overlap at all.

Use this to replace the complicated check in coalesce_windows().

Signed-Off-By: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-05-07 10:58:57 -06:00
David Vrabel
76a8df7b49 xen/pci: don't use PCI BIOS service for configuration space accesses
The accessing PCI configuration space with the PCI BIOS32 service does
not work in PV guests.

On systems without MMCONFIG or where the BIOS hasn't marked the
MMCONFIG region as reserved in the e820 map, the BIOS service is
probed (even though direct access is preferred) and this hangs.

CC: stable@kernel.org
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Fixed compile error when CONFIG_PCI is not set]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07 12:16:21 -04:00
Bjorn Helgaas
0cbaa57d82 Merge branch 'topic/stratus' into next 2012-05-07 09:23:27 -06:00
Srivatsa S. Bhat
19209bbb86 x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
The checks that exist in mwait_usable() for "idle=" kernel
parameters are insufficient. As a result, mwait_usable() can
return 1 even if "idle=nomwait" or "idle=poll" or "idle=halt"
parameters are passed.

Of these cases, incorrect handling of idle=nomwait is a
universal problem since mwait can get used for usual CPU idling.
However the rest of the cases are problematic only during CPU
Hotplug (offline) because, in the CPU offline path, the function
mwait_play_dead() is called, which might result in mwait being
used in the offline CPUs, if mwait_usable() happens to return 1.

Fix these issues by checking for the boot time "idle=" kernel
parameter properly in mwait_usable().

The first issue (usual cpu idling) is demonstrated below:

Before applying the patch (dmesg snippet):

 [    0.000000] Command line: [...] idle=nomwait
 [    0.000000] Kernel command line: [...] idle=nomwait
 [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
 [    0.140606] using mwait in idle threads.  <======= mwait being used
 [    4.303986] cpuidle: using governor ladder
 [    4.308232] cpuidle: using governor menu

After applying the patch:

 [    0.000000] Command line: [...] idle=nomwait
 [    0.000000] Kernel command line: [...] idle=nomwait
 [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
 [    4.264100] cpuidle: using governor ladder
 [    4.268342] cpuidle: using governor menu

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: venki@google.com
Cc: suresh.b.siddha@intel.com
Cc: Borislav Petkov <bp@amd64.org>
Cc: lenb@kernel.org
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/4F9E37B8.30001@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 16:27:20 +02:00
Shai Fultheim
42fa425043 x86: Conditionally update time when ack-ing pending irqs
On virtual environments, apic_read could take a long time. As a
result, under certain conditions the ack pending loop may exit
without any queued irqs left, but after more than one second. A
warning will be printed needlessly in this case.

If the loop is about to exit regardless of max_loops, don't
update it.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ rebased and reworded the commit message]
Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 16:25:28 +02:00
Ingo Molnar
79fec2c557 Interrupt remapping ops for x86
This patchset introduces a generic ops-interface for
 accessing interrupt remapping hardware on x86. It factors
 out the VT-d specific code from io_apic.c and moves it to
 drivers/iommu. These changes will be used to add support for
 AMD interrupt remapping hardware.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPp8ZZAAoJECvwRC2XARrj0o0P/22YaTNWvl1ZGDnO2Z913pEM
 hbE0RPsht/2paCR+oXggu3vv3D0BzY5SkC0Hmjr4po4hiwDNXHj+V1QQKHFEMwuO
 gzWatXCW/dDvIbDNHuCgG76n0kNmDMeps/MgKGrG4seV7zQ2jMfjaWFvJ2UKF/fJ
 D1Jm+7QLr+MLM/cT64WG5S/8/Pi1N6CYbdXwkPTlFTnKRS+NODWGMo9fgc7VIsI/
 9A1haZp+aV/TvGqklDlSh6CgIUBZVt/cffpftJiuH4L3o5mq1sXaFcK4Whu8PxGD
 OwmbLe4PKXl39piPe4x8NGGrj0xnk+WN17JUQpefxFIbBkCjfGakNqLxFMKbxH8k
 AtWeERge3Jd4jcfbmlXQ0GKaL2+BQUzIq0VHadHwsifCKs0KP7qG+q4Ev+02+Xbo
 gfC5UtemoffI4fXznvNIE0Hcp8EYONmtksPJ2iKOxy+t7uDlESDMDspXf1IOLZzO
 BODOdDlnsPCCWxT9RsJziVkD6C/hSZRwQIavZchBeMele4+YMpym+POlTJ5lx9KJ
 TVXYZmRv8sUgzP67s0WuDNlB2SEQq8fHNYIHrCPGqcmp/VHCFzj1nSe5jRu3FdnB
 Yu44BOoTibYss/aVy21hbB99SRMkpUWfImhH2eZLO/dahtlUMiI9ZnnYBJ62Ol7u
 bBAnLHtq8WkVZH6drSCe
 =H+Nt
 -----END PGP SIGNATURE-----

Merge tag 'intr-remapping-ops-for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into core/iommu

- This patchset introduces a generic ops-interface for
  accessing interrupt remapping hardware on x86. It factors
  out the VT-d specific code from io_apic.c and moves it to
  drivers/iommu. These changes will be used to add support for
  AMD interrupt remapping hardware.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 16:21:35 +02:00
Konrad Rzeszutek Wilk
b7e5ffe5d8 xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs
If I try to do "cat /sys/kernel/debug/kernel_page_tables"
I end up with:

BUG: unable to handle kernel paging request at ffffc7fffffff000
IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
PGD 0
Oops: 0000 [#1] SMP
CPU 0
.. snip..
RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000

which is due to the fact we are trying to access a PFN that is not
accessible to us. The reason (at least in this case) was that
PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
hypervisor) to point to a read-only linear map of the MFN->PFN array.
During our parsing we would get the MFN (a valid one), try to look
it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
Then pte_mfn_to_pfn would happilly feed that in, attach the flags
and return it back to the caller. 'ptdump_show' bitshifts it and
gets and invalid value that it tries to dereference.

Instead of doing all of that, we detect the ~0 case and just
return !_PAGE_PRESENT.

This bug has been in existence .. at least until 2.6.37 (yikes!)

CC: stable@kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07 10:21:13 -04:00
Konrad Rzeszutek Wilk
558daa289a xen/apic: Return the APIC ID (and version) for CPU 0.
On x86_64 on AMD machines where the first APIC_ID is not zero, we get:

ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10
BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10

which means that when the ACPI processor driver loads and
tries to parse the _Pxx states it fails to do as, as it
ends up calling acpi_get_cpuid which does this:

for_each_possible_cpu(i) {
        if (cpu_physical_id(i) == apic_id)
                return i;
}

And the bootup CPU, has not been found so it fails and returns -1
for the first CPU - which then subsequently in the loop that
"acpi_processor_get_info" does results in returning an error, which
means that "acpi_processor_add" failing and per_cpu(processor)
is never set (and is NULL).

That means that when xen-acpi-processor tries to load (much much
later on) and parse the P-states it gets -ENODEV from
acpi_processor_register_performance() (which tries to read
the per_cpu(processor)) and fails to parse the data.

Reported-by-and-Tested-by:  Stefan Bader <stefan.bader@canonical.com>
Suggested-by:  Boris Ostrovsky <boris.ostrovsky@amd.com>
[v2: Bit-shift APIC ID by 24 bits]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07 10:18:47 -04:00
Shai Fultheim
ddc5681ed3 x86/cache_info: Fix setup of l2/l3 ids
On some architectures (such as vSMP), it is possible to have
CPUs with a different number of cores sharing the same cache.

The current implementation implicitly assumes that all CPUs will
have the same number of cores sharing caches, and as a result,
different CPUs can end up with the same l2/l3 ids.

Fix this by masking out the shared cache bits, instead of
shifting the APICID. By doing so, it is guaranteed that the
generated cache ids are always unique.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ rebased, simplified, and reworded the commit message]
Signed-off-by: Ido Yariv <ido@wizery.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1334873351-31142-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 15:27:37 +02:00
Jan Beulich
b2d6aba965 x86: Allow multiple values to be specified with "hpet="
This is particularly to be able to specify "hpet=force,verbose",
as "force" ought to be a primary candidate for also wanting to
use "verbose".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/4F79D120020000780007C031@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 15:06:28 +02:00
Jan Beulich
396e2c6fed x86: Clear HPET configuration registers on startup
While Linux itself has been calling hpet_disable() for quite a
while, having e.g. a secondary (kexec) kernel depend on such
behavior of the primary (crashed) environment is fragile. It
particularly broke until very recently when the primary
environment was Xen based, as that hypervisor did not clear any
of the HPET settings it may have used.

Rather than blindly (and incompletely) clearing certain HPET
settings in hpet_disable(), latch the config register settings
during boot and restore then here.

(Note on the hpet_set_mode() change: Now that we're clearing the
level bit upon initialization, there's no need anymore to do so
here.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/4F79D0BB020000780007C02D@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 15:06:27 +02:00
Daniel Drake
c2c21e9bb1 x86/olpc/xo1/sci: Report RTC wakeup events
When the system is woken due to a RTC event, report the wakeup
event on the relevant rtc device (if it can be found).

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: dilinger@queued.net
Cc: pgf@laptop.org
Link: http://lkml.kernel.org/r/20120418223402.D73249D401E@zog.reactivated.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 15:02:26 +02:00
Daniel Drake
d2aa37411b x86/olpc/xo1/sci: Produce wakeup events for buttons and switches
Produce wakeup events for the XO-1's power button, lid switch
and ebook switch, taking care to only produce events when the
states have changed.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: dilinger@queued.net
Cc: pgf@laptop.org
Link: http://lkml.kernel.org/r/20120412171824.D14C49D401E@zog.reactivated.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 15:02:25 +02:00
Srivatsa S. Bhat
7164b3f5e5 x86/microcode: Ensure that module is only loaded on supported Intel CPUs
Exit early when there's no support for a particular CPU family.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Jones <davej@redhat.com>
Cc: tigran@aivazian.fsnet.co.uk
Cc: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/4F8BDB58.6070007@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 14:37:14 +02:00
Suresh Siddha
8a8f422d3b iommu: rename intr_remapping.[ch] to irq_remapping.[ch]
Make the file names consistent with the naming conventions of irq subsystem.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:35:00 +02:00
Suresh Siddha
95a02e976c iommu: rename intr_remapping references to irq_remapping
Make the code consistent with the naming conventions of irq subsystem.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:35:00 +02:00
Joerg Roedel
263b5e8629 x86, iommu/vt-d: Clean up interfaces for interrupt remapping
Remove the Intel specific interfaces from dmar.h and remove
asm/irq_remapping.h which is only used for io_apic.c anyway.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:35:00 +02:00
Joerg Roedel
5e2b930b07 iommu/vt-d: Convert MSI remapping setup to remap_ops
This patch introduces remapping-ops for setting ups MSI
interrupts.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:35:00 +02:00
Joerg Roedel
9d619f6572 iommu/vt-d: Convert free_irte into a remap_ops callback
The operation for releasing a remapping entry is iommu
specific too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:34:59 +02:00
Joerg Roedel
4c1bad6a0a iommu/vt-d: Convert IR set_affinity function to remap_ops
The function to set interrupt affinity with interrupt
remapping enabled is Intel specific too. So move it to the
irq_remap_ops too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:34:59 +02:00
Joerg Roedel
0c3f173a88 iommu/vt-d: Convert IR ioapic-setup to use remap_ops
The IOAPIC setup routine for interrupt remapping is VT-d
specific. Move it to the irq_remap_ops and add a call helper
function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07 14:34:59 +02:00