Commit Graph

12946 Commits

Author SHA1 Message Date
Rusty Russell
bb4093deb2 lguest: restore boot speed
lguest is dumb and drops *all* the pagetables for set_pte (which is
only used for kernel mapping manipulation, so it's OK without highmem).

But it's used a lot in boot, too.  As a guest optimization, we
suppressed this flushing until the first page switch.  Now we have
initial_page_table, that happens much earlier, so extend the heuristic
to wait until we switch to something other than the swapper_pg_dir or
initial_page_table.

As measured on my laptop under kvm, this dropped the time-to-mount-root
from 48 seconds to 4.3 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-12-16 17:03:15 +10:30
Rusty Russell
bb6f1d9a99 lguest: fix crash lguest_time_init
fe25c7fc2e "x86: lguest: Convert to new irq chip functions" converted
enable_lguest_irq() to take a struct irq_data *, but didn't fix the one
internal caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org
2010-12-16 17:03:14 +10:30
Andres Salomon
b5318d302f x86, olpc: Speed up device tree creation during boot
Calling alloc_bootmem() for tiny chunks of memory over and over is really
slow; on an XO-1, it caused the time between when the kernel started
booting and when the display came alive (post-lxfb probe) to increase
to 44s.  This patch optimizes the prom_early_alloc function by
calling alloc_bootmem for 4k-sized blocks of memory, and handing out
chunks of that to callers.  With this patch, the time between kernel load
and display initialization decreased to 23s.  If there's a better way to
do this early in the boot process, please let me know.

(Note: increasing the chunk size to 16k didn't noticably affect boot time,
and wasted 9k.)

v4: clarify comment, requested by hpa
v3: fix wasted memory buglet found by Milton Miller, and style fix.
v2: reorder prom_early_alloc as suggested by Grant.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101129153951.74202a84@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-15 17:11:40 -08:00
Andres Salomon
c10d1e260f x86, olpc: Add OLPC device-tree support
Make use of PROC_DEVICETREE to export the tree, and sparc's PROMTREE code to
call into OLPC's Open Firmware to build the tree.

v5: fix buglet with root node check (introduced in v4)

v4: address some minor style issues pointed out by Grant, and explicitly cast
    negative phandle checks to s32.

v3: rename olpc_prom to olpc_dt
  - rework Kconfig entries
  - drop devtree build hook from proc, instead adding a call to x86's
    paging_init (similarly to how sparc64 does it)
  - switch allocation from using slab to alloc_bootmem.  this allows
    the DT to be built earlier during boot (during setup_arch); the
    downside is that there are some 1200 bootmem reservations that are
    done during boot.  Not ideal..
  - add a helper olpc_ofw_is_installed function to test for the
    existence and successful detection of OLPC's OFW.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101116220952.26526a80@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-15 17:11:30 -08:00
Andres Salomon
4722d194e6 x86, of: Define irq functions to allow drivers/of/* to build on x86
- Define a stub irq_create_of_mapping for x86 as a stop-gap solution until
   drivers/of/irq is further along.
 - Define irq_dispose_mapping for x86 to appease of_i2c.c

These are needed to allow stuff in drivers/of/ to build on x86.  This stuff
will eventually get replaced; quoting Grant,

"The long term plan is to have the drivers/of/ code handling the mapping
intelligently like powerpc currently does."  But for now, just provide
these functions.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101111214526.5de7121b@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-15 17:11:16 -08:00
Randy Dunlap
52f6c5ad43 crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h
Add missing header file:

arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-12-15 19:44:08 +08:00
Kenji Kaneshige
7f7fbf45c6 x86: Enable the intr-remap fault handling after local APIC setup
Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. And the current code enables the vt-d
fault handling before the setup_local_APIC(). And hence the APIC LDR registers
and data structure in the memory may not be initialized. So the vt-d fault
handling in logical xapic/x2apic modes were broken.

Fix this by enabling the vt-d fault handling in the end_local_APIC_setup()

A cleaner fix of enabling fault handling while enabling intr-remapping
will be addressed for v2.6.38. [ Enabling intr-remapping determines the
usage of x2apic mode and the apic mode determines the fault-handling
configuration. ]

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:53:32 -08:00
Kenji Kaneshige
086e8ced65 x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this
irq migration of the vt-d fault handling interrupt is broken.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:52:52 -08:00
Suresh Siddha
3fb82d56ad x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume
During suspend, we disable all the non boot cpus. And during resume we bring
them all back again. So no need to do alternatives_smp_switch() in between.

On my core 2 based laptop, this speeds up the suspend path by 15msec and the
resume path by 5 msec (suspend/resume speed up differences can be attributed
to the different P-states that the cpu is in during suspend/resume).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:23:56 -08:00
Suresh Siddha
10340ae130 x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
Alignment of alloc_bootmem() depends on the value of
L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment.  Use
alloc_bootmem_align() and explicitly specify the alignment instead.

This fixes a kernel boot crash reported by Jody when the cpu in .config
is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU.

Reported-by: Jody Bruchon <jody@nctritech.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2010-12-13 16:13:11 -08:00
H. Peter Anvin
de2a8cf98e x86, gcc-4.6: Use gcc -m options when building vdso
The vdso Makefile passes linker-style -m options not to the linker but
to gcc.  This happens to work with earlier gcc, but fails with gcc
4.6.  Pass gcc-style -m options, instead.

Note: all currently supported versions of gcc supports -m32, so there
is no reason to conditionalize it any more.

Reported-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
Cc: <stable@kernel.org>
2010-12-13 16:08:37 -08:00
Don Zickus
5f29805a4f x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabled
When adjusting the code to handle removing the old nmi watchdog,
I forgot to consider the compile case when the local apic is not
enabled.

This change fixes the following build error:

  arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <20101213153719.GD18577@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-13 18:23:23 +01:00
Thomas Gleixner
f1c18071ad x86: HPET: Chose a paranoid safe value for the ETIME check
commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty)
chose 8 HPET cycles as a safe value for the ETIME check, as we had the
confirmation that the posted write to the comparator register is
delayed by two HPET clock cycles on Intel chipsets which showed
readback problems.

After that patch hit mainline we got reports from machines with newer
AMD chipsets which seem to have an even longer delay. See
http://thread.gmane.org/gmane.linux.kernel/1054283 and
http://thread.gmane.org/gmane.linux.kernel/1069458 for further
information.

Boris tried to come up with an ACPI based selection of the minimum
HPET cycles, but this failed on a couple of test machines. And of
course we did not get any useful information from the hardware folks.

For now our only option is to chose a paranoid high and safe value for
the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
value for the HPET clockevent accordingly.

Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6>
Cc: Simon Kirby <sim@hostway.ca>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: John Stultz <johnstul@us.ibm.com>
2010-12-13 13:42:44 +01:00
Tadeusz Struk
3c097b8008 crypto: aesni-intel - Fixed build with binutils 2.16
This patch fixes the problem with 2.16 binutils.

Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com>
Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-12-13 19:51:15 +08:00
Thomas Gleixner
a8760eca6c x86: Check tsc available/disabled in the delayed init function
The delayed TSC init function does not check whether the system has no
TSC or TSC is disabled at the kernel command line, which results in a
crash in the work queue based extended calibration due to division by
zero because the basic calibration never happened.

Add the missing checks and do not touch TSC when not available or
disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
2010-12-13 11:35:05 +01:00
Tejun Heo
0aa002fe60 x86: apic: Cleanup and simplify setup_local_APIC()
setup_local_APIC() is used to setup local APIC early during CPU
initialization and already assumes that preemption is disabled on
entry. However, The function unnecessarily disables and enables
preemption and uses smp_processor_id() multiple times in and out of
the nested preemption disabled section. This gives the wrong
impression that the function might be able to handle being called with
preemption enabled and/or migrated to another processor in the middle.

Make it clear that the function is always called with preemption
disabled, drop the confusing preemption disable block and call
smp_processor_id() once at the beginning of the function.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: brgerst@gmail.com
LKML-Reference: <4D00B3B9.7060702@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-10 13:46:26 +01:00
Don Zickus
5dc3055879 x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls
Originally adapted from Huang Ying's patch which moved the
unknown_nmi_panic to the traps.c file.  Because the old nmi
watchdog was deleted before this change happened, the
unknown_nmi_panic sysctl was lost.  This re-adds it.

Also, the nmi_watchdog sysctl was re-implemented and its
documentation updated accordingly.

Patch-inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10 00:01:06 +01:00
Don Zickus
96a84c20d6 lockup detector: Compile fixes from removing the old x86 nmi watchdog
My patch that removed the old x86 nmi watchdog broke other
arches.  This change reverts a piece of that patch and puts the
change in the correct spot.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: fweisbec@gmail.com
Cc: yinghai@kernel.org
LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10 00:01:06 +01:00
Feng Tang
0e3fa13f4e x86: Further simplify mp_irq info handling
assign_to_mp_irq() is copying the struct mpc_intsrc members one by
one. That's silly. Use memcpy() and let the compiler figure it out.
Same for the identical function assign_to_mpc_intsrc()

mp_irq_mpc_intsrc_cmp() is comparing the struct members one by one,
but no caller ever checks the different return codes. Use memcmp()
instead.

Remove the extra printk in MP_ioapic_info()

Signed-off-by: Feng Tang <feng.tang@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Alan Cox <alan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <20101208151857.212f0018@feng-i7>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:06 +01:00
Feng Tang
2d8009ba67 x86: Unify 3 similar ways of saving mp_irqs info
There are 3 places defining similar functions of saving IRQ vector
info into mp_irqs[] array: mmparse/acpi/mrst.

Replace the redundant code by a common function in io_apic.c as it's
only called when CONFIG_X86_IO_APIC=y

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101207133204.4d913c5a@feng-i7>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:06 +01:00
Yinghai Lu
60d79fd99f x86, ioapic: Avoid writing io_apic id if already correct
For 32bit mptable path, setup_ids_from_mpc() always writes the io_apic
id register, even there is no change needed.

Skip the write, when readout and mptable match.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
LKML-Reference: <4CFDF785.7010401@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
0450193bff x86, x2apic: Don't map lapic addr for preenabled x2apic systems
If x2apic is preenabled and used by the kernel, we don't need to map
the lapic address. That mapping will never be used.

So just skip that in register_lapic_address()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF69C.9070501@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
53301f36f3 x86, sfi: Use register_lapic_address()
register_lapic_address() and mp_sfi_register_lapic_address() are
almost identical. Use the common function.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4CFDF693.6000908@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
326a2e6bae x86, apic: Use register_lapic_address() in init_apic_mapping()
Remove the printk as well, we don't want to print when nothing
changed. We print in register_lapic_address() already.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF68A.7020902@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
f115714163 x86, apic: Remove early_init_lapic_mapping()
It is almost the same as smp_register_lapic_addr(). We just need to
let smp_read_mpc() call smp_register_lapic_addr() when early==1.

Add the apic_printk to smp_register_lapic_address()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF681.3030509@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:04 +01:00
Yinghai Lu
c0104d38a7 x86, apic: Unify identical register_lapic_address() functions
They are the same, move the common function to apic.c to allow
further cleanups.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4CFDF675.4060305@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:04 +01:00
Thomas Gleixner
51ddafcbc7 Merge branch 'x86/platform' into x86/apic-cleanups
Reason: apic cleanup series depends on x86/apic, x86/amd-nb and x86/platform

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 18:19:21 +01:00
Thomas Gleixner
d834a9dcec Merge branch 'x86/amd-nb' into x86/apic-cleanups
Reason: apic cleanup series depends on x86/apic, x86/amd-nb x86/platform

Conflicts:
	arch/x86/include/asm/io_apic.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 18:17:25 +01:00
Thomas Gleixner
4720dd1b38 x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n
arch/x86/kernel/apic/io_apic.c: In function 'ack_apic_level':
arch/x86/kernel/apic/io_apic.c:2433: warning: unused variable 'desc'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <201010272107.o9RL7rse018212@imap1.linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 17:43:21 +01:00
Rakib Mullick
2c6cb1053a x86: Address 'unused' warning in hw_nmi.c again
arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used

commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under
ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced
by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog).

Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace
section again.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 16:06:52 +01:00
Peter Zijlstra
c079c791c5 perf, amd: Remove the nb lock
Since all the hotplug stuff is serialized by the hotplug mutex,
do away with the amd_nb_lock.

Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08 20:16:30 +01:00
Andre Przywara
73c1160ce3 KVM: enlarge number of possible CPUID leaves
Currently the number of CPUID leaves KVM handles is limited to 40.
My desktop machine (AthlonII) already has 35 and future CPUs will
expand this well beyond the limit. Extend the limit to 80 to make
room for future processors.

KVM-Stable-Tag.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-08 17:28:38 +02:00
Joerg Roedel
24d1b15f72 KVM: SVM: Do not report xsave in supported cpuid
To support xsave properly for the guest the SVM module need
software support for it. As long as this is not present do
not report the xsave as supported feature in cpuid.
As a side-effect this patch moves the bit() helper function
into the x86.h file so that it can be used in svm.c too.

KVM-Stable-Tag.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-08 17:28:37 +02:00
Sheng Yang
3ea3aa8cf6 KVM: Fix OSXSAVE after migration
CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID
after migration.

KVM-Stable-Tag.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-08 17:28:31 +02:00
Linus Torvalds
6313e3c217 Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/pvclock: Zero last_value on resume

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf record: Fix eternal wait for stillborn child
  perf header: Don't assume there's no attr info if no sample ids is provided
  perf symbols: Figure out start address of kernel map from kallsyms
  perf symbols: Fix kallsyms kernel/module map splitting

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  nohz: Fix printk_needs_cpu() return value on offline cpus
  printk: Fix wake_up_klogd() vs cpu hotplug
2010-12-08 06:40:59 -08:00
Ingo Molnar
10a18d7dc0 Merge commit 'v2.6.37-rc5' into perf/core
Merge reason: Pick up the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-07 07:49:51 +01:00
Feng Tang
991cfffa7c x86, earlyprintk: Move mrst early console to platform/ and fix a typo
Move the code to arch/x86/platform/mrst/. Also fix a typo to use
the correct config option: ONFIG_EARLY_PRINTK_MRST

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: alan@linux.intel.com
LKML-Reference: <1291348298-21263-1-git-send-email-feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 20:52:04 +01:00
Masami Hiramatsu
f984ba4eb5 kprobes: Use text_poke_smp_batch for unoptimizing
Use text_poke_smp_batch() on unoptimization path for reducing
the number of stop_machine() issues. If the number of
unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256),
kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks
optimizer for remaining probes.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:32 +01:00
Masami Hiramatsu
cd7ebe2298 kprobes: Use text_poke_smp_batch for optimizing
Use text_poke_smp_batch() in optimization path for reducing
the number of stop_machine() issues. If the number of optimizing
probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes
first MAX_OPTIMIZE_PROBES probes and kicks optimizer for
remaining probes.

Changes in v5:
- Use kick_kprobe_optimizer() instead of directly calling
  schedule_delayed_work().
- Rescheduling optimizer outside of kprobe mutex lock.

Changes in v2:
- Allocate code buffer and parameters in arch_init_kprobes()
  instead of using static arraies.
- Merge previous max optimization limit patch into this patch.
  So, this patch introduces upper limit of optimization at
  once.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:31 +01:00
Masami Hiramatsu
7deb18dcf0 x86: Introduce text_poke_smp_batch() for batch-code modifying
Introduce text_poke_smp_batch(). This function modifies several
text areas with one stop_machine() on SMP. Because calling
stop_machine() is heavy task, it is better to aggregate
text_poke requests.

( Note: I've talked with Rusty about this interface, and
  he would not like to expand stop_machine() interface, since
  it is not for generic use. )

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:31 +01:00
Masami Hiramatsu
6274de4984 kprobes: Support delayed unoptimizing
Unoptimization occurs when a probe is unregistered or disabled,
and is heavy because it recovers instructions by using
stop_machine(). This patch delays unoptimization operations and
unoptimize several probes at once by using
text_poke_smp_batch(). This can avoid unexpected system slowdown
coming from stop_machine().

Changes in v5:
- Split this patch into several cleanup patches and this patch.
- Fix some text_mutex lock miss.
- Use bool instead of int for behavior flags.
- Add additional comment for (un)optimizing path.

Changes in v2:
- Use dynamic allocated buffers and params.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:30 +01:00
Feng Tang
e4d2ebcab1 x86, apbt: Setup affinity for apb timers acting as per-cpu timer
Commit a5ef2e70 "x86: Sanitize apb timer interrupt handling" forgot
the affinity setup when cleaning up the code, this patch just
adds the forgotten part

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
LKML-Reference: <1291348298-21263-2-git-send-email-feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 15:58:26 +01:00
Dirk Brandewie
5ec6960f6f ce4100: Add errata fixes for UART on CE4100
This patch enables the UART on the CE4100. The UART has a couple of
issues that need to be worked around. First the UART is mostly PC
compatible except that it is clocked eight times faster than a
standard PC so the default configuration provided in
arch/x86/include/asm/serial.h needs to be overridden. Second the TX
interrupt may not be set correctly all the time. Lastly accessing the
UART via I/O space for early_prink() hangs the chip when the IOAPIC is
enabled.

A custom mem_serial_in() is provided to work around the TX interrupt
issue. The configuration issues are dealt with in the call back
registered with the 8250 driver via serial8250_set_isa_configurator()

Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
LKML-Reference: <1290436128-17958-1-git-send-email-dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 15:58:26 +01:00
Sebastian Andrzej Siewior
a38c5380ef x86: io_apic: Split setup_ioapic_ids_from_mpc()
Sodaville needs to setup the IO_APIC ids as the boot loader leaves
them uninitialized. Split out the setter function so it can be called
unconditionally from the sodaville board code.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101126165020.GA26361@www.tglx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 14:30:28 +01:00
Linus Torvalds
11e8896474 Merge branch '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: unplug the emulated devices at resume time
  xen: fix save/restore for PV on HVM guests with pirq remapping
  xen: resume the pv console for hvm guests too
  xen: fix MSI setup and teardown for PV on HVM guests
  xen: use PHYSDEVOP_get_free_pirq to implement find_unbound_pirq
2010-12-03 11:30:57 -08:00
Linus Torvalds
8338fded13 Merge branches 'upstream/core' and 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: allocate irq descs on any NUMA node
  xen: prevent crashes with non-HIGHMEM 32-bit kernels with largeish memory
  xen: use default_idle
  xen: clean up "extra" memory handling some more

* 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: x86/32: perform initial startup on initial_page_table
  xen: don't bother to stop other cpus on shutdown/reboot
2010-12-03 10:08:52 -08:00
John Stultz
08ec0c58fb x86: Improve TSC calibration using a delayed workqueue
Boot to boot the TSC calibration may vary by quite a large amount.

While normal variance of 50-100ppm can easily be seen, the quick
calibration code only requires 500ppm accuracy, which is the limit
of what NTP can correct for.

This can cause problems for systems being used as NTP servers, as
every time they reboot it can take hours for them to calculate the
new drift error caused by the calibration.

The classic trade-off here is calibration accuracy vs slow boot times,
as during the calibration nothing else can run.

This patch uses a delayed workqueue  to calibrate the TSC over the
period of a second. This allows very accurate calibration (in my
tests only varying by 1khz or 0.4ppm boot to boot). Additionally this
refined calibration step does not block the boot process, and only
delays the TSC clocksoure registration by a few seconds in early boot.
If the refined calibration strays 1% from the early boot calibration
value, the system will fall back to already calculated early boot
calibration.

Credit to Andi Kleen who suggested using a timer quite awhile back,
but I dismissed it thinking the timer calibration would be done after
the clocksource was registered (which would break things). Forgive
me for my short-sightedness.

This patch has worked very well in my testing, but TSC hardware is
quite varied so it would probably be good to get some extended
testing, possibly pushing inclusion out to 2.6.39.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Clark Williams <williams@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
2010-12-02 16:48:37 -08:00
John Stultz
b0f969009f Merge remote branch 'tip/x86/tsc' into fortglx/2.6.38/tip/x86/tsc
Conflicts:
	Documentation/kernel-parameters.txt
2010-12-02 16:47:52 -08:00
Jeremy Fitzhardinge
64141da587 vmalloc: eagerly clear ptes on vunmap
On stock 2.6.37-rc4, running:

  # mount lilith:/export /mnt/lilith
  # find  /mnt/lilith/ -type f -print0 | xargs -0 file

crashes the machine fairly quickly under Xen.  Often it results in oops
messages, but the couple of times I tried just now, it just hung quietly
and made Xen print some rude messages:

    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
    3000000000000000) for mfn 1d7058 (pfn 18fa7)
    (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
    1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
    (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04

Which means the domain tried to map a pagetable page RW, which would
allow it to map arbitrary memory, so Xen stopped it.  This is because
vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
finished with them, and those pages got recycled as pagetable pages
while still having these RW aliases.

Removing those mappings immediately removes the Xen-visible aliases, and
so it has no problem with those pages being reused as pagetable pages.
Deferring the TLB flush doesn't upset Xen because it can flush the TLB
itself as needed to maintain its invariants.

When unmapping a region in the vmalloc space, clear the ptes
immediately.  There's no point in deferring this because there's no
amortization benefit.

The TLBs are left dirty, and they are flushed lazily to amortize the
cost of the IPIs.

This specific motivation for this patch is an oops-causing regression
since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
of vm_map_ram() introduced in 56e4ebf877 ("NFS: readdir with vmapped
pages") .  XFS also uses vm_map_ram() and could cause similar problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-02 14:51:15 -08:00
Stefano Stabellini
512b109ec9 xen: unplug the emulated devices at resume time
Early after being resumed we need to unplug again the emulated devices.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-12-02 14:40:53 +00:00
Stefano Stabellini
af42b8d12f xen: fix MSI setup and teardown for PV on HVM guests
When remapping MSIs into pirqs for PV on HVM guests, qemu is responsible
for doing the actual mapping and unmapping.
We only give qemu the desired pirq number when we ask to do the mapping
the first time, after that we should be reading back the pirq number
from qemu every time we want to re-enable the MSI.

This fixes a bug in xen_hvm_setup_msi_irqs that manifests itself when
trying to enable the same MSI for the second time: the old MSI to pirq
mapping is still valid at this point but xen_hvm_setup_msi_irqs would
try to assign a new pirq anyway.
A simple way to reproduce this bug is to assign an MSI capable network
card to a PV on HVM guest, if the user brings down the corresponding
ethernet interface and up again, Linux would fail to enable MSIs on the
device.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-12-02 14:34:25 +00:00
Ian Campbell
805e3f4950 xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it.  This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.

This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-29 17:07:53 -08:00
Jeremy Fitzhardinge
31e323cca9 xen: don't bother to stop other cpus on shutdown/reboot
Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
no need to do it manually.

In any case it will fail because all the IPI irqs have been pulled
down by this point, so the cross-CPU calls will simply hang forever.

Until change 76fac077db the function calls
were not synchronously waited for, so this wasn't apparent.  However after
that change the calls became synchronous leading to a hang on shutdown
on multi-VCPU guests.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Alok Kataria <akataria@vmware.com>
2010-11-29 14:16:53 -08:00
Mathias Krause
559ad0ff13 crypto: aesni-intel - Fixed build error on x86-32
Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers
not available on x86-32.

While at it, fixed unregister order in aesni_exit().

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-11-29 08:35:39 +08:00
Linus Torvalds
a9e40a2493 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix the software context switch counter
  perf, x86: Fixup Kconfig deps
  x86, perf, nmi: Disable perf if counters are not accessible
  perf: Fix inherit vs. context rotation bug
2010-11-28 12:25:02 -08:00
Jeremy Fitzhardinge
e7a3481c02 x86/pvclock: Zero last_value on resume
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).

Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-28 09:33:20 +01:00
Mathias Krause
0d258efb6a crypto: aesni-intel - Ported implementation to x86-32
The AES-NI instructions are also available in legacy mode so the 32-bit
architecture may profit from those, too.

To illustrate the performance gain here's a short summary of a dm-crypt
speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
implementations:

x86:                   i568       aes-ni    delta
ECB, 256 bit:     93.8 MB/s   123.3 MB/s   +31.4%
CBC, 256 bit:     84.8 MB/s   262.3 MB/s  +209.3%
LRW, 256 bit:    108.6 MB/s   222.1 MB/s  +104.5%
XTS, 256 bit:    105.0 MB/s   205.5 MB/s   +95.7%

Additionally, due to some minor optimizations, the 64-bit version also
got a minor performance gain as seen below:

x86-64:           old impl.    new impl.    delta
ECB, 256 bit:    121.1 MB/s   123.0 MB/s    +1.5%
CBC, 256 bit:    285.3 MB/s   290.8 MB/s    +1.9%
LRW, 256 bit:    263.7 MB/s   265.3 MB/s    +0.6%
XTS, 256 bit:    251.1 MB/s   255.3 MB/s    +1.7%

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-11-27 16:34:46 +08:00
Linus Torvalds
fbe6c4047f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  dmar, x86: Use function stubs when CONFIG_INTR_REMAP is disabled
  x86-64: Fix and clean up AMD Fam10 MMCONF enabling
  x86: UV: Address interrupt/IO port operation conflict
  x86: Use online node real index in calulate_tbl_offset()
  x86, asm: Fix binutils 2.15 build failure
2010-11-27 07:28:47 +09:00
Linus Torvalds
d2f30c73ab Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf symbols: Remove incorrect open-coded container_of()
  perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
  x86/kprobes: Prevent kprobes to probe on save_args()
  irq_work: Drop cmpxchg() result
  perf: Fix owner-list vs exit
  x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
  tracing: Fix recursive user stack trace
  perf,hw_breakpoint: Initialize hardware api earlier
  x86: Ignore trap bits on single step exceptions
  tracing: Force arch_local_irq_* notrace for paravirt
  tracing: Fix module use of trace_bprintk()
2010-11-27 07:28:17 +09:00
Cyrill Gorcunov
af86da5318 perf, x86: P4 PMU - describe config format
Add description of .config in a sake of RAW events.
At least this should bring some light to those who
will be reading this code.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:57 +01:00
Peter Zijlstra
004417a6d4 perf, arch: Cleanup perf-pmu init vs lockup-detector
The perf hardware pmu got initialized at various points in the boot,
some before early_initcall() some after (notably arch_initcall).

The problem is that the NMI lockup detector is ran from early_initcall()
and expects the hardware pmu to be present.

Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.

Cc: paulus <paulus@samba.org>
Cc: davem <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290707759.2145.119.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:56 +01:00
Andi Kleen
5ef428c4b5 x86: Set cpu masks before calling CPU_STARTING notifiers
When booting up a CPU set the various topology masks before
calling the CPU_STARTING notifier. This way the notifier
can actually use the masks.

This is needed for a perf change.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:56 +01:00
Franck Bui-Huu
6c7e550f13 perf: Introduce is_sampling_event()
and use it when appropriate.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:54 +01:00
Ingo Molnar
6c869e772c Merge branch 'perf/urgent' into perf/core
Conflicts:
	arch/x86/kernel/apic/hw_nmi.c

Merge reason: Resolve conflict, queue up dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:07:02 +01:00
Ingo Molnar
e4e91ac410 Merge commit 'v2.6.37-rc3' into perf/core
Merge reason: Pick up latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:04:47 +01:00
Peter Zijlstra
cc2067a514 perf, x86: Fixup Kconfig deps
This leads to a Kconfig dep inversion, x86 selects PERF_EVENT (due to
a hw_breakpoint dep) but doesn't unconditionally provide
HAVE_PERF_EVENT.

(This can cause build failures on M386/M486 kernel .config's.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222055.982965150@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:00:58 +01:00
Don Zickus
33c6d6a7ad x86, perf, nmi: Disable perf if counters are not accessible
In a kvm virt guests, the perf counters are not emulated.  Instead they
return zero on a rdmsrl. The perf nmi handler uses the fact that crossing
a zero means the counter overflowed (for those counters that do not have
specific interrupt bits). Therefore on kvm guests, perf will swallow all
NMIs thinking the counters overflowed.

This causes problems for subsystems like kgdb which needs NMIs to do its
magic. This problem was discovered by running kgdb tests.

The solution is to write garbage into a perf counter during the
initialization and hopefully reading back the same number.  On kvm
guests, the value will be read back as zero and we disable perf as
a result.

Reported-by: Jason Wessel <jason.wessel@windriver.com>
Patch-inspired-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1290462923-30734-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:00:57 +01:00
Linus Torvalds
8a3fbc9fdb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: remove duplicated #include
  xen: x86/32: perform initial startup on initial_page_table
2010-11-25 08:35:53 +09:00
Andrew Morton
91d95fda85 arch/x86/include/asm/fixmap.h: mark __set_fixmap_offset as __always_inline
When compiling arch/x86/kernel/early_printk_mrst.c with i386
allmodconfig, gcc-4.1.0 generates an out-of-line copy of
__set_fixmap_offset() which contains a reference to
__this_fixmap_does_not_exist which the compiler cannot elide.

Marking __set_fixmap_offset() as __always_inline prevents this.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Feng Tang <feng.tang@intel.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-25 06:50:49 +09:00
Huang Weiyi
e6d4a76dbf xen: remove duplicated #include
Remove duplicated #include('s) in
  arch/x86/xen/setup.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-24 12:07:45 -05:00
Ian Campbell
5b5c1af104 xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it.  This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.

This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-24 12:07:44 -05:00
Linus Torvalds
a4ec046c98 Merge branch 'upstream/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (23 commits)
  xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs.
  xen: set IO permission early (before early_cpu_init())
  xen: re-enable boot-time ballooning
  xen/balloon: make sure we only include remaining extra ram
  xen/balloon: the balloon_lock is useless
  xen: add extra pages to balloon
  xen: make evtchn's name less generic
  xen/evtchn: the evtchn device is non-seekable
  Revert "xen/privcmd: create address space to allow writable mmaps"
  xen/events: use locked set|clear_bit() for cpu_evtchn_mask
  xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore
  xen/xenfs: update xenfs_mount for new prototype
  xen: fix header export to userspace
  xen: implement XENMEM_machphys_mapping
  xen: set vma flag VM_PFNMAP in the privcmd mmap file_op
  xen: xenfs: privcmd: check put_user() return code
  xen/evtchn: add missing static
  xen/evtchn: Fix name of Xen event-channel device
  xen/evtchn: don't do unbind_from_irqhandler under spinlock
  xen/evtchn: remove spurious barrier
  ...
2010-11-24 08:23:18 +09:00
Jeremy Fitzhardinge
bc15fde77f xen: use default_idle
We just need the idle loop to drop into safe_halt, which default_idle()
is perfectly capable of doing.  There's no need to duplicate it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-22 17:19:34 -08:00
Jeremy Fitzhardinge
c2d0879112 xen: clean up "extra" memory handling some more
Make sure that extra_pages is added for all E820_RAM regions beyond
mem_end - completely excluded regions as well as the remains of partially
included regions.

Also makes sure the extra region is not unnecessarily high, and simplifies
the logic to decide which regions should be added.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-22 16:34:28 -08:00
Jeremy Fitzhardinge
9b8321531a Merge branches 'upstream/core', 'upstream/xenfs' and 'upstream/evtchn' into upstream/for-linus
* upstream/core:
  xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs.
  xen: set IO permission early (before early_cpu_init())
  xen: re-enable boot-time ballooning
  xen/balloon: make sure we only include remaining extra ram
  xen/balloon: the balloon_lock is useless
  xen: add extra pages to balloon
  xen/events: use locked set|clear_bit() for cpu_evtchn_mask
  xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore
  xen: implement XENMEM_machphys_mapping

* upstream/xenfs:
  Revert "xen/privcmd: create address space to allow writable mmaps"
  xen/xenfs: update xenfs_mount for new prototype
  xen: fix header export to userspace
  xen: set vma flag VM_PFNMAP in the privcmd mmap file_op
  xen: xenfs: privcmd: check put_user() return code

* upstream/evtchn:
  xen: make evtchn's name less generic
  xen/evtchn: the evtchn device is non-seekable
  xen/evtchn: add missing static
  xen/evtchn: Fix name of Xen event-channel device
  xen/evtchn: don't do unbind_from_irqhandler under spinlock
  xen/evtchn: remove spurious barrier
  xen/evtchn: ports start enabled
  xen/evtchn: dynamically allocate port_user array
  xen/evtchn: track enabled state for each port
2010-11-22 12:22:42 -08:00
Konrad Rzeszutek Wilk
ec35a69c46 xen: set IO permission early (before early_cpu_init())
This patch is based off "xen dom0: Set up basic IO permissions for dom0."
by Juan Quintela <quintela@redhat.com>.

On AMD machines when we boot the kernel as Domain 0 we get this nasty:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1-101116  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff8130271b>]
(XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
(XEN) rax: 000000008000c068   rbx: ffffffff8186c680   rcx: 0000000000000068
(XEN) rdx: 0000000000000cf8   rsi: 000000000000c000   rdi: 0000000000000000
(XEN) rbp: ffffffff81801e98   rsp: ffffffff81801e50   r8:  ffffffff81801eac
(XEN) r9:  ffffffff81801ea8   r10: ffffffff81801eb4   r11: 00000000ffffffff
(XEN) r12: ffffffff8186c694   r13: ffffffff81801f90   r14: ffffffffffffffff
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 0000000221803000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81801e50:

RIP points to read_pci_config() function.

The issue is that we don't set IO permissions for the Linux kernel early enough.

The call sequence used to be:

    xen_start_kernel()
	x86_init.oem.arch_setup = xen_setup_arch;
        setup_arch:
           - early_cpu_init
               - early_init_amd
                  - read_pci_config
           - x86_init.oem.arch_setup [ xen_arch_setup ]
               - set IO permissions.

We need to set the IO permissions earlier on, which this patch does.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-22 12:10:31 -08:00
Lin Ming
691513f70d x86: Resume trampoline must be executable
commit 5bd5a452(x86: Add NX protection for kernel data) marked the
trampoline area NX - which unsurprisingly breaks resume and cpu
hotplug.

Revert the portion of that commit, which touches the trampoline.

Originally-from: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-22 14:38:52 +01:00
Thomas Gleixner
9cdca86972 x86: platform: Move iris to x86/platform where it belongs
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-20 10:37:05 +01:00
Jeremy Fitzhardinge
d2a817130c xen: re-enable boot-time ballooning
Now that the balloon driver doesn't stumble over non-RAM pages, we
can enable the extra space for ballooning.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-19 23:28:08 -08:00
Vasiliy Kulikov
5ca9afdb9f x86, mrst: Check platform_device_register() return code
platform_device_register() may fail, if so propagate the return
code from mrst_device_create().

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
LKML-Reference: <1290104207-31279-1-git-send-email-segoon@openwall.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-18 13:45:46 -08:00
Ingo Molnar
ae51ce9061 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-11-18 20:07:12 +01:00
Linus Torvalds
fb3ff69d13 Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Fix host userspace gsbase corruption
  KVM: Correct ordering of ldt reload wrt fs/gs reload
2010-11-18 09:45:47 -08:00
Linus Torvalds
2d42dc3feb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb,ppc: Fix regression in evr register handling
  kgdb,x86: fix regression in detach handling
  kdb: fix crash when KDB_BASE_CMD_MAX is exceeded
  kdb: fix memory leak in kdb_main.c
2010-11-18 08:24:58 -08:00
Hans Rosenfeld
f658bcfb26 x86, cacheinfo: Cleanup L3 cache index disable support
Adaptions to the changes of the AMD northbridge caching code: instead
of a bool in each l3 struct, use a flag in amd_northbridges.flags to
indicate L3 cache index disable support; use a pointer to the whole
northbridge instead of the misc device in the l3 struct; simplify the
initialisation; dynamically generate sysfs attribute array.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:06 +01:00
Hans Rosenfeld
9653a5c76c x86, amd-nb: Cleanup AMD northbridge caching code
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:05 +01:00
Hans Rosenfeld
eec1d4fa00 x86, amd-nb: Complete the rename of AMD NB and related code
Not only the naming of the files was confusing, it was even more so for
the function and variable names.

Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:04 +01:00
Soeren Sandmann Pedersen
9c0729dc80 x86: Eliminate bp argument from the stack tracing routines
The various stack tracing routines take a 'bp' argument in which the
caller is supposed to provide the base pointer to use, or 0 if doesn't
have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
defined, this means all callers in principle should either always pass
0, or be conditional on CONFIG_FRAME_POINTER.

However, there are only really three use cases for stack tracing:

(a) Trace the current task, including IRQ stack if any
(b) Trace the current task, but skip IRQ stack
(c) Trace some other task

In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just
be 0.  If it _is_ defined, then

- in case (a) bp should be gotten directly from the CPU's register, so
  the caller should pass NULL for regs,

- in case (b) the caller should should pass the IRQ registers to
  dump_trace(),

- in case (c) bp should be gotten from the top of the task's stack, so
  the caller should pass NULL for regs.

Hence, the bp argument is not necessary because the combination of
task and regs is sufficient to determine an appropriate value for bp.

This patch introduces a new inline function stack_frame(task, regs)
that computes the desired bp. This function is then called from the
two versions of dump_stack().

Signed-off-by: Soren Sandmann <ssp@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>,
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-18 14:37:34 +01:00
Jan Beulich
37db6c8f1d x86-64: Fix and clean up AMD Fam10 MMCONF enabling
Candidate memory ranges were not calculated properly (start
addresses got needlessly rounded down, and end addresses didn't
get rounded up at all), address comparison for secondary CPUs
was done on only part of the address, and disabled status wasn't
tracked properly.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <4CE24DF40200007800022737@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 13:41:35 +01:00
Masami Hiramatsu
de31ec8a31 x86/kprobes: Prevent kprobes to probe on save_args()
Prevent kprobes to probe on save_args() since this function
will be called from breakpoint exception handler. That will
cause infinit loop on breakpoint handling.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 13:40:19 +01:00
matthieu castet
84e1c6bb38 x86: Add RO/NX protection for loadable kernel modules
This patch is a logical extension of the protection provided by
CONFIG_DEBUG_RODATA to LKMs. The protection is provided by
splitting module_core and module_init into three logical parts
each and setting appropriate page access permissions for each
individual section:

 1. Code: RO+X
 2. RO data: RO+NX
 3. RW data: RW+NX

In order to achieve proper protection, layout_sections() have
been modified to align each of the three parts mentioned above
onto page boundary. Next, the corresponding page access
permissions are set right before successful exit from
load_module(). Further, free_module() and sys_init_module have
been modified to set module_core and module_init as RW+NX right
before calling module_free().

By default, the original section layout and access flags are
preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y,
the patch will page-align each group of sections to ensure that
each page contains only one type of content and will enforce
RO/NX for each group of pages.

  -v1: Initial proof-of-concept patch.
  -v2: The patch have been re-written to reduce the number of #ifdefs
       and to make it architecture-agnostic. Code formatting has also
       been corrected.
  -v3: Opportunistic RO/NX protection is now unconditional. Section
       page-alignment is enabled when CONFIG_DEBUG_RODATA=y.
  -v4: Removed most macros and improved coding style.
  -v5: Changed page-alignment and RO/NX section size calculation
  -v6: Fixed comments. Restricted RO/NX enforcement to x86 only
  -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added
       calls to set_all_modules_text_rw() and set_all_modules_text_ro()
       in ftrace
  -v8: updated for compatibility with linux 2.6.33-rc5
  -v9: coding style fixes
 -v10: more coding style fixes
 -v11: minor adjustments for -tip
 -v12: minor adjustments for v2.6.35-rc2-tip
 -v13: minor adjustments for v2.6.37-rc1-tip

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F914.9070106@free.fr>
[ minor cleanliness edits, -v14: build failure fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 13:32:56 +01:00
Matthieu Castet
5bd5a45266 x86: Add NX protection for kernel data
This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.

The following steps are taken to achieve this:

 1. Linker script is adjusted so .text always starts and ends on a page bound
 2. Linker script is adjusted so .rodata always start and end on a page boundary
 3. NX is set for all pages from _etext through _end in mark_rodata_ro.
 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
 5. bios rom is set to x when pcibios is used.

The results of patch application may be observed in the diff of kernel page
table dumps:

pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
 +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

No pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc0100000           1M     RW             GLB NX pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

The patch has been originally developed for Linux 2.6.34-rc2 x86 by
Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.

 -v1:  initial patch for 2.6.30
 -v2:  patch for 2.6.31-rc7
 -v3:  moved all code into arch/x86, adjusted credits
 -v4:  fixed ifdef, removed credits from CREDITS
 -v5:  fixed an address calculation bug in mark_nxdata_nx()
 -v6:  added acked-by and PT dump diff to commit log
 -v7:  minor adjustments for -tip
 -v8:  rework with the merge of "Set first MB as RW+NX"

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F82E.60601@free.fr>
[ minor cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 12:52:04 +01:00
matthieu castet
64edc8ed5f x86: Fix improper large page preservation
This patch fixes a bug in try_preserve_large_page() which may
result in improper large page preservation and improper
application of page attributes to the memory area outside of the
original change request.

More specifically, the problem manifests itself when set_memory_*()
is called for several pages at the beginning of the large page and
try_preserve_large_page() erroneously concludes that the change can
be applied to whole large page.

The fix consists of 3 parts:

  1. Addition of "required" protection attributes in
     static_protections(), so .data and .bss can be guaranteed to
     stay "RW"

  2. static_protections() is now called for every small
     page within large page to determine compatibility of new
     protection attributes (instead of just small pages within the
     requested range).

  3. Large page can be preserved only if attribute change is
     large-page-aligned and covers whole large page.

 -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2
 -v2: Replaced pfn check with address check for kernel rw-data

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F7F3.8030809@free.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 12:52:04 +01:00
Dimitri Sivanich
8191c9f692 x86: UV: Address interrupt/IO port operation conflict
This patch for SGI UV systems addresses a problem whereby
interrupt transactions being looped back from a local IOH,
through the hub to a local CPU can (erroneously) conflict with
IO port operations and other transactions.

To workaound this we set a high bit in the APIC IDs used for
interrupts. This bit appears to be ignored by the sockets, but
it avoids the conflict in the hub.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20101116222352.GA8155@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
___

 arch/x86/include/asm/uv/uv_hub.h   |    4 ++++
 arch/x86/include/asm/uv/uv_mmrs.h  |   19 ++++++++++++++++++-
 arch/x86/kernel/apic/x2apic_uv_x.c |   25 +++++++++++++++++++++++--
 arch/x86/platform/uv/tlb_uv.c      |    2 +-
 arch/x86/platform/uv/uv_time.c     |    4 +++-
 5 files changed, 49 insertions(+), 5 deletions(-)
2010-11-18 10:41:25 +01:00
Ingo Molnar
fcf48a725a Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2010-11-18 10:37:51 +01:00
Yinghai Lu
9223081f54 x86: Use online node real index in calulate_tbl_offset()
Found a NUMA system that doesn't have RAM installed at the first
socket which hangs while executing init scripts.

bisected it to:

 | commit 9329672021
 | Author: Shaohua Li <shaohua.li@intel.com>
 | Date:   Wed Oct 20 11:07:03 2010 +0800
 |
 |     x86: Spread tlb flush vector between nodes

It turns out when first socket is not online it could have cpus on
node1 tlb_offset set to bigger than NUM_INVALIDATE_TLB_VECTORS.

That could affect systems like 4 sockets, but socket 2 doesn't
have installed, sockets 3 will get too big tlb_offset.

Need to use real online node idx.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Shaohua Li <shaohua.li@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CDEDE59.40603@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 10:10:50 +01:00
Shérab
82148d1d0b x86/platform: Add Eurobraille/Iris power off support
The Iris machines from Eurobraille do not have APM or ACPI support
to shut themselves down properly.  A special I/O sequence is
needed to do so.  This modle runs this I/O sequence at
kernel shutdown when its force parameter is set to 1.

Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
[ did minor coding style edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 10:03:24 +01:00
Kees Cook
79250af2d5 x86: Fix included-by file reference comments
Adjust the paths for files that are including verify_cpu.S.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
LKML-Reference: <1289931004-16066-1-git-send-email-kees.cook@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:58:54 +01:00
Tetsuo Handa
96e612ffc3 x86, asm: Fix binutils 2.15 build failure
Add parentheses around one pushl_cfi argument.

Commit df5d1874 "x86: Use {push,pop}{l,q}_cfi in more places"
caused GNU assembler 2.15 (Debian Sarge) to fail. It is still
failing as of commit 07bd8516 "x86, asm: Restore parentheses
around one pushl_cfi argument". This patch solves build failure
with GNU assembler 2.15.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: heukelum@fastmail.fm
Cc: hpa@linux.intel.com
LKML-Reference: <201011160445.oAG4jGif079860@www262.sakura.ne.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:25:11 +01:00
Rakib Mullick
0e2af2a9ab x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
backtrace_mask has been used under the code context of
ARCH_HAS_NMI_WATCHDOG. So put it into that context.
We were warned by the following warning:

  arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:15:12 +01:00
Don Zickus
072b198a4a x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog
Now that the bulk of the old nmi_watchdog is gone, remove all
the stub variables and hooks associated with it.

This touches lots of files mainly because of how the io_apic
nmi_watchdog was implemented.  Now that the io_apic nmi_watchdog
is forever gone, remove all its fingers.

Most of this code was not being exercised by virtue of
nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to
risky here.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:08:23 +01:00
Don Zickus
5f2b0ba4d9 x86, nmi_watchdog: Remove the old nmi_watchdog
Now that we have a new nmi_watchdog that is more generic and
sits on top of the perf subsystem, we really do not need the old
nmi_watchdog any more.

In addition, the old nmi_watchdog doesn't really work if you are
using the default clocksource, hpet.  The old nmi_watchdog code
relied on local apic interrupts to determine if the cpu is still
alive.  With hpet as the clocksource, these interrupts don't
increment any more and the old nmi_watchdog triggers false
postives.

This piece removes the old nmi_watchdog code and stubs out any
variables and functions calls.  The stubs are the same ones used
by the new nmi_watchdog code, so it should be well tested.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:08:23 +01:00
Ingo Molnar
a89d4bd055 Merge branch 'tip/perf/urgent-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent 2010-11-18 08:07:36 +01:00
Avi Kivity
c8770e7ba6 KVM: VMX: Fix host userspace gsbase corruption
We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately.  This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.

Fix by
- saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
  long mode

Things can be cleaned up further, but this is the minmal fix for now.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-11-17 19:48:05 -02:00
Avi Kivity
0a77fe4c18 KVM: Correct ordering of ldt reload wrt fs/gs reload
If fs or gs refer to the ldt, they must be reloaded after the ldt.  Reorder
the code to that effect.

Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix
a user-visible bug.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-11-17 19:47:59 -02:00
Jason Wessel
10a6e67648 kgdb,x86: fix regression in detach handling
The fix from ba773f7c51
(x86,kgdb: Fix hw breakpoint regression) was not entirely complete.

The kgdb_remove_all_hw_break() function also needs to call the
hw_break_release_slot() or else a breakpoint can get activated again
after the debugger has detached.

The kgdb test suite exposes the behavior in the form of either a hang
or repetitive failure.  The kernel config that exposes the problem
contains all of the following:

CONFIG_DEBUG_RODATA=y
CONFIG_KGDB_TESTS=y
CONFIG_KGDB_TESTS_ON_BOOT=y
CONFIG_KGDB_TESTS_BOOT_STRING="V1F100"

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-17 13:54:57 -06:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Randy Dunlap
ad02519a0d x86, mrst: Fix dependencies of "select INTEL_SCU_IPC"
commit b9fc71f47 (x86, mrst: The shutdown for MRST requires the SCU
IPC mechanism) introduced the following warning:

 warning: (X86_MRST && PCI && PCI_GOANY && X86_32 &&
 X86_EXTENDED_PLATFORM && X86_IO_APIC) selects INTEL_SCU_IPC which has
 unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_MRST)

which is due to the hierarchical menu structure.

Select X86_PLATFORM_DEVICES as well.

Originally-from: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20101115101406.77e072ef.randy.dunlap@oracle.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
2010-11-17 13:41:44 +01:00
Alan Cox
b9fc71f47d x86, mrst: The shutdown for MRST requires the SCU IPC mechanism
Fix the build failure reported by Randy.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101115173110.6877.83958.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-17 13:27:44 +01:00
Jeremy Fitzhardinge
20b4755e4f Merge commit 'v2.6.37-rc2' into upstream/xenfs
* commit 'v2.6.37-rc2': (10093 commits)
  Linux 2.6.37-rc2
  capabilities/syslog: open code cap_syslog logic to fix build failure
  i2c: Sanity checks on adapter registration
  i2c: Mark i2c_adapter.id as deprecated
  i2c: Drivers shouldn't include <linux/i2c-id.h>
  i2c: Delete unused adapter IDs
  i2c: Remove obsolete cleanup for clientdata
  include/linux/kernel.h: Move logging bits to include/linux/printk.h
  Fix gcc 4.5.1 miscompiling drivers/char/i8k.c (again)
  hwmon: (w83795) Check for BEEP pin availability
  hwmon: (w83795) Clear intrusion alarm immediately
  hwmon: (w83795) Read the intrusion state properly
  hwmon: (w83795) Print the actual temperature channels as sources
  hwmon: (w83795) List all usable temperature sources
  hwmon: (w83795) Expose fan control method
  hwmon: (w83795) Fix fan control mode attributes
  hwmon: (lm95241) Check validity of input values
  hwmon: Change mail address of Hans J. Koch
  PCI: sysfs: fix printk warnings
  GFS2: Fix inode deallocation race
  ...
2010-11-16 11:06:22 -08:00
Linus Torvalds
e5c13537b0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: sysfs: fix printk warnings
  PCI: fix pci_bus_alloc_resource() hang, prefer positive decode
  PCI: read current power state at enable time
  PCI: fix size checks for mmap() on /proc/bus/pci files
  x86/PCI: coalesce overlapping host bridge windows
  PCI hotplug: ibmphp: Add check to prevent reading beyond mapped area
2010-11-15 14:01:33 -08:00
Tadeusz Struk
0bd82f5f63 crypto: aesni-intel - RFC4106 AES-GCM Driver Using Intel New Instructions
This patch adds an optimized RFC4106 AES-GCM implementation for 64-bit
kernels. It supports 128-bit AES key size. This leverages the crypto
AEAD interface type to facilitate a combined AES & GCM operation to
be implemented in assembly code. The assembly code leverages Intel(R)
AES New Instructions and the PCLMULQDQ instruction.

Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com>
Signed-off-by: Erdinc Ozturk <erdinc.ozturk@intel.com>
Signed-off-by: James Guilford <james.guilford@intel.com>
Signed-off-by: Wajdi Feghali <wajdi.k.feghali@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-11-13 21:47:55 +09:00
Linus Torvalds
891cbd30ef Merge branch 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: do not release any memory under 1M in domain 0
  xen: events: do not unmask event channels on resume
  xen: correct size of level2_kernel_pgt
2010-11-12 16:01:55 -08:00
Linus Torvalds
b5c5510436 Merge branch 'stable/xen-pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/xen-pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  MAINTAINERS: Mark XEN lists as moderated
  xen-pcifront: fix PCI reference leak
  xen-pcifront: Remove duplicate inclusion of headers.
  xen: fix memory leak in Xen PCI MSI/MSI-X allocator.
  MAINTAINERS: Update mailing list name for Xen pieces.
2010-11-12 15:54:39 -08:00
Ian Campbell
7e77506a59 xen: implement XENMEM_machphys_mapping
This hypercall allows Xen to specify a non-default location for the
machine to physical mapping. This capability is used when running a 32
bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to
exactly the size required.

[ Impact: add Xen hypercall definitions ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-11-12 15:00:06 -08:00
Linus Torvalds
25a34554d6 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pvclock: Remove leftover scale_delta() function
  x86, apic: Remove double #include
  x86: Adjust section annotations in AMD Fam10 MMCONF enabling code
  x86, UV: Update node controller MMRs
  x86: Remove unnecessary casts of void ptr returning alloc function return values
  x86: Address gcc4.6 "set but not used" warnings in apic.h
  x86, mm: Fix section mismatch in tlb.c
2010-11-12 08:40:23 -08:00
Frederic Weisbecker
6c0aca288e x86: Ignore trap bits on single step exceptions
When a single step exception fires, the trap bits, used to
signal hardware breakpoints, are in a random state.

These trap bits might be set if another exception will follow,
like a breakpoint in the next instruction, or a watchpoint in the
previous one. Or there can be any junk there.

So if we handle these trap bits during the single step exception,
we are going to handle an exception twice, or we are going to
handle junk.

Just ignore them in this case.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332

Reported-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: All since 2.6.33.x <stable@kernel.org>
2010-11-12 14:51:01 +01:00
Dirk Brandewie
37bc9f5078 x86: Ce4100: Add reboot_fixup() for CE4100
This patch adds the CE4100 reboot fixup to reboot_fixups_32.c

[ tglx: Moved PCI id to reboot_fixups_32.c ]

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-12 00:45:41 +01:00
Dirk Brandewie
91d8037f56 ce4100: Add PCI register emulation for CE4100
This patch provides access methods for PCI registers that mis-behave on
the CE4100. Each register can be assigned a private init, read and
write routine. The exception to this is the bridge device. The
bridge device is the only device on bus zero (0) that requires any
fixup so it is a special case.

[ tglx: minor coding style cleanups, __init annotation and
  	simplification of ce4100_conf_read/write ]

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
LKML-Reference: <40b6751381c2275dc359db5a17989cce22ad8db7.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-12 00:45:41 +01:00
Thomas Gleixner
c751e17b53 x86: Add CE4100 platform support
Add CE4100 platform support. CE4100 needs early setup like
moorestown.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
LKML-Reference: <94720fd7f5564a12ebf202cf2c4f4c0d619aab35.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-12 00:45:41 +01:00
Stefano Stabellini
e060e7af98 xen: set vma flag VM_PFNMAP in the privcmd mmap file_op
Set VM_PFNMAP in the privcmd mmap file_op, rather than later in
xen_remap_domain_mfn_range when it is too late because
vma_wants_writenotify has already been called and vm_page_prot has
already been modified.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-11 12:37:43 -08:00
Bjorn Helgaas
4723d0f2f9 x86/PCI: coalesce overlapping host bridge windows
Some BIOSes provide PCI host bridge windows that overlap, e.g.,

    pci_root PNP0A03:00: host bridge window [mem 0xb0000000-0xffffffff]
    pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xdfffffff]
    pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff]

If we simply insert these as children of iomem_resource, the second window
fails because it conflicts with the first, and the third is inserted as a
child of the first, i.e.,

    b0000000-ffffffff PCI Bus 0000:00
      f0000000-ffffffff PCI Bus 0000:00

When we claim PCI device resources, this can cause collisions like this
if we put them in the first window:

    pci 0000:00:01.0: address space collision: [mem 0xff300000-0xff4fffff] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xffffffff]

Host bridge windows are top-level resources by definition, so it doesn't
make sense to make the third window a child of the first.  This patch
coalesces any host bridge windows that overlap.  For the example above,
the result is this single window:

    pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xffffffff]

This fixes a 2.6.34 regression.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=17011
Reported-and-tested-by: Anisse Astier <anisse@astier.eu>
Reported-and-tested-by: Pramod Dematagoda <pmd.lotr.gandalf@gmail.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-11-11 09:34:31 -08:00
Feng Tang
6f207e9bb4 x86: mrst: Set vRTC's IRQ to level trigger type
When setting up the mpc_intsrc structure for vRTC's IRQ,
we need to set its irqflag to level trigger, otherwise
it will be taken as edge triggered and the vRTC IRQ will
fire only once, as there is never a EOI issued from the
IA core for it.

The original code worked in previous kernel. This is  because it
was configured to level trigger type by luck. It fell
into the default PCI trigger category which is level triggered.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101111155019.12924.569.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-11 17:43:18 +01:00
Vinod Koul
86071535f8 x86: mrst: Add audio driver bindings
This patch adds the sound card bindings for Moorestown (pmic_audio) and
the Medfield platform (msic_audio) as IPC devices. This ensures they will be
created at the right time.

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101110174044.11340.78008.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-11 11:34:28 +01:00
Feng Tang
0146f26145 rtc: Add drivers/rtc/rtc-mrst.c
Provide the standard kernel rtc driver interface on top of the vrtc layer
added in the previous patch.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <20101110172911.3311.20593.stgit@localhost.localdomain>
[Fixed swapped arguments on IPC]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
[Cleaned up and the device creation moved to arch/x86/platform]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-11 11:34:27 +01:00
Feng Tang
7309282c90 x86: mrst: Add vrtc driver which serves as a wall clock device
Moorestown platform doesn't have a m146818 RTC device like traditional
x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides
some basic RTC functions like get/set time. vrtc serves as the only
wall clock device on Moorestown platform.

[ tglx: Changed the exports to _GPL ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101110172837.3311.40483.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-11 11:34:27 +01:00
Alek Du
cfb505a7eb x86: mrst: Add Moorestown specific reboot/shutdown support
Moorestowns needs to use a special IPC command to reboot or shutdown the
platform.

Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101110164928.6365.94243.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-11 11:34:27 +01:00
Steven Rostedt
b590854853 tracing: Force arch_local_irq_* notrace for paravirt
When running ktest.pl randconfig tests, I would sometimes trigger
a lockdep annotation bug (possible reason: unannotated irqs-on).

This triggering happened right after function tracer self test was
executed. After doing a config bisect I found that this was caused with
having function tracer, paravirt guest, prove locking, and rcu torture
all enabled.

The rcu torture just enhanced the likelyhood of triggering the bug.
Prove locking was needed, since it was the thing that was bugging.
Function tracer would trace and disable interrupts in all sorts
of funny places.
paravirt guest would turn arch_local_irq_* into functions that would
be traced.

Besides the fact that tracing arch_local_irq_* is just a bad idea,
this is what is happening.

The bug happened simply in the local_irq_restore() code:

		if (raw_irqs_disabled_flags(flags)) {	\
			raw_local_irq_restore(flags);	\
			trace_hardirqs_off();		\
		} else {				\
			trace_hardirqs_on();		\
			raw_local_irq_restore(flags);	\
		}					\

The raw_local_irq_restore() was defined as arch_local_irq_restore().

Now imagine, we are about to enable interrupts. We go into the else
case and call trace_hardirqs_on() which tells lockdep that we are enabling
interrupts, so it sets the current->hardirqs_enabled = 1.

Then we call raw_local_irq_restore() which calls arch_local_irq_restore()
which gets traced!

Now in the function tracer we disable interrupts with local_irq_save().
This is fine, but flags is stored that we have interrupts disabled.

When the function tracer calls local_irq_restore() it does it, but this
time with flags set as disabled, so we go into the if () path.
This keeps interrupts disabled and calls trace_hardirqs_off() which
sets current->hardirqs_enabled = 0.

When the tracer is finished and proceeds with the original code,
we enable interrupts but leave current->hardirqs_enabled as 0. Which
now breaks lockdeps internal processing.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-11-10 22:29:49 -05:00
Ian Campbell
9ec23a7f6d xen: do not release any memory under 1M in domain 0
We already deliberately setup a 1-1 P2M for the region up to 1M in
order to allow code which assumes this region is already mapped to
work without having to convert everything to ioremap.

Domain 0 should not return any apparently unused memory regions
(reserved or otherwise) in this region to Xen since the e820 may not
accurately reflect what the BIOS has stashed in this region.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-10 17:19:25 -08:00
Kees Cook
6036f373ea x86, cpu: Only CPU features determine NX capabilities
Fix the NX feature boot warning when NX is missing to correctly
reflect that BIOSes cannot disable NX now.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-5-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:43:15 -08:00
Kees Cook
ebba638ae7 x86, cpu: Call verify_cpu during 32bit CPU startup
The XD_DISABLE-clearing side-effect needs to happen for both 32bit
and 64bit, but the 32bit init routines were not calling verify_cpu()
yet. This adds that call to gain the side-effect.

The longmode/SSE tests being performed in verify_cpu() need to happen very
early for 64bit but not for 32bit. Instead of including it in two places
for 32bit, we can just include it once in arch/x86/kernel/head_32.S.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:43:09 -08:00
Kees Cook
ae84739c27 x86, cpu: Clear XD_DISABLED flag on Intel to regain NX
Intel CPUs have an additional MSR bit to indicate if the BIOS was
configured to disable the NX cpu feature. This bit was traditionally
used for operating systems that did not understand how to handle the
NX bit. Since Linux understands this, this BIOS flag should be ignored
by default.

In a review[1] of reported hardware being used by Ubuntu bug reporters,
almost 10% of systems had an incorrectly configured BIOS, leaving their
systems unable to use the NX features of their CPU.

This change will clear the MSR_IA32_MISC_ENABLE_XD_DISABLE bit so that NX
cannot be inappropriately controlled by the BIOS on Intel CPUs. If, under
very strange hardware configurations, NX actually needs to be disabled,
"noexec=off" can be used to restore the prior behavior.

[1] http://www.outflux.net/blog/archives/2010/02/18/data-mining-for-nx-bit/

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-3-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:42:54 -08:00
Kees Cook
c5cbac6942 x86, cpu: Rename verify_cpu_64.S to verify_cpu.S
The code is 32bit already, and can be used in 32bit routines.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-2-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:42:42 -08:00
Peter Zijlstra
034c6efa46 perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation
Jasper suggested we use the zeroing capability of the allocators
instead of calling memset ourselves. Add node affinity while we're at
it.

Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 22:58:40 +01:00
Borislav Petkov
c7657ac0c3 x86, microcode, AMD: Cleanup code a bit
get_ucode_data is a memcpy() wrapper which always returns 0. Move it
into the header and make it an inline. Remove all code checking its
return value and turn it into a void.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-10 14:54:54 +01:00
Jesper Juhl
1ea6be212e x86, microcode, AMD: Replace vmalloc+memset with vzalloc
We don't have to do memset() ourselves after vmalloc() when we have
vzalloc(), so change that in
arch/x86/kernel/microcode_amd.c::get_next_ucode().

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-10 14:48:57 +01:00
Kusanagi Kouichi
1f523bf367 x86, pvclock: Remove leftover scale_delta() function
Commit 92580d64e16402762e2acc3022f065397c780425
("x86: pvclock: Move scale_delta into common header")
forgot to remove scale_delta.

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Cc: Zachary Amsden <zamsden@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Glauber Costa <glommer@redhat.com>
LKML-Reference: <20101105110444.BAF6D6FC03B@msa105.auone-net.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:32:15 +01:00
Jesper Juhl
2a8dcbd6cd x86, apic: Remove double #include
Remove the second <asm/atomic.h> inclusion.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
LKML-Reference: <alpine.LNX.2.00.1011072253360.26247@swampdragon.chaosbits.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:21:16 +01:00
Jan Beulich
2f62bf7d23 x86: Adjust section annotations in AMD Fam10 MMCONF enabling code
check_enable_amd_mmconf_dmi() gets called only for the BSP,
hence everything hanging off of it can be __init*.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CD2DE1E0200007800020990@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:08:26 +01:00
Jack Steiner
62b0cfc240 x86, UV: Update node controller MMRs
A new version of the SGI UV hub node controller is being
developed. A few of the MMRs (control registers) that exist on
the current hub no longer exist on the new hub. Fortunately,
there are alternate MMRs that are are functionally equivalent
and that exist on both hubs.

This patch changes the UV code to use MMRs that exist in BOTH
versions of the hub node controller.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101106204056.GA27584@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:06:38 +01:00
Jesper Juhl
8e5e9521c1 x86: Remove unnecessary casts of void ptr returning alloc function return values
The [vk][cmz]alloc(_node) family of functions return void
pointers which it's completely unnecessary/pointless to cast to
other pointer types since that happens implicitly.

This patch removes such casts from arch/x86.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: trivial@kernel.org
Cc: amd64-microcode@amd64.org
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <alpine.LNX.2.00.1011082310220.23697@swampdragon.chaosbits.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 09:13:00 +01:00
Andi Kleen
0059b2436a x86: Address gcc4.6 "set but not used" warnings in apic.h
native_apic_msr_read() and x2apic_enabled() use rdmsr(msr, low, high),
but only use the low part.

gcc4.6 complains about this:
.../apic.h:144:11: warning: variable 'high' set but not used [-Wunused-but-set-variable]

rdmsr() is just a wrapper around rdmsrl() which splits the 64bit value
into low and high, so using rdmsrl() directly solves this.

[tglx: Changed the variables to u64 as suggested by Cyrill. It's less
       confusing and has no code impact as this is 64bit only anyway.
       Massaged changelog as well. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: x86@kernel.org
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <1289251229-19589-1-git-send-email-andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-09 18:40:30 +01:00
Jacob Pan
7f05dec3dd x86: mrst: Parse SFI timer table for all timer configs
Penwell has APB timer based watchdog timers, it requires platform code to parse
SFI MTMR tables in order to claim its timer.

This patch will always parse SFI MTMR regardless of system timer configuration
choices. Otherwise, SFI MTMR table may not get parsed if running on Medfield
with always-on local APIC timers and constant TSC. Watchdog timer driver will
then not get a timer to use.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101109112800.20591.10802.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-09 14:45:52 +01:00
Feng Tang
1da4b1c6a4 x86/mrst: Add SFI platform device parsing code
SFI provides a series of tables. These describe the platform devices present
including SPI and I²C devices, as well as various sensors, keypads and other
glue as well as interfaces provided via the SCU IPC mechanism (intel_scu_ipc.c)

This patch is a merge of the core elements and relevant fixes from the
Intel development code by Feng, Alek, myself into a single coherent patch
for upstream submission.

It provides the needed infrastructure to register I2C, SPI and platform devices
described by the tables, as well as handlers for some of the hardware already
supported in kernel. The 0.8 firmware also provides GPIO tables.

Devices are created at boot time or if they are SCU dependant at the point an
SCU is discovered. The existing Linux device mechanisms will then handle the
device binding. At an abstract level this is an SFI to Linux device translator.

Device/platform specific setup/glue is in this file. This is done so that the
drivers for the generic I²C and SPI bus devices remain cross platform as they
should.

(Updated from RFC version to correct the emc1403 name used by the firmware
 and a wrongly used #define)

Signed-off-by: Alek Du <alek.du@linux.intel.com>
LKML-Reference: <20101109112158.20013.6158.stgit@localhost.localdomain>
[Clean ups, removal of 0.7 support]
Signed-off-by: Feng Tang <feng.tang@linux.intel.com>
[Clean ups]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-09 14:45:52 +01:00
Jiri Slaby
07cf2a64c2 xen: fix memory leak in Xen PCI MSI/MSI-X allocator.
Stanse found that xen_setup_msi_irqs leaks memory when
xen_allocate_pirq fails. Free the memory in that fail path.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xensource.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
2010-11-08 11:30:00 -05:00
Jan Kiszka
453d9c57e2 KVM: x86: Issue smp_call_function_many with preemption disabled
smp_call_function_many is specified to be called only with preemption
disabled. Fulfill this requirement.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-11-05 14:42:27 -02:00
Vasiliy Kulikov
97e69aa62f KVM: x86: fix information leak to userland
Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and
kvm_clock_data are copied to userland with some padding and reserved
fields unitialized.  It leads to leaking of contents of kernel stack
memory.  We have to initialize them to zero.

In patch v1 Jan Kiszka suggested to fill reserved fields with zeros
instead of memset'ting the whole struct.  It makes sense as these
fields are explicitly marked as padding.  No more fields need zeroing.

KVM-Stable-Tag.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-11-05 14:42:27 -02:00
Marcelo Tosatti
eb45fda45f KVM: MMU: fix rmap_remove on non present sptes
drop_spte should not attempt to rmap_remove a non present shadow pte.

This fixes a BUG_ON seen on kvm-autotest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Lucas Meneghel Rodrigues <lmr@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-11-05 14:42:26 -02:00
Michael S. Tsirkin
edde99ce05 KVM: Write protect memory after slot swap
I have observed the following bug trigger:

1. userspace calls GET_DIRTY_LOG
2. kvm_mmu_slot_remove_write_access is called and makes a page ro
3. page fault happens and makes the page writeable
   fault is logged in the bitmap appropriately
4. kvm_vm_ioctl_get_dirty_log swaps slot pointers

a lot of time passes

5. guest writes into the page
6. userspace calls GET_DIRTY_LOG

At point (5), bitmap is clean and page is writeable,
thus, guest modification of memory is not logged
and GET_DIRTY_LOG returns an empty bitmap.

The rule is that all pages are either dirty in the current bitmap,
or write-protected, which is violated here.

It seems that just moving kvm_mmu_slot_remove_write_access down
to after the slot pointer swap should fix this bug.

KVM-Stable-Tag.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-11-05 14:42:25 -02:00
Uwe Kleine-König
b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Rakib Mullick
cf38d0ba7e x86, mm: Fix section mismatch in tlb.c
Mark tlb_cpuhp_notify as __cpuinit. It's basically a callback
function, which is called from __cpuinit init_smp_flash(). So -
it's safe.

We were warned by the following warning:

 WARNING: arch/x86/mm/built-in.o(.text+0x356d): Section mismatch
 in reference from the function tlb_cpuhp_notify() to the
 function .cpuinit.text:calculate_tlb_offset()
 The function tlb_cpuhp_notify() references
 the function __cpuinit calculate_tlb_offset().
 This is often because tlb_cpuhp_notify lacks a __cpuinit
 annotation or the annotation of calculate_tlb_offset is wrong.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <AANLkTinWQRG=HA9uB3ad0KAqRRTinL6L_4iKgF84coph@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-01 10:09:07 +01:00
Linus Torvalds
f02a38d86a Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  jump label: Add work around to i386 gcc asm goto bug
  x86, ftrace: Use safe noops, drop trap test
  jump_label: Fix unaligned traps on sparc.
  jump label: Make arch_jump_label_text_poke_early() optional
  jump label: Fix error with preempt disable holding mutex
  oprofile: Remove deprecated use of flush_scheduled_work()
  oprofile: Fix the hang while taking the cpu offline
  jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex
  jump label: Fix module __init section race

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Check irq_remapped instead of remapping_enabled in destroy_irq()
2010-10-30 11:43:26 -07:00
Yinghai Lu
7b79462a20 x86: Check irq_remapped instead of remapping_enabled in destroy_irq()
Russ Anderson reported:
| There is a regression that is causing a NULL pointer dereference
| in free_irte when shutting down xpc. git bisect narrowed it down
| to git commit d585d06(intr_remap: Simplify the code further), which
| changed free_irte(). Reverse applying the patch fixes the problem.

We need to use irq_remapped() for each irq instead of checking only
intr_remapping_enabled as there might be non remapped irqs even when
remapping is enabled.

[ tglx: use cfg instead of retrieving it again. Massaged changelog ]

Reported-bisected-and-tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4CCBD511.40607@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-30 10:28:31 +02:00
Linus Torvalds
2d10d8737c Merge branches 'x86-fixes-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, alternative: Call stop_machine_text_poke() on all cpus
  x86-32: Restore irq stacks NUMA-aware allocations
  x86, memblock: Fix early_node_mem with big reserved region.

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, uv: More Westmere support on SGI UV
  x86, uv: Enable Westmere support on SGI UV
2010-10-29 18:58:00 -07:00
Jason Baron
404ba5d7bb x86, alternative: Call stop_machine_text_poke() on all cpus
Currently, text_poke_smp() passes a NULL as the third argument to
__stop_machine(), which will only run stop_machine_text_poke()
on 1 cpu. Change NULL -> cpu_online_mask, as stop_machine_text_poke()
is intended to be run on all cpus.

I actually didn't notice any problems with stop_machine_text_poke()
only being called on 1 cpu, but found this via code inspection.

Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20101028152026.GB2875@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-29 16:42:58 -07:00
Ian Campbell
a2d771c036 xen: correct size of level2_kernel_pgt
sizeof(pmd_t *) is 4 bytes on 32-bit PAE leading to an allocation of
only 2048 bytes. The correct size is sizeof(pmd_t) giving us a full
page allocation.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-29 12:23:57 -07:00
Linus Torvalds
1e431a9d64 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb,ppc: Individual register get/set for ppc
  kgdbts: prevent re-entry to kgdbts before it unregisters
  debug_core,x86,blackfin: Clean up hw debug disable API
  kdb: Fix early debugging crash regression
  kgdb,arm: fix register dump
  kdb: fix per_cpu command to remove supress mask
  kdb: Add kdb kernel module sample
2010-10-29 11:49:38 -07:00
Steven Rostedt
45f81b1c96 jump label: Add work around to i386 gcc asm goto bug
On i386 (not x86_64) early implementations of gcc would have a bug
with asm goto causing it to produce code like the following:

(This was noticed by Peter Zijlstra)

   56 pushl 0
   67 nopl         jmp 0x6f
      popl
      jmp 0x8c

   6f              mov
                   test
                   je 0x8c

   8c mov
      call *(%esp)

The jump added in the asm goto skipped over the popl that matched
the pushl 0, which lead up to a quick crash of the system when
the jump was enabled. The nopl is defined in the asm goto () statement
and when tracepoints are enabled, the nop changes to a jump to the label
that was specified by the asm goto. asm goto is suppose to tell gcc that
the code in the asm might jump to an external label. Here gcc obviously
fails to make that work.

The bug report for gcc is here:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226

The bug only appears on x86 when not compiled with
-maccumulate-outgoing-args. This option is always set on x86_64 and it
is also the work around for a function graph tracer i386 bug.
(See commit: 746357d6a5)
This explains why the bug only showed up on i386 when function graph
tracer was not enabled.

This patch now adds a CONFIG_JUMP_LABEL option that is default
off instead of using jump labels by default. When jump labels are
enabled, the -maccumulate-outgoing-args will be used (causing a
slightly larger kernel image on i386). This option will exist
until we have a way to detect if the gcc compiler in use is safe
to use on all configurations without the work around.

Note, there exists such a test, but for now we will keep the enabling
of jump label as a manual option.

Archs that know the compiler is safe with asm goto, may choose to
select JUMP_LABEL and enable it by default.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cause-discovered-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Baron <jbaron@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Miller <davem@davemloft.net>
Cc: Richard Henderson <rth@redhat.com>
LKML-Reference: <1288028746.3673.11.camel@laptop>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-29 14:45:29 -04:00
Dongdong Deng
d7ba979d45 debug_core,x86,blackfin: Clean up hw debug disable API
The kgdb_disable_hw_debug() was an architecture specific function for
disabling all hardware breakpoints on a per cpu basis when entering
the debug core.

This patch will remove the weak function kdbg_disable_hw_debug() and
change it into a call back which lives with the rest of hw breakpoint
call backs in struct kgdb_arch.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-29 13:14:41 -05:00
H. Peter Anvin
2d1d7126bb x86, ftrace: Use safe noops, drop trap test
Always use a safe 5-byte noop sequence.  Drop the trap test, since it
is known to return false negatives on some virtualization platforms on
32 bits.  The resulting code is both simpler and safer.

Cc: Daniel Drake <dsd@laptop.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-29 13:07:59 -04:00
Eric Dumazet
5c1eb08936 x86-32: Restore irq stacks NUMA-aware allocations
Commit 22d4cd4c4d ("Allocate irq stacks seperate from percpu
area") removed NUMA affinity of IRQ stacks as side-effect of
the fix.

Using alloc_pages_node() instead of __get_free_pages() is safe,
even if the target node has no available LOWMEM pages :
alloc_pages_node() fallbacks to another node.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Brian Gerst <brgerst@gmail.com>
Cc: tj@kernel.org
Cc: torvalds@linux-foundation.org
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1288276854.2649.607.camel@edumazet-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-29 08:17:07 +02:00
Russ Anderson
0520bd8438 x86, uv: More Westmere support on SGI UV
Enable Westmere support for all APIC modes on SGI UV.

Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20101028224132.GB15804@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-28 22:38:07 -07:00
Linus Torvalds
18cb657ca1 Merge branch 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm

* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: register xen pci notifier
  xen: initialize cpu masks for pv guests in xen_smp_init
  xen: add a missing #include to arch/x86/pci/xen.c
  xen: mask the MTRR feature from the cpuid
  xen: make hvc_xen console work for dom0.
  xen: add the direct mapping area for ISA bus access
  xen: Initialize xenbus for dom0.
  xen: use vcpu_ops to setup cpu masks
  xen: map a dummy page for local apic and ioapic in xen_set_fixmap
  xen: remap MSIs into pirqs when running as initial domain
  xen: remap GSIs as pirqs when running as initial domain
  xen: introduce XEN_DOM0 as a silent option
  xen: map MSIs into pirqs
  xen: support GSI -> pirq remapping in PV on HVM guests
  xen: add xen hvm acpi_register_gsi variant
  acpi: use indirect call to register gsi in different modes
  xen: implement xen_hvm_register_pirq
  xen: get the maximum number of pirqs from xen
  xen: support pirq != irq

* 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits)
  X86/PCI: Remove the dependency on isapnp_disable.
  xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c
  MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright.
  x86: xen: Sanitse irq handling (part two)
  swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it.
  MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer.
  xen/pci: Request ACS when Xen-SWIOTLB is activated.
  xen-pcifront: Xen PCI frontend driver.
  xenbus: prevent warnings on unhandled enumeration values
  xenbus: Xen paravirtualised PCI hotplug support.
  xen/x86/PCI: Add support for the Xen PCI subsystem
  x86: Introduce x86_msi_ops
  msi: Introduce default_[teardown|setup]_msi_irqs with fallback.
  x86/PCI: Export pci_walk_bus function.
  x86/PCI: make sure _PAGE_IOMAP it set on pci mappings
  x86/PCI: Clean up pci_cache_line_size
  xen: fix shared irq device passthrough
  xen: Provide a variant of xen_poll_irq with timeout.
  xen: Find an unbound irq number in reverse order (high to low).
  xen: statically initialize cpu_evtchn_mask_p
  ...

Fix up trivial conflicts in drivers/pci/Makefile
2010-10-28 17:11:17 -07:00
Linus Torvalds
51399a3919 Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (38 commits)
  kbuild: convert `arch/tile' to the kconfig mainmenu upgrade
  README: cite nconfig
  Revert "kconfig: Temporarily disable dependency warnings"
  kconfig: Use PATH_MAX instead of 128 for path buffer sizes.
  kconfig: Fix realloc usage()
  kconfig: Propagate const
  kconfig: Don't go out from read config loop when you read new symbol
  kconfig: fix menuconfig on debian lenny
  kbuild: migrate all arch to the kconfig mainmenu upgrade
  kconfig: expand file names
  kconfig: use the file's name of sourced file
  kconfig: constify file name
  kconfig: don't emit warning upon rootmenu's prompt redefinition
  kconfig: replace KERNELVERSION usage by the mainmenu's prompt
  kconfig: delay gconf window initialization
  kconfig: expand by default the rootmenu's prompt
  kconfig: add a symbol string expansion helper
  kconfig: regen parser
  kconfig: implement the `mainmenu' directive
  kconfig: allow PACKAGE to be defined on the compiler's command-line
  ...

Fix up trivial conflict in arch/mn10300/Kconfig
2010-10-28 16:16:39 -07:00
Yinghai Lu
419db274be x86, memblock: Fix early_node_mem with big reserved region.
Xen can reserve huge amounts of memory for pre-ballooning, but that
still shows as RAM in the e820 memory map.  early_node_mem could not
find range because of start/end adjusting, and will go through the
fallback path.  However, the fallback patch is still using
memblock_x86_find_range_node(), and it is partially top-down because
it go through active_range entries from low to high.

Let's use memblock_find_in_range instead memblock_x86_find_range_node.
So get real top down in fallback path.

We may still need to make memblock_x86_find_range_node to do overall
top_down work.

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CC9A9C9.8020700@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-28 15:52:36 -07:00
Linus Torvalds
2d3b07c07b Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Move olpc to platform
  x86: Move uv to platform
  x86: Move mrst to platform
  x86: Move scx200 to platform
  x86: Move visws to platform
  x86: Move efi to platform
  x86: Move sfi to platform
  x86: Add platform directory
2010-10-28 12:25:42 -07:00
Linus Torvalds
e9f29c9a56 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits)
  x86: allocate space within a region top-down
  x86: update iomem_resource end based on CPU physical address capabilities
  x86/PCI: allocate space from the end of a region, not the beginning
  PCI: allocate bus resources from the top down
  resources: support allocating space within a region from the top down
  resources: handle overflow when aligning start of available area
  resources: ensure callback doesn't allocate outside available space
  resources: factor out resource_clip() to simplify find_resource()
  resources: add a default alignf to simplify find_resource()
  x86/PCI: MMCONFIG: fix region end calculation
  PCI: Add support for polling PME state on suspended legacy PCI devices
  PCI: Export some PCI PM functionality
  PCI: fix message typo
  PCI: log vendor/device ID always
  PCI: update Intel chipset names and defines
  PCI: use new ccflags variable in Makefile
  PCI: add PCI_MSIX_TABLE/PBA defines
  PCI: add PCI vendor id for STmicroelectronics
  x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs
  PCI: OLPC: Only enable PCI configuration type override on XO-1
  ...
2010-10-28 11:59:52 -07:00
Linus Torvalds
a042e26137 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
  perf python scripting: Add futex-contention script
  perf python scripting: Fixup cut'n'paste error in sctop script
  perf scripting: Shut up 'perf record' final status
  perf record: Remove newline character from perror() argument
  perf python scripting: Support fedora 11 (audit 1.7.17)
  perf python scripting: Improve the syscalls-by-pid script
  perf python scripting: print the syscall name on sctop
  perf python scripting: Improve the syscalls-counts script
  perf python scripting: Improve the failed-syscalls-by-pid script
  kprobes: Remove redundant text_mutex lock in optimize
  x86/oprofile: Fix uninitialized variable use in debug printk
  tracing: Fix 'faild' -> 'failed' typo
  perf probe: Fix format specified for Dwarf_Off parameter
  perf trace: Fix detection of script extension
  perf trace: Use $PERF_EXEC_PATH in canned report scripts
  perf tools: Document event modifiers
  perf tools: Remove direct slang.h include
  perf_events: Fix for transaction recovery in group_sched_in()
  perf_events: Revert: Fix transaction recovery in group_sched_in()
  perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
  ...
2010-10-27 18:48:00 -07:00
Linus Torvalds
17bb51d56c Merge branch 'akpm-incoming-2'
* akpm-incoming-2: (139 commits)
  epoll: make epoll_wait() use the hrtimer range feature
  select: rename estimate_accuracy() to select_estimate_accuracy()
  Remove duplicate includes from many files
  ramoops: use the platform data structure instead of module params
  kernel/resource.c: handle reinsertion of an already-inserted resource
  kfifo: fix kfifo_alloc() to return a signed int value
  w1: don't allow arbitrary users to remove w1 devices
  alpha: remove dma64_addr_t usage
  mips: remove dma64_addr_t usage
  sparc: remove dma64_addr_t usage
  fuse: use release_pages()
  taskstats: use real microsecond granularity for CPU times
  taskstats: split fill_pid function
  taskstats: separate taskstats commands
  delayacct: align to 8 byte boundary on 64-bit systems
  delay-accounting: reimplement -c for getdelays.c to report information on a target command
  namespaces Kconfig: move namespace menu location after the cgroup
  namespaces Kconfig: remove the cgroup device whitelist experimental tag
  namespaces Kconfig: remove pointless cgroup dependency
  namespaces Kconfig: make namespace a submenu
  ...
2010-10-27 18:42:52 -07:00
Linus Torvalds
0671b7674f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu: Remove the multi-page alignment facility
  x86-32: Allocate irq stacks seperate from percpu area
  x86-32, mm: Remove duplicated #include
  x86, printk: Get rid of <0> from stack output
  x86, kexec: Make sure to stop all CPUs before exiting the kernel
  x86/vsmp: Eliminate kconfig dependency warning
2010-10-27 18:38:55 -07:00
Zimny Lech
61d8e11e51 Remove duplicate includes from many files
Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:18 -07:00
Namhyung Kim
eb5a369931 ptrace: cleanup arch_ptrace() on x86
Remove checking @addr less than 0 because @addr is now unsigned and
use new udescp variable in order to remove unnecessary castings.

[akpm@linux-foundation.org: fix unused variable 'udescp']
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:10 -07:00
Namhyung Kim
9b05a69e05 ptrace: change signature of arch_ptrace()
Fix up the arguments to arch_ptrace() to take account of the fact that
@addr and @data are now unsigned long rather than long as of a preceding
patch in this series.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:10 -07:00
Peter Zijlstra
20273941f2 mm: fix race in kunmap_atomic()
Christoph reported a nice splat which illustrated a race in the new stack
based kmap_atomic implementation.

The problem is that we pop our stack slot before we're completely done
resetting its state -- in particular clearing the PTE (sometimes that's
CONFIG_DEBUG_HIGHMEM).  If an interrupt happens before we actually clear
the PTE used for the last slot, that interrupt can reuse the slot in a
dirty state, which triggers a BUG in kmap_atomic().

Fix this by introducing kmap_atomic_idx() which reports the current slot
index without actually releasing it and use that to find the PTE and delay
the _pop() until after we're completely done.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:05 -07:00
Brian Gerst
22d4cd4c4d x86-32: Allocate irq stacks seperate from percpu area
The percpu allocator cannot handle alignments larger than one
page. Allocate the irq stacks seperately, and only keep the
pointers as percpu data.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tj@kernel.org
LKML-Reference: <1288158182-1753-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-27 17:31:42 +02:00
Thomas Gleixner
8654b1c2de x86: Move olpc to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>
2010-10-27 17:22:16 +02:00
Thomas Gleixner
329b84e42e x86: Move uv to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Travis <travis@sgi.com>
2010-10-27 14:30:02 +02:00
Thomas Gleixner
9694d4afc1 x86: Move mrst to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
2010-10-27 14:30:01 +02:00
Thomas Gleixner
3b3da9d25a x86: Move scx200 to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-27 14:30:01 +02:00
Thomas Gleixner
c4e72ad6bb x86: Move visws to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-27 14:30:01 +02:00
Thomas Gleixner
b17ed48040 x86: Move efi to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Huang Ying <ying.huang@intel.com>
2010-10-27 14:30:01 +02:00
Thomas Gleixner
937f961a65 x86: Move sfi to platform
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
2010-10-27 14:30:01 +02:00
Thomas Gleixner
3adbb7f4a3 x86: Add platform directory
x86 has finally arrived in the embedded nightmare and will rapidly
grow SoC platform support in various flavours. So we need a place for
the platform support files. That also allows us to clean up the
dumpground which arch/x86/kernel has become over time.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-27 14:30:01 +02:00
Linus Torvalds
520045db94 Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen/privcmd: make privcmd visible in domU
  xen/privcmd: move remap_domain_mfn_range() to core xen code and export.
  privcmd: MMAPBATCH: Fix error handling/reporting
  xenbus: export xen_store_interface for xenfs
  xen/privcmd: make sure vma is ours before doing anything to it
  xen/privcmd: print SIGBUS faults
  xen/xenfs: set_page_dirty is supposed to return true if it dirties
  xen/privcmd: create address space to allow writable mmaps
  xen: add privcmd driver
  xen: add variable hypercall caller
  xen: add xen_set_domain_pte()
  xen: add /proc/xen/xsd_{kva,port} to xenfs

* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits)
  xen: include xen/xen.h for definition of xen_initial_domain()
  xen: use host E820 map for dom0
  xen: correctly rebuild mfn list list after migration.
  xen: improvements to VIRQ_DEBUG output
  xen: set up IRQ before binding virq to evtchn
  xen: ensure that all event channels start off bound to VCPU 0
  xen/hvc: only notify if we actually sent something
  xen: don't add extra_pages for RAM after mem_end
  xen: add support for PAT
  xen: make sure xen_max_p2m_pfn is up to date
  xen: limit extra memory to a certain ratio of base
  xen: add extra pages for E820 RAM regions, even if beyond mem_end
  xen: make sure xen_extra_mem_start is beyond all non-RAM e820
  xen: implement "extra" memory to reserve space for pages not present at boot
  xen: Use host-provided E820 map
  xen: don't map missing memory
  xen: defer building p2m mfn structures until kernel is mapped
  xen: add return value to set_phys_to_machine()
  xen: convert p2m to a 3 level tree
  xen: make install_p2mtop_page() static
  ...

Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of
'reserve_early()' - in the new memblock world order it is now
'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.
2010-10-26 18:20:19 -07:00
Linus Torvalds
474829e875 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (53 commits)
  ACPI: install ACPI table handler before any dynamic tables being loaded
  ACPI / PM: Blacklist another machine that needs acpi_sleep=nonvs
  ACPI: Page based coalescing of I/O remappings optimization
  ACPI: Convert simple locking to RCU based locking
  ACPI: Pre-map 'system event' related register blocks
  ACPI: Add interfaces for ioremapping/iounmapping ACPI registers
  ACPI: Maintain a list of ACPI memory mapped I/O remappings
  ACPI: Fix ioremap size for MMIO reads and writes
  ACPI / Battery: Return -ENODEV for unknown values in get_property()
  ACPI / PM: Fix reference counting of power resources
  Subject: [PATCH] ACPICA: Fix Scope() op in module level code
  ACPI battery: support percentage battery remaining capacity
  ACPI: Make Embedded Controller command timeout delay configurable
  ACPI dock: move some functions to .init.text
  ACPI: thermal: remove unused limit code
  ACPI: static sleep_states[] and acpi_gts_bfs_check
  ACPI: remove dead code
  ACPI: delete dedicated MAINTAINERS entries for ACPI EC and BATTERY drivers
  ACPI: Only processor needs CPU_IDLE
  ACPICA: Update version to 20101013
  ...
2010-10-26 17:28:37 -07:00
Andrew Morton
ca1cab37d9 workqueues: s/ON_STACK/ONSTACK/
Silly though it is, completions and wait_queue_heads use foo_ONSTACK
(COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK,
__WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I
guess workqueues should do the same thing.

s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/
s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:14 -07:00
Hagen Paul Pfeifer
732eacc054 replace nested max/min macros with {max,min}3 macro
Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:12 -07:00
Michel Lespinasse
68da336a14 x86: access_error API cleanup
access_error() already takes error_code as an argument, so there is
no need for an additional write flag.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:09 -07:00
Michel Lespinasse
d065bd810b mm: retry page fault when blocking on disk transfer
This change reduces mmap_sem hold times that are caused by waiting for
disk transfers when accessing file mapped VMAs.

It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
site wants mmap_sem to be released if blocking on a pending disk transfer.
In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
do_page_fault() will then re-acquire mmap_sem and retry the page fault.

It is expected that the retry will hit the same page which will now be
cached, and thus it will complete with a low mmap_sem hold time.

Tests:

- microbenchmark: thread A mmaps a large file and does random read accesses
  to the mmaped area - achieves about 55 iterations/s. Thread B does
  mmap/munmap in a loop at a separate location - achieves 55 iterations/s
  before, 15000 iterations/s after.

- We are seeing related effects in some applications in house, which show
  significant performance regressions when running without this change.

[akpm@linux-foundation.org: fix warning & crash]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:09 -07:00
Peter Zijlstra
7a837d1bb7 perf, x86: Fix up kmap_atomic() type
Now that the KM_type stuff is history, clean up the compiler warning.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:08 -07:00
Peter Zijlstra
ece0e2b640 mm: remove pte_*map_nested()
Since we no longer need to provide KM_type, the whole pte_*map_nested()
API is now redundant, remove it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:08 -07:00
Peter Zijlstra
3e4d3af501 mm: stack based kmap_atomic()
Keep the current interface but ignore the KM_type and use a stack based
approach.

The advantage is that we get rid of crappy code like:

	#define __KM_PTE			\
		(in_nmi() ? KM_NMI_PTE : 	\
		 in_irq() ? KM_IRQ_PTE :	\
		 KM_PTE0)

and in general can stop worrying about what context we're in and what kmap
slots might be appropriate for that.

The downside is that FRV kmap_atomic() gets more expensive.

For now we use a CPP trick suggested by Andrew:

  #define kmap_atomic(page, args...) __kmap_atomic(page)

to avoid having to touch all kmap_atomic() users in a single patch.

[ not compiled on:
  - mn10300: the arch doesn't actually build with highmem to begin with ]

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:08 -07:00
Bjorn Helgaas
1af3c2e45e x86: allocate space within a region top-down
Request that allocate_resource() use available space from high addresses
first, rather than the default of using low addresses first.

The most common place this makes a difference is when we move or assign
new PCI device resources.  Low addresses are generally scarce, so it's
better to use high addresses when possible.  This follows Windows practice
for PCI allocation.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:45 -07:00
Bjorn Helgaas
419afdf53c x86: update iomem_resource end based on CPU physical address capabilities
The iomem_resource map reflects the available physical address space.
We statically initialize the end to -1, i.e., 0xffffffff_ffffffff, but
of course we can only use as much as the CPU can address.

This patch updates the end based on the CPU capabilities, so we don't
mistakenly allocate space that isn't usable, as we're likely to do when
allocating from the top-down.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:44 -07:00
Bjorn Helgaas
dc9887dc02 x86/PCI: allocate space from the end of a region, not the beginning
Allocate from the end of a region, not the beginning.

For example, if we need to allocate 0x800 bytes for a device on bus
0000:00 given these resources:

    [mem 0xbff00000-0xdfffffff] PCI Bus 0000:00
      [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02

the available space at [mem 0xbff00000-0xbfffffff] is passed to the
alignment callback (pcibios_align_resource()).  Prior to this patch, we
would put the new 0x800 byte resource at the beginning of that available
space, i.e., at [mem 0xbff00000-0xbff007ff].

With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff].

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26 15:33:42 -07:00
Russ Anderson
c8f730b1ab x86, uv: Enable Westmere support on SGI UV
Enable Westmere support on SGI UV.  The UV initialization code is dependent on
the APICID bits.  Westmere-EX uses different APIC bit mapping than Nehalem-EX.
This code reads the apic shift value from a UV MMR to do the proper bit
decoding to determint the pnode.

Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20101026212728.GB15071@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-26 15:15:28 -07:00
Stefano Stabellini
ea5b8f7393 xen: initialize cpu masks for pv guests in xen_smp_init
Pv guests don't have ACPI and need the cpu masks to be set
correctly as early as possible so we call xen_fill_possible_map from
xen_smp_init.
On the other hand the initial domain supports ACPI so in this case we skip
xen_fill_possible_map and rely on it. However Xen might limit the number
of cpus usable by the domain, so we filter those masks during smp
initialization using the VCPUOP_is_up hypercall.
It is important that the filtering is done before
xen_setup_vcpu_info_placement.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-10-26 20:33:15 +01:00
Linus Torvalds
f1ebdd60cc Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits)
  Add _addr_lsb field to ia64 siginfo
  Fix migration.c compilation on s390
  HWPOISON: Remove retry loop for try_to_unmap
  HWPOISON: Turn addr_valid from bitfield into char
  HWPOISON: Disable DEBUG by default
  HWPOISON: Convert pr_debugs to pr_info
  HWPOISON: Improve comments in memory-failure.c
  x86: HWPOISON: Report correct address granuality for huge hwpoison faults
  Encode huge page size for VM_FAULT_HWPOISON errors
  Fix build error with !CONFIG_MIGRATION
  hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning
  Clean up __page_set_anon_rmap
  HWPOISON, hugetlb: fix unpoison for hugepage
  HWPOISON, hugetlb: soft offlining for hugepage
  HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED
  hugetlb: move refcounting in hugepage allocation inside hugetlb_lock
  HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page()
  hugetlb: hugepage migration core
  hugetlb: redefine hugepage copy functions
  hugetlb: add allocate function for hugepage migration
  ...
2010-10-26 10:13:10 -07:00
Linus Torvalds
4f6876031e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit.
  [CPUFREQ] add sampling_down_factor tunable to improve ondemand performance
  [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type
  [CPUFREQ] drivers/cpufreq: Adjust confusing if indentation
2010-10-26 10:00:04 -07:00
Ian Campbell
45263cb099 xen: include xen/xen.h for definition of xen_initial_domain()
CC      arch/x86/xen/setup.o
        arch/x86/xen/setup.c: In function 'xen_memory_setup':
        arch/x86/xen/setup.c:161: error: implicit declaration of function 'xen_initial_domain'

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-25 16:32:48 -07:00
Borislav Petkov
610470ce80 x86-32, mm: Remove duplicated #include
b40827fa72 added an include
directive which is needless and is taken care of by a previous
one. Remove it.

Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <20101025162523.GA4712@a1.tnic>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 19:39:33 +02:00
Ingo Molnar
7d7a48b760 Merge branch 'linus' into x86/urgent
Merge reason: We want to queue up a dependent fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 19:38:52 +02:00
Ingo Molnar
0b849ee888 Merge branch 'x86' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2010-10-25 19:17:32 +02:00
Borislav Petkov
9afd281a15 x86-32, mm: Remove duplicated include
Commit b40827fa72 ("x86-32, mm: Add an initial page table for core
bootstrapping") added an include directive which is needless and is
taken care of by a previous one.  Remove it.

Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25 10:05:13 -07:00
Robert Richter
eb48c9cb20 apic, amd: Make firmware bug messages more meaningful
This improves error messages in case the BIOS was setting up
wrong LVT offsets.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-6-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 18:59:43 +02:00
Robert Richter
0a17941e71 mce, amd: Remove goto in threshold_create_device()
Removing the goto in threshold_create_device().

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-5-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 18:59:43 +02:00
Robert Richter
bbaff08dca mce, amd: Add helper functions to setup APIC
This patch reworks and cleans up mce_amd_feature_init() by
introducing helper functions to setup and check the LVT offset.
It also fixes line endings in pr_err() calls.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 18:59:43 +02:00
Robert Richter
7203a04940 mce, amd: Shorten local variables mci_misc_{hi,lo}
Shorten this variables to make later changes more readable.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 18:59:42 +02:00
Robert Richter
9c37c9d897 mce, amd: Implement mce_threshold_block_init() helper function
This patch adds a helper function for the initial setup of an
mce threshold block. The LVT offset is passed as argument. Also
making variable threshold_defaults local as it is only used in
function mce_amd_feature_init(). Function
threshold_restart_bank() is extended to setup the LVT offset,
the change is backward compatible. Thus, now there is only a
single wrmsrl() to setup the block.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 18:59:42 +02:00
Linus Torvalds
fbaab1dc19 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (44 commits)
  eeepc-wmi: Add cpufv sysfs interface
  eeepc-wmi: add additional hotkeys
  panasonic-laptop: Simplify calls to acpi_pcc_retrieve_biosdata
  panasonic-laptop: Handle errors properly if they happen
  intel_pmic_gpio: fix off-by-one value range checking
  IBM Real-Time "SMI Free" mode driver -v7
  Add OLPC XO-1 rfkill driver
  Move hdaps driver to platform/x86
  ideapad-laptop: Fix Makefile
  intel_pmic_gpio: swap the bits and mask args for intel_scu_ipc_update_register
  ideapad: Add param: no_bt_rfkill
  ideapad: Change the driver name to ideapad-laptop
  ideapad: rewrite the sw rfkill set
  ideapad: rewrite the hw rfkill notify
  ideapad: use EC command to control camera
  ideapad: use return value of _CFG to tell if device exist or not
  ideapad: make sure we bind on the correct device
  ideapad: check VPC bit before sync rfkill hw status
  ideapad: add ACPI helpers
  dell-laptop: Add debugfs support
  ...
2010-10-25 08:28:13 -07:00
Robert Richter
4cafc4b8d7 Merge branch 'oprofile/core' into oprofile/x86
Conflicts:
	arch/x86/oprofile/op_model_amd.c

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-25 16:58:34 +02:00
Ingo Molnar
2c78ffeca9 x86/oprofile: Fix uninitialized variable use in debug printk
Stephen Rothwell reported this build warning:

  arch/x86/oprofile/op_model_amd.c: In function 'ibs_eilvt_valid':
  arch/x86/oprofile/op_model_amd.c:289: warning: 'offset' may be used uninitialized in this function

And correctly observed that indeed the variable is used uninitialized in
this function. The result of this bug can be a debug printk with a bogus
value.

Also fix a few more small details that made this function hard to read
and which probably contributed to the bug being introduced to begin with:

 - Use more symmetric error conditions

 - Remove the !0 obfuscation

 - Add newlines to the printk output

 - Remove bogus linebreaks in printk strings and elsewhere

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20101025115736.41d51abe.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 08:46:20 +02:00
Len Brown
38add9b4ba Merge branches 'bugzilla-15807', 'bugzilla-15979-v2' and 'bugzilla-19162' into release 2010-10-25 02:12:27 -04:00
Linus Torvalds
229aebb873 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Update broken web addresses in arch directory.
  Update broken web addresses in the kernel.
  Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
  Revert "Fix typo: configuation => configuration" partially
  ida: document IDA_BITMAP_LONGS calculation
  ext2: fix a typo on comment in ext2/inode.c
  drivers/scsi: Remove unnecessary casts of private_data
  drivers/s390: Remove unnecessary casts of private_data
  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
  drivers/infiniband: Remove unnecessary casts of private_data
  drivers/gpu/drm: Remove unnecessary casts of private_data
  kernel/pm_qos_params.c: Remove unnecessary casts of private_data
  fs/ecryptfs: Remove unnecessary casts of private_data
  fs/seq_file.c: Remove unnecessary casts of private_data
  arm: uengine.c: remove C99 comments
  arm: scoop.c: remove C99 comments
  Fix typo configue => configure in comments
  Fix typo: configuation => configuration
  Fix typo interrest[ing|ed] => interest[ing|ed]
  Fix various typos of valid in comments
  ...

Fix up trivial conflicts in:
	drivers/char/ipmi/ipmi_si_intf.c
	drivers/usb/gadget/rndis.c
	net/irda/irnet/irnet_ppp.c
2010-10-24 13:41:39 -07:00
Linus Torvalds
1765a1fe5d Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
  KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
  KVM: Fix signature of kvm_iommu_map_pages stub
  KVM: MCE: Send SRAR SIGBUS directly
  KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
  KVM: fix typo in copyright notice
  KVM: Disable interrupts around get_kernel_ns()
  KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
  KVM: MMU: move access code parsing to FNAME(walk_addr) function
  KVM: MMU: audit: check whether have unsync sps after root sync
  KVM: MMU: audit: introduce audit_printk to cleanup audit code
  KVM: MMU: audit: unregister audit tracepoints before module unloaded
  KVM: MMU: audit: fix vcpu's spte walking
  KVM: MMU: set access bit for direct mapping
  KVM: MMU: cleanup for error mask set while walk guest page table
  KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
  KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
  KVM: x86: Fix constant type in kvm_get_time_scale
  KVM: VMX: Add AX to list of registers clobbered by guest switch
  KVM guest: Move a printk that's using the clock before it's ready
  KVM: x86: TSC catchup mode
  ...
2010-10-24 12:47:25 -07:00
Thomas Gleixner
7fb2b870d6 x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage
Stupid me forgot to change the function name for the
CONFIG_X86_IO_APIC=n case in commit 23f9b2671 (x86: apic: Move
probe_nr_irqs_gsi() into ioapic_init_mappings())

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-24 11:14:14 +02:00
Huang Ying
77db5cbd29 KVM: MCE: Send SRAR SIGBUS directly
Originally, SRAR SIGBUS is sent to QEMU-KVM via touching the poisoned
page. But commit 9605456919 prevents the
signal from being sent. So now the signal is sent via
force_sig_info_fault directly.

[marcelo: use send_sig_info instead]

Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:15 +02:00
Huang Ying
5854dbca9b KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
Now we have MCG_SER_P (and corresponding SRAO/SRAR MCE) support in
kernel and QEMU-KVM, the MCG_SER_P should be added into
KVM_MCE_CAP_SUPPORTED to make all these code really works.

Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:15 +02:00
Nicolas Kaiser
9611c18777 KVM: fix typo in copyright notice
Fix typo in copyright notice.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:14 +02:00
Avi Kivity
395c6b0a9d KVM: Disable interrupts around get_kernel_ns()
get_kernel_ns() wants preemption disabled.  It doesn't make a lot of sense
during the get/set ioctls (no way to make them non-racy) but the callee wants
it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:14 +02:00
Avi Kivity
7ebaf15eef KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:14 +02:00
Xiao Guangrong
3377078027 KVM: MMU: move access code parsing to FNAME(walk_addr) function
Move access code parsing from caller site to FNAME(walk_addr) function

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:14 +02:00
Xiao Guangrong
6903074c36 KVM: MMU: audit: check whether have unsync sps after root sync
After root synced, all unsync sps are synced, this patch add a check to make
sure it's no unsync sps in VCPU's page table

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:14 +02:00
Xiao Guangrong
38904e1287 KVM: MMU: audit: introduce audit_printk to cleanup audit code
Introduce audit_printk, and record audit point instead audit name

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:13 +02:00
Xiao Guangrong
c42fffe3a3 KVM: MMU: audit: unregister audit tracepoints before module unloaded
fix:

Call Trace:
 [<ffffffffa01e46ba>] ? kvm_mmu_pte_write+0x229/0x911 [kvm]
 [<ffffffffa01c6ba9>] ? gfn_to_memslot+0x39/0xa0 [kvm]
 [<ffffffffa01c6c26>] ? mark_page_dirty+0x16/0x2e [kvm]
 [<ffffffffa01c6d6f>] ? kvm_write_guest_page+0x67/0x7f [kvm]
 [<ffffffff81066fbd>] ? local_clock+0x2a/0x3b
 [<ffffffffa01d52ce>] emulator_write_phys+0x46/0x54 [kvm]
 ......
Code:  Bad RIP value.
RIP  [<ffffffffa0172056>] 0xffffffffa0172056
 RSP <ffff880134f69a70>
CR2: ffffffffa0172056

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:13 +02:00
Xiao Guangrong
98224bf1d1 KVM: MMU: audit: fix vcpu's spte walking
After nested nested paging, it may using long mode to shadow 32/PAE paging
guest, so this patch fix it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:12 +02:00
Xiao Guangrong
33f91edb92 KVM: MMU: set access bit for direct mapping
Set access bit while setup up direct page table if it's nonpaing or npt enabled,
it's good for CPU's speculate access

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:11 +02:00
Xiao Guangrong
20bd40dc64 KVM: MMU: cleanup for error mask set while walk guest page table
Small cleanup for set page fault error code

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:10 +02:00
Xiao Guangrong
6292757fb0 KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
The value of 'vcpu->arch.mmu.pae_root' is not modified, so we can update
'root_hpa' out of the loop.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:09 +02:00
Sheng Yang
7129eecac1 KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
Eliminate:
arch/x86/kvm/emulate.c:801: warning: ‘sv’ may be used uninitialized in this
function

on gcc 4.1.2

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:09 +02:00
Jan Kiszka
50933623e5 KVM: x86: Fix constant type in kvm_get_time_scale
Older gcc versions complain about the improper type (for x86-32), 4.5
seems to fix this silently. However, we should better use the right type
initially.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:08 +02:00
Jan Kiszka
07d6f555d5 KVM: VMX: Add AX to list of registers clobbered by guest switch
By chance this caused no harm so far. We overwrite AX during switch
to/from guest context, so we must declare this.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:07 +02:00
Arjan Koers
19b6a85b78 KVM guest: Move a printk that's using the clock before it's ready
Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490db
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.

KVM-Stable-Tag.
Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:53:06 +02:00
Zachary Amsden
c285545f81 KVM: x86: TSC catchup mode
Negate the effects of AN TYM spell while kvm thread is preempted by tracking
conversion factor to the highest TSC rate and catching the TSC up when it has
fallen behind the kernel view of time.  Note that once triggered, we don't
turn off catchup mode.

A slightly more clever version of this is possible, which only does catchup
when TSC rate drops, and which specifically targets only CPUs with broken
TSC, but since these all are considered unstable_tsc(), this patch covers
all necessary cases.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:05 +02:00
Zachary Amsden
34c238a1d1 KVM: x86: Rename timer function
This just changes some names to better reflect the usage they
will be given.  Separated out to keep confusion to a minimum.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:05 +02:00
Zachary Amsden
5f4e3f8827 KVM: x86: Make math work for other scales
The math in kvm_get_time_scale relies on the fact that
NSEC_PER_SEC < 2^32.  To use the same function to compute
arbitrary time scales, we must extend the first reduction
step to shrink the base rate to a 32-bit value, and
possibly reduce the scaled rate into a 32-bit as well.

Note we must take care to avoid an arithmetic overflow
when scaling up the tps32 value (this could not happen
with the fixed scaled value of NSEC_PER_SEC, but can
happen with scaled rates above 2^31.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:04 +02:00
Avi Kivity
49e9d557f9 KVM: VMX: Respect interrupt window in big real mode
If an interrupt is pending, we need to stop emulation so we
can inject it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:02 +02:00
Mohammed Gamal
a92601bb70 KVM: VMX: Emulated real mode interrupt injection
Replace the inject-as-software-interrupt hack we currently have with
emulated injection.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:01 +02:00
Mohammed Gamal
63995653ad KVM: Add kvm_inject_realmode_interrupt() wrapper
This adds a wrapper function kvm_inject_realmode_interrupt() around the
emulator function emulate_int_real() to allow real mode interrupt injection.

[avi: initialize operand and address sizes before emulating interrupts]
[avi: initialize rip for real mode interrupt injection]
[avi: clear interrupt pending flag after emulating interrupt injection]

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:01 +02:00
Mohammed Gamal
4ab8e02404 KVM: x86 emulator: Expose emulate_int_real()
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:00 +02:00
Hillf Danton
cb16a7b387 KVM: MMU: fix counting of rmap entries in rmap_add()
It seems that rmap entries are under counted.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:59 +02:00
Gleb Natapov
a0a07cd2c5 KVM: SVM: do not generate "external interrupt exit" if other exit is pending
Nested SVM checks for external interrupt after injecting nested exception.
In case there is external interrupt pending the code generates "external
interrupt exit" and overwrites previous exit info. If previously injected
exception already generated exit it will be lost.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:56 +02:00
Avi Kivity
f4f5105087 KVM: Convert PIC lock from raw spinlock to ordinary spinlock
The PIC code used to be called from preempt_disable() context, which
wasn't very good for PREEMPT_RT.  That is no longer the case, so move
back from raw_spinlock_t to spinlock_t.

Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:56 +02:00
Zachary Amsden
28e4639adf KVM: x86: Fix kvmclock bug
If preempted after kvmclock values are updated, but before hardware
virtualization is entered, the last tsc time as read by the guest is
never set.  It underflows the next time kvmclock is updated if there
has not yet been a successful entry / exit into hardware virt.

Fix this by simply setting last_tsc to the newly read tsc value so
that any computed nsec advance of kvmclock is nulled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:56 +02:00
Joerg Roedel
0959ffacf3 KVM: MMU: Don't track nested fault info in error-code
This patch moves the detection whether a page-fault was
nested or not out of the error code and moves it into a
separate variable in the fault struct.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:55 +02:00
Avi Kivity
625831a3f4 KVM: VMX: Move fixup_rmode_irq() to avoid forward declaration
No code changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:54 +02:00
Avi Kivity
b463a6f744 KVM: Non-atomic interrupt injection
Change the interrupt injection code to work from preemptible, interrupts
enabled context.  This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the guest
(this condition could never happen with atomic injection).

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:54 +02:00
Avi Kivity
83422e17c1 KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues.  Make it able to decode
the information from the entry fields as well by parametrizing it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:52 +02:00
Avi Kivity
537b37e267 KVM: VMX: Move real-mode interrupt injection fixup to vmx_complete_interrupts()
This allows reuse of vmx_complete_interrupts() for cancelling injections.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:51 +02:00
Avi Kivity
51aa01d13d KVM: VMX: Split up vmx_complete_interrupts()
vmx_complete_interrupts() does too much, split it up:
 - vmx_vcpu_run() gets the "cache important vmcs fields" part
 - a new vmx_complete_atomic_exit() gets the parts that must be done atomically
 - a new vmx_recover_nmi_blocking() does what its name says
 - vmx_complete_interrupts() retains the event injection recovery code

This helps in reducing the work done in atomic context.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:51 +02:00
Avi Kivity
3842d135ff KVM: Check for pending events before attempting injection
Instead of blindly attempting to inject an event before each guest entry,
check for a possible event first in vcpu->requests.  Sites that can trigger
event injection are modified to set KVM_REQ_EVENT:

- interrupt, nmi window opening
- ppr updates
- i8259 output changes
- local apic irr changes
- rflags updates
- gif flag set
- event set on exit

This improves non-injecting entry performance, and sets the stage for
non-atomic injection.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:50 +02:00
Avi Kivity
b0bc3ee2b5 KVM: MMU: Fix regression with ept memory types merged into non-ept page tables
Commit "KVM: MMU: Make tdp_enabled a mmu-context parameter" made real-mode
set ->direct_map, and changed the code that merges in the memory type depend
on direct_map instead of tdp_enabled.  However, in this case what really
matters is tdp, not direct_map, since tdp changes the pte format regardless
of whether the mapping is direct or not.

As a result, real-mode shadow mappings got corrupted with ept memory types.
The result was a huge slowdown, likely due to the cache being disabled.

Change it back as the simplest fix for the regression (real fix is to move
all that to vmx code, and not use tdp_enabled as a synonym for ept).

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:49 +02:00
Joerg Roedel
4c62a2dc92 KVM: X86: Report SVM bit to userspace only when supported
This patch fixes a bug in KVM where it _always_ reports the
support of the SVM feature to userspace. But KVM only
supports SVM on AMD hardware and only when it is enabled in
the kernel module. This patch fixes the wrong reporting.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:48 +02:00
Joerg Roedel
3d4aeaad8b KVM: SVM: Report Nested Paging support to userspace
This patch implements the reporting of the nested paging
feature support to userspace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:47 +02:00
Joerg Roedel
55c5e464fc KVM: SVM: Expect two more candiates for exit_int_info
This patch adds INTR and NMI intercepts to the list of
expected intercepts with an exit_int_info set. While this
can't happen on bare metal it is architectural legal and may
happen with KVMs SVM emulation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:46 +02:00
Joerg Roedel
4b16184c1c KVM: SVM: Initialize Nested Nested MMU context on VMRUN
This patch adds code to initialize the Nested Nested Paging
MMU context when the L1 guest executes a VMRUN instruction
and has nested paging enabled in its VMCB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:46 +02:00
Joerg Roedel
5bd2edc341 KVM: SVM: Implement MMU helper functions for Nested Nested Paging
This patch adds the helper functions which will be used in
the mmu context for handling nested nested page faults.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:45 +02:00
Joerg Roedel
2d48a985c7 KVM: MMU: Track NX state in struct kvm_mmu
With Nested Paging emulation the NX state between the two
MMU contexts may differ. To make sure that always the right
fault error code is recorded this patch moves the NX state
into struct kvm_mmu so that the code can distinguish between
L1 and L2 NX state.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:44 +02:00
Joerg Roedel
81407ca553 KVM: MMU: Allow long mode shadows for legacy page tables
Currently the KVM softmmu implementation can not shadow a 32
bit legacy or PAE page table with a long mode page table.
This is a required feature for nested paging emulation
because the nested page table must alway be in host format.
So this patch implements the missing pieces to allow long
mode page tables for page table types.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:43 +02:00
Joerg Roedel
651dd37a9c KVM: MMU: Refactor mmu_alloc_roots function
This patch factors out the direct-mapping paths of the
mmu_alloc_roots function into a seperate function. This
makes it a lot easier to avoid all the unnecessary checks
done in the shadow path which may break when running direct.
In fact, this patch already fixes a problem when running PAE
guests on a PAE shadow page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:42 +02:00
Joerg Roedel
d41d1895eb KVM: MMU: Introduce kvm_pdptr_read_mmu
This function is implemented to load the pdptr pointers of
the currently running guest (l1 or l2 guest). Therefore it
takes care about the current paging mode and can read pdptrs
out of l2 guest physical memory.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:42 +02:00
Joerg Roedel
ff03a073e7 KVM: MMU: Add kvm_mmu parameter to load_pdptrs function
This function need to be able to load the pdptrs from any
mmu context currently in use. So change this function to
take an kvm_mmu parameter to fit these needs.
As a side effect this patch also moves the cached pdptrs
from vcpu_arch into the kvm_mmu struct.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:41 +02:00
Joerg Roedel
d47f00a62b KVM: X86: Propagate fetch faults
KVM currently ignores fetch faults in the instruction
emulator. With nested-npt we could have such faults. This
patch adds the code to handle these.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:41 +02:00
Joerg Roedel
d4f8cf664e KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa
This patch implements logic to make sure that either a
page-fault/page-fault-vmexit or a nested-page-fault-vmexit
is propagated back to the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:40 +02:00
Joerg Roedel
02f59dc9f1 KVM: MMU: Introduce init_kvm_nested_mmu()
This patch introduces the init_kvm_nested_mmu() function
which is used to re-initialize the nested mmu when the l2
guest changes its paging mode.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:39 +02:00
Joerg Roedel
3d06b8bfd4 KVM: MMU: Introduce kvm_read_nested_guest_page()
This patch introduces the kvm_read_guest_page_x86 function
which reads from the physical memory of the guest. If the
guest is running in guest-mode itself with nested paging
enabled it will read from the guest's guest physical memory
instead.
The patch also changes changes the code to use this function
where it is necessary.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:38 +02:00
Joerg Roedel
2329d46d21 KVM: MMU: Make walk_addr_generic capable for two-level walking
This patch uses kvm_read_guest_page_tdp to make the
walk_addr_generic functions suitable for two-level page
table walking.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:38 +02:00
Joerg Roedel
ec92fe44e7 KVM: X86: Add kvm_read_guest_page_mmu function
This patch adds a function which can read from the guests
physical memory or from the guest's guest physical memory.
This will be used in the two-dimensional page table walker.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:37 +02:00
Joerg Roedel
6539e738f6 KVM: MMU: Implement nested gva_to_gpa functions
This patch adds the functions to do a nested l2_gva to
l1_gpa page table walk.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:36 +02:00
Joerg Roedel
14dfe855f9 KVM: X86: Introduce pointer to mmu context used for gva_to_gpa
This patch introduces the walk_mmu pointer which points to
the mmu-context currently used for gva_to_gpa translations.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:35 +02:00
Joerg Roedel
c30a358d33 KVM: MMU: Add infrastructure for two-level page walker
This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:34 +02:00
Joerg Roedel
1e301feb07 KVM: MMU: Introduce generic walk_addr function
This is the first patch in the series towards a generic
walk_addr implementation which could walk two-dimensional
page tables in the end. In this first step the walk_addr
function is renamed into walk_addr_generic which takes a
mmu context as an additional parameter.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:33 +02:00
Joerg Roedel
8df25a328a KVM: MMU: Track page fault data in struct vcpu
This patch introduces a struct with two new fields in
vcpu_arch for x86:

	* fault.address
	* fault.error_code

This will be used to correctly propagate page faults back
into the guest when we could have either an ordinary page
fault or a nested page fault. In the case of a nested page
fault the fault-address is different from the original
address that should be walked. So we need to keep track
about the real fault-address.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:33 +02:00
Joerg Roedel
3241f22da8 KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu
This patch changes is_rsvd_bits_set() function prototype to
take only a kvm_mmu context instead of a full vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:32 +02:00
Joerg Roedel
52fde8df7d KVM: MMU: Introduce kvm_init_shadow_mmu helper function
Some logic of the init_kvm_softmmu function is required to
build the Nested Nested Paging context. So factor the
required logic into a seperate function and export it.
Also make the whole init path suitable for more than one mmu
context.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:32 +02:00
Joerg Roedel
cb659db8a7 KVM: MMU: Introduce inject_page_fault function pointer
This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which will be used to inject a page
fault. This will be used later when Nested Nested Paging is
implemented.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:31 +02:00
Joerg Roedel
5777ed340d KVM: MMU: Introduce get_cr3 function pointer
This function pointer in the MMU context is required to
implement Nested Nested Paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:31 +02:00
Joerg Roedel
1c97f0a04c KVM: X86: Introduce a tdp_set_cr3 function
This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tpd enabled mmu
contexts. This allows to remove some hacks from svm code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:30 +02:00
Joerg Roedel
f43addd461 KVM: MMU: Make set_cr3 a function pointer in kvm_mmu
This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:29 +02:00
Joerg Roedel
c5a78f2b64 KVM: MMU: Make tdp_enabled a mmu-context parameter
This patch changes the tdp_enabled flag from its global
meaning to the mmu-context and renames it to direct_map
there. This is necessary for Nested SVM with emulation of
Nested Paging where we need an extra MMU context to shadow
the Nested Nested Page Table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:28 +02:00
Joerg Roedel
957446afce KVM: MMU: Check for root_level instead of long mode
The walk_addr function checks for !is_long_mode in its 64
bit version. But what is meant here is a check for pae
paging. Change the condition to really check for pae paging
so that it also works with nested nested paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:52:27 +02:00
Jes Sorensen
7b91409822 KVM: x86: Emulate MSR_EBC_FREQUENCY_ID
Some operating systems store data about the host processor at the
time of installation, and when booted on a more uptodate cpu tries
to read MSR_EBC_FREQUENCY_ID. This has been found with XP.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:27 +02:00
Jes Sorensen
b9a52c4b78 x86: Define MSR_EBC_FREQUENCY_ID
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:26 +02:00
Roedel, Joerg
b75f4eb341 KVM: SVM: Clean up rip handling in vmrun emulation
This patch changes the rip handling in the vmrun emulation
path from using next_rip to the generic kvm register access
functions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:25 +02:00
Joerg Roedel
cda0008299 KVM: SVM: Restore correct registers after sel_cr0 intercept emulation
This patch implements restoring of the correct rip, rsp, and
rax after the svm emulation in KVM injected a selective_cr0
write intercept into the guest hypervisor. The problem was
that the vmexit is emulated in the instruction emulation
which later commits the registers right after the write-cr0
instruction. So the l1 guest will continue to run with the
l2 rip, rsp and rax resulting in unpredictable behavior.

This patch is not the final word, it is just an easy patch
to fix the issue. The real fix will be done when the
instruction emulator is made aware of nested virtualization.
Until this is done this patch fixes the issue and provides
an easy way to fix this in -stable too.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:24 +02:00
Joerg Roedel
f87f928882 KVM: MMU: Fix 32 bit legacy paging with NPT
This patch fixes 32 bit legacy paging with NPT enabled. The
mmu_check_root call on the top-level of the loop causes
root_gfn to take values (in the tdp_enabled path) which are
outside of guest memory. So the mmu_check_root call fails at
some point in the loop interation causing the guest to
tiple-fault.
This patch changes the mmu_check_root calls to the places
where they are really necessary. As a side-effect it
introduces a check for the root of a pae page table too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:52:23 +02:00
Xiao Guangrong
30644b902c KVM: MMU: lower the aduit frequency
The audit is very high overhead, so we need lower the frequency to assure
the guest is running.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:59 +02:00
Xiao Guangrong
eb2591865a KVM: MMU: improve spte audit
Both audit_mappings() and audit_sptes_have_rmaps() need to walk vcpu's page
table, so we can do these checking in a spte walking

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:58 +02:00
Xiao Guangrong
49edf87806 KVM: MMU: improve active sp audit
Both audit_rmap() and audit_write_protection() need to walk all active sp, so
we can do these checking in a sp walking

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:57 +02:00
Xiao Guangrong
2f4f337248 KVM: MMU: move audit to a separate file
Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:57 +02:00
Xiao Guangrong
8b1fe17cc7 KVM: MMU: support disable/enable mmu audit dynamicly
Add a r/w module parameter named 'mmu_audit', it can control audit
enable/disable:

enable:
  echo 1 > /sys/module/kvm/parameters/mmu_audit

disable:
  echo 0 > /sys/module/kvm/parameters/mmu_audit

This patch not change the logic

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:56 +02:00
Jes Sorensen
84e0cefa8d KVM: Fix guest kernel crash on MSR_K7_CLK_CTL
MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant
on said old AMD CPU models. This change returns the expected value,
which the Linux kernel is expecting to avoid writing back the MSR,
plus it ignores all writes to the MSR.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:55 +02:00
Avi Kivity
9ed049c3b6 KVM: i8259: Make ICW1 conform to spec
ICW is not a full reset, instead it resets a limited number of registers
in the PIC.  Change ICW1 emulation to only reset those registers.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:54 +02:00
Avi Kivity
7d9ddaedd8 KVM: x86 emulator: clean up control flow in x86_emulate_insn()
x86_emulate_insn() is full of things like

    if (rc != X86EMUL_CONTINUE)
        goto done;
    break;

consolidate all of those at the end of the switch statement.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:54 +02:00
Avi Kivity
a4d4a7c188 KVM: x86 emulator: fix group 11 decoding for reg != 0
These are all undefined.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:53 +02:00
Avi Kivity
b9eac5f4d1 KVM: x86 emulator: use single stage decoding for mov instructions
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:52 +02:00
Avi Kivity
e90aa41e6c KVM: Don't save/restore MSR_IA32_PERF_STATUS
It is read/only; restoring it only results in annoying messages.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:51 +02:00
Marcelo Tosatti
eaa48512ba KVM: SVM: init_vmcb should reset vcpu->efer
Otherwise EFER_LMA bit is retained across a SIPI reset.

Fixes guest cpu onlining.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:51 +02:00
Marcelo Tosatti
678041ad9d KVM: SVM: reset mmu context in init_vmcb
Since commit aad827034e no mmu reinitialization is performed
via init_vmcb.

Zero vcpu->arch.cr0 and pass the reset value as a parameter to
kvm_set_cr0.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:50 +02:00
Avi Kivity
c41a15dd46 KVM: Fix pio trace direction
out = write, in = read, not the other way round.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:49 +02:00
Xiao Guangrong
8e0e8afa82 KVM: MMU: remove count_rmaps()
Nothing is checked in count_rmaps(), so remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:49 +02:00
Xiao Guangrong
365fb3fdf6 KVM: MMU: rewrite audit_mappings_page() function
There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in
atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection).

This patch fix it by:
- introduce gfn_to_pfn_atomic instead of gfn_to_pfn
- get the mapping gfn from kvm_mmu_page_get_gfn()

And it adds 'notrap' ptes check in unsync/direct sps

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:48 +02:00
Xiao Guangrong
bc32ce2152 KVM: MMU: fix wrong not write protected sp report
The audit code reports some sp not write protected in current code, it's just the
bug in audit_write_protection(), since:

- the invalid sp not need write protected
- using uninitialize local variable('gfn')
- call kvm_mmu_audit() out of mmu_lock's protection

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:47 +02:00
Xiao Guangrong
0beb8d6604 KVM: MMU: check rmap for every spte
The read-only spte also has reverse mapping, so fix the code to check them,
also modify the function name to fit its doing

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:46 +02:00
Xiao Guangrong
9ad17b1001 KVM: MMU: fix compile warning in audit code
fix:

arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’:
arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’:
arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘set_spte’:
arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’:
arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:46 +02:00
Jason Wang
23e7a7944f KVM: pit: Do not check pending pit timer in vcpu thread
Pit interrupt injection was done by workqueue, so no need to check
pending pit timer in vcpu thread which could lead unnecessary
unblocking of vcpu.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:45 +02:00
Avi Kivity
6230f7fc04 KVM: x86 emulator: simplify ALU opcode block decode further
The ALU opcode block is very regular; introduce D6ALU() to define decode
flags for 6 instructions at a time.

Suggested by Paolo Bonzini.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:43 +02:00
Avi Kivity
217fc9cfca KVM: Fix build error due to 64-bit division in nsec_to_cycles()
Use do_div() instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:43 +02:00
Avi Kivity
34d1f4905e KVM: x86 emulator: trap and propagate #DE from DIV and IDIV
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:42 +02:00
Avi Kivity
f6b3597bde KVM: x86 emulator: add macros for executing instructions that may trap
Like DIV and IDIV.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:41 +02:00
Avi Kivity
739ae40606 KVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:41 +02:00
Avi Kivity
d269e3961a KVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:40 +02:00
Avi Kivity
d2c6c7adb1 KVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:39 +02:00
Avi Kivity
50748613d1 KVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:38 +02:00
Avi Kivity
76e8e68d44 KVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:38 +02:00
Avi Kivity
48fe67b5f7 KVM: x86 emulator: simplify string instruction decode flags
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:37 +02:00
Avi Kivity
5315fbb223 KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags
Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:36 +02:00
Avi Kivity
8d8f4e9f66 KVM: x86 emulator: support byte/word opcode pairs
Many x86 instructions come in byte and word variants distinguished with bit
0 of the opcode.  Add macros to aid in defining them.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:35 +02:00
Avi Kivity
081bca0e6b KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand
SrcMemFAddr is not defined with the modrm operand designating a register
instead of a memory address.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:35 +02:00
Gleb Natapov
d2ddd1c483 KVM: x86 emulator: get rid of "restart" in emulation context.
x86_emulate_insn() will return 1 if instruction can be restarted
without re-entering a guest.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:34 +02:00
Gleb Natapov
3e2f65d57a KVM: x86 emulator: move string instruction completion check into separate function
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:33 +02:00
Gleb Natapov
6e2fb2cadd KVM: x86 emulator: Rename variable that shadows another local variable.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:32 +02:00
Wei Yongjun
cc4feed57f KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:31 +02:00
Xiao Guangrong
189be38db3 KVM: MMU: combine guest pte read between fetch and pte prefetch
Combine guest pte read between guest pte check in the fetch path and pte prefetch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:28 +02:00
Xiao Guangrong
957ed9effd KVM: MMU: prefetch ptes when intercepted guest #PF
Support prefetch ptes when intercept guest #PF, avoid to #PF by later
access

If we meet any failure in the prefetch path, we will exit it and
not try other ptes to avoid become heavy path

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:27 +02:00
Zachary Amsden
1d5f066e0b KVM: x86: Fix a possible backwards warp of kvmclock
Kernel time, which advances in discrete steps may progress much slower
than TSC.  As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).

We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:24 +02:00
Zachary Amsden
347bb4448c x86: pvclock: Move scale_delta into common header
The scale_delta function for shift / multiply with 31-bit
precision moves to a common header so it can be used by both
kernel and kvm module.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:24 +02:00
Zachary Amsden
ca84d1a24c KVM: x86: Add clock sync request to hardware enable
If there are active VCPUs which are marked as belonging to
a particular hardware CPU, request a clock sync for them when
enabling hardware; the TSC could be desynchronized on a newly
arriving CPU, and we need to recompute guests system time
relative to boot after a suspend event.

This covers both cases.

Note that it is acceptable to take the spinlock, as either
no other tasks will be running and no locks held (BSP after
resume), or other tasks will be guaranteed to drop the lock
relatively quickly (AP on CPU_STARTING).

Noting we now get clock synchronization requests for VCPUs
which are starting up (or restarting), it is tempting to
attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers
at this time, however it is not correct to do so; they are
required for systems with non-constant TSC as the frequency
may not be known immediately after the processor has started
until the cpufreq driver has had a chance to run and query
the chipset.

Updated: implement better locking semantics for hardware_enable

Removed the hack of dropping and retaking the lock by adding the
semantic that we always hold kvm_lock when hardware_enable is
called.  The one place that doesn't need to worry about it is
resume, as resuming a frozen CPU, the spinlock won't be taken.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:24 +02:00
Zachary Amsden
46543ba45f KVM: x86: Robust TSC compensation
Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:23 +02:00
Zachary Amsden
759379dd68 KVM: x86: Add helper functions for time computation
Add a helper function to compute the kernel time and convert nanoseconds
back to CPU specific cycles.  Note that these must not be called in preemptible
context, as that would mean the kernel could enter software suspend state,
which would cause non-atomic operation.

Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel
time helper, these should be bootbased as well.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:23 +02:00
Zachary Amsden
48434c20e1 KVM: x86: Fix deep C-state TSC desynchronization
When CPUs with unstable TSCs enter deep C-state, TSC may stop
running.  This causes us to require resynchronization.  Since
we can't tell when this may potentially happen, we assume the
worst by forcing re-compensation for it at every point the VCPU
task is descheduled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:23 +02:00
Zachary Amsden
e48672fa25 KVM: x86: Unify TSC logic
Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops.  Now all TSC decisions
can be done in one place.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:23 +02:00
Zachary Amsden
6755bae8e6 KVM: x86: Warn about unstable TSC
If creating an SMP guest with unstable host TSC, issue a warning

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:22 +02:00
Zachary Amsden
8cfdc00085 KVM: x86: Make cpu_tsc_khz updates use local CPU
This simplifies much of the init code; we can now simply always
call tsc_khz_changed, optionally passing it a new value, or letting
it figure out the existing value (while interrupts are disabled, and
thus, by inference from the rule, not raceful against CPU hotplug or
frequency updates, which will issue IPIs to the local CPU to perform
this very same task).

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:22 +02:00
Zachary Amsden
f38e098ff3 KVM: x86: TSC reset compensation
Attempt to synchronize TSCs which are reset to the same value.  In the
case of a reliable hardware TSC, we can just re-use the same offset, but
on non-reliable hardware, we can get closer by adjusting the offset to
match the elapsed time.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:22 +02:00
Zachary Amsden
99e3e30aee KVM: x86: Move TSC offset writes to common code
Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock.  While the lock is overkill
now, it is useful later in this patch series.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:22 +02:00
Zachary Amsden
f4e1b3c8bd KVM: x86: Convert TSC writes to TSC offset writes
Change svm / vmx to be the same internally and write TSC offset
instead of bare TSC in helper functions.  Isolated as a single
patch to contain code movement.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:22 +02:00
Zachary Amsden
ae38436b78 KVM: x86: Drop vm_init_tsc
This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:21 +02:00
Wei Yongjun
45bf21a8ce KVM: MMU: fix missing percpu counter destroy
commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
(KVM: create aggregate kvm_total_used_mmu_pages value)
introduce percpu counter kvm_total_used_mmu_pages but never
destroy it, this may cause oops when rmmod & modprobe.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:21 +02:00
Xiaotian Feng
80b63faf02 KVM: MMU: fix regression from rework mmu_shrink() code
Latest kvm mmu_shrink code rework makes kernel changes kvm->arch.n_used_mmu_pages/
kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called
by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or
kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(),
This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever.
Moving kvm_mmu_commit_zap_page would make the while loop performs as normal.

Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:21 +02:00
Wei Yongjun
e4abac67b7 KVM: x86 emulator: add JrCXZ instruction emulation
Add JrCXZ instruction emulation (opcode 0xe3)
Used by FreeBSD boot loader.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:20 +02:00
Wei Yongjun
09b5f4d3c4 KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation
Add LDS/LES/LFS/LGS/LSS instruction emulation.
(opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:20 +02:00
Dave Hansen
45221ab668 KVM: create aggregate kvm_total_used_mmu_pages value
Of slab shrinkers, the VM code says:

 * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
 * querying the cache size, so a fastpath for that case is appropriate.

and it *means* it.  Look at how it calls the shrinkers:

    nr_before = (*shrinker->shrink)(0, gfp_mask);
    shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);

So, if you do anything stupid in your shrinker, the VM will doubly
punish you.

The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence.  If we have 100 VMs, then
we're going to take 101 locks.  We do it twice, so each call takes
202 locks.  If we're under memory pressure, we can have each cpu
trying to do this.  It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.

This is guaranteed to optimize at least half of those lock
aquisitions away.  It removes the need to take any of the locks
when simply trying to count objects.

A 'percpu_counter' can be a large object, but we only have one
of these for the entire system.  There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:19 +02:00
Dave Hansen
49d5ca2663 KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages
Doing this makes the code much more readable.  That's
borne out by the fact that this patch removes code.  "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called.  Keeping this
value as opposed to free makes the next patch simpler.

So, 'struct kvm' is kzalloc()'d.  'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'.  That
means they start out zeroed.  I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages().  But, that only happens
via kvm ioctls.

Another benefit of storing 'used' intead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:18 +02:00
Dave Hansen
39de71ec53 KVM: rename x86 kvm->arch.n_alloc_mmu_pages
arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated".  But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.

It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages.  This change will make the next few patches
much more obvious and easy to read.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:18 +02:00
Dave Hansen
e0df7b9f6c KVM: abstract kvm x86 mmu->n_free_mmu_pages
"free" is a poor name for this value.  In this context, it means,
"the number of mmu pages which this kvm instance should be able to
allocate."  But "free" implies much more that the objects are there
and ready for use.  "available" is a much better description, especially
when you see how it is calculated.

In this patch, we abstract its use into a function.  We'll soon
replace the function's contents by calculating the value in a
different way.

All of the reads of n_free_mmu_pages are taken care of in this
patch.  The modification sites will be handled in a patch
later in the series.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:17 +02:00
Avi Kivity
6142914280 KVM: x86 emulator: implement CWD (opcode 99)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:16 +02:00
Avi Kivity
d46164dbd9 KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:16 +02:00
Avi Kivity
7db41eb762 KVM: x86 emulator: add Src2Imm decoding
Needed for 3-operand IMUL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:15 +02:00
Avi Kivity
39f21ee546 KVM: x86 emulator: consolidate immediate decode into a function
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:14 +02:00
Avi Kivity
48bb5d3c40 KVM: x86 emulator: implement RDTSC (opcode 0F 31)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:14 +02:00
Avi Kivity
7077aec0bc KVM: x86 emulator: remove SrcImplicit
Useless.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:13 +02:00
Avi Kivity
5c82aa2998 KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity
f3a1b9f496 KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity
40ece7c729 KVM: x86 emulator: implement RET imm16 (opcode C2)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity
b250e60589 KVM: x86 emulator: add SrcImmU16 operand type
Used for RET NEAR instructions.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity
0ef753b8c3 KVM: x86 emulator: implement CALL FAR (FF /3)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity
7af04fc05c KVM: x86 emulator: implement DAS (opcode 2F)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Avi Kivity
fb2c264105 KVM: x86 emulator: Use a register for ____emulate_2op() destination
Most x86 two operand instructions allow the destination to be a memory operand,
but IMUL (for example) requires that the destination be a register.  Change
____emulate_2op() to take a register for both source and destination so we
can invoke IMUL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Avi Kivity
b3b3d25a12 KVM: x86 emulator: pass destination type to ____emulate_2op()
We'll need it later so we can use a register for the destination.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Wei Yongjun
f2f3184534 KVM: x86 emulator: add LOOP/LOOPcc instruction emulation
Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2).

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Wei Yongjun
e8b6fa70e3 KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation
Add CBW/CWDE/CDQE instruction emulation.(opcode 0x98)
Used by FreeBSD's boot loader.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Avi Kivity
0fa6ccbd28 KVM: x86 emulator: fix REPZ/REPNZ termination condition
EFLAGS.ZF needs to be checked after each iteration, not before.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:10 +02:00
Avi Kivity
f6b33fc504 KVM: x86 emulator: implement SCAS (opcodes AE, AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:10 +02:00
Avi Kivity
5c56e1cf7a KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS
emulate_push() only schedules a push; it doesn't actually push anything.
Call writeback() to flush out the write.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun
a13a63faa6 KVM: x86 emulator: remove dup code of in/out instruction
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun
41167be544 KVM: x86 emulator: change OUT instruction to use dst instead of src
Change OUT instruction to use dst instead of src, so we can
reuse those code for all out instructions.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun
943858e275 KVM: x86 emulator: introduce DstImmUByte for dst operand decode
Introduce DstImmUByte for dst operand decode, which
will be used for out instruction.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun
c483c02ad3 KVM: x86 emulator: remove useless label from x86_emulate_insn()
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun
ee45b58efe KVM: x86 emulator: add setcc instruction emulation
Add setcc instruction emulation (opcode 0x0f 0x90~0x9f)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:08 +02:00
Wei Yongjun
92f738a52b KVM: x86 emulator: add XADD instruction emulation
Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:06 +02:00
Wei Yongjun
31be40b398 KVM: x86 emulator: put register operand write back to a function
Introduce function write_register_operand() to write back the
register operand.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:06 +02:00
Mohammed Gamal
8ec4722dd2 KVM: Separate emulation context initialization in a separate function
The code for initializing the emulation context is duplicated at two
locations (emulate_instruction() and kvm_task_switch()). Separate it
in a separate function and call it from there.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Wei Yongjun
d9574a25af KVM: x86 emulator: add bsf/bsr instruction emulation
Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Mohammed Gamal
8c5eee30a9 KVM: x86 emulator: Fix emulate_grp3 return values
This patch lets emulate_grp3() return X86EMUL_* return codes instead
of hardcoded ones.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Mohammed Gamal
3f9f53b0d5 KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions
This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7).

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Wei Yongjun
ba7ff2b76d KVM: x86 emulator: mask group 8 instruction as BitOp
Mask group 8 instruction as BitOp, so we can share the
code for adjust the source operand.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:03 +02:00
Wei Yongjun
3885f18fe3 KVM: x86 emulator: do not adjust the address for immediate source
adjust the dst address for a register source but not adjust the
address for an immediate source.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:02 +02:00
Wei Yongjun
35c843c485 KVM: x86 emulator: fix negative bit offset BitOp instruction emulation
If bit offset operands is a negative number, BitOp instruction
will return wrong value. This patch fix it.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:01 +02:00
Mohammed Gamal
8744aa9aad KVM: x86 emulator: Add stc instruction (opcode 0xf9)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:01 +02:00
Wei Yongjun
c034da8b92 KVM: x86 emulator: using SrcOne for instruction d0/d1 decoding
Using SrcOne for instruction d0/d1 decoding.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Wei Yongjun
36089fed70 KVM: x86 emulator: disable writeback when decode dest operand
This patch change to disable writeback when decode dest
operand if the dest type is ImplicitOps or not specified.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Wei Yongjun
06cb704611 KVM: x86 emulator: use SrcAcc to simplify stos decoding
Use SrcAcc to simplify stos decoding.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Mohammed Gamal
6e154e56b4 KVM: x86 emulator: Add into, int, and int3 instructions (opcodes 0xcc-0xce)
This adds support for int instructions to the emulator.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Mohammed Gamal
160ce1f1a8 KVM: x86 emulator: Allow accessing IDT via emulator ops
The patch adds a new member get_idt() to x86_emulate_ops.
It also adds a function to get the idt in order to be used by the emulator.

This is needed for real mode interrupt injection and the emulation of int
instructions.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:59 +02:00
Wei Yongjun
d3ad624329 KVM: x86 emulator: simplify two-byte opcode check
Two-byte opcode always start with 0x0F and the decode flags
of opcode 0xF0 is always 0, so remove dup check.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:59 +02:00
Alexander Graf
ba49296236 KVM: Move kvm_guest_init out of generic code
Currently x86 is the only architecture that uses kvm_guest_init(). With
PowerPC we're getting a second user, but the signature is different there
and we don't need to export it, as it uses the normal kernel init framework.

So let's move the x86 specific definition of that function over to the x86
specfic header file.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:49 +02:00
Mohammed Gamal
34698d8c61 KVM: x86 emulator: Fix nop emulation
If a nop instruction is encountered, we jump directly to the done label.
This skip updating rip. Break from the switch case instead

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:41 +02:00
Avi Kivity
2dbd0dd711 KVM: x86 emulator: Decode memory operands directly into a 'struct operand'
Since modrm operand can be either register or memory, decoding it into
a 'struct operand', which can represent both, is simpler.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:40 +02:00
Avi Kivity
1f6f05800e KVM: x86 emulator: change invlpg emulation to use src.mem.addr
Instead of using modrm_ea, which will soon be gone.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:39 +02:00
Avi Kivity
342fc63095 KVM: x86 emulator: switch LEA to use SrcMem decoding
The NoAccess flag will prevent memory from being accessed.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:38 +02:00
Avi Kivity
5a506b125f KVM: x86 emulator: add NoAccess flag for memory instructions that skip access
Use for INVLPG, which accesses the tlb, not memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:37 +02:00
Avi Kivity
b27f38563d KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg for reg op
This is an ordinary modrm source or destination; use the standard structure
representing it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:36 +02:00
Avi Kivity
1a0c7d44e4 KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg for reg op
This is an ordinary modrm source or destination; use the standard structure
representing it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:35 +02:00
Avi Kivity
cecc9e3916 KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in long mode
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:35 +02:00
Avi Kivity
7f9b4b75be KVM: x86 emulator: introduce Op3264 for mov cr and mov dr instructions
The operands for these instructions are 32 bits or 64 bits, depending on
long mode, and ignoring REX prefixes, or the operand size prefix.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:35 +02:00
Avi Kivity
1e87e3efe7 KVM: x86 emulator: simplify REX.W check
(x && (x & y)) == (x & y)

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:34 +02:00
Avi Kivity
d4709c78ee KVM: x86 emulator: drop use_modrm_ea
Unused (and has never been).

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:34 +02:00
Avi Kivity
91ff3cb43c KVM: x86 emulator: put register operand fetch into a function
The code is repeated three times, put it into fetch_register_operand()

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity
3d9e77dff8 KVM: x86 emulator: use SrcAcc to simplify xchg decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity
4515453964 KVM: x86 emulator: simplify xchg decode tables
Use X8() to avoid repetition.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity
1a6440aef6 KVM: x86 emulator: use correct type for memory address in operands
Currently we use a void pointer for memory addresses.  That's wrong since
these are guest virtual addresses which are not directly dereferencable by
the host.

Use the correct type, unsigned long.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity
09ee57cdae KVM: x86 emulator: push segment override out of decode_modrm()
Let it compute modrm_seg instead, and have the caller apply it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Joerg Roedel
dbe7758482 KVM: SVM: Check for asid != 0 on nested vmrun
This patch lets a nested vmrun fail if the L1 hypervisor
left the asid zero. This fixes the asid_zero unit test.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:32 +02:00
Joerg Roedel
52c65a30a5 KVM: SVM: Check for nested vmrun intercept before emulating vmrun
This patch lets the nested vmrun fail if the L1 hypervisor
has not intercepted vmrun. This fixes the "vmrun intercept
check" unit test.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:32 +02:00
Xiao Guangrong
4132779b17 KVM: MMU: mark page dirty only when page is really written
Mark page dirty only when this page is really written, it's more exacter,
and also can fix dirty page marking in speculation path

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:32 +02:00
Xiao Guangrong
8672b7217a KVM: MMU: move bits lost judgement into a separate function
Introduce spte_has_volatile_bits() function to judge whether spte
bits will miss, it's more readable and can help us to cleanup code
later

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:31 +02:00
Xiao Guangrong
251464c464 KVM: MMU: using kvm_set_pfn_accessed() instead of mark_page_accessed()
It's a small cleanup that using using kvm_set_pfn_accessed() instead
of mark_page_accessed()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:30 +02:00
Gleb Natapov
4fc40f076f KVM: x86 emulator: check io permissions only once for string pio
Do not recheck io permission on every iteration.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:29 +02:00
Avi Kivity
9928ff608b KVM: x86 emulator: fix LMSW able to clear cr0.pe
LMSW is documented not to be able to clear cr0.pe; make it so.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:28 +02:00
Gleb Natapov
e85d28f8e8 KVM: x86 emulator: don't update vcpu state if instruction is restarted
No need to update vcpu state since instruction is in the middle of the
emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:27 +02:00
Avi Kivity
63540382cc KVM: x86 emulator: convert some push instructions to direct decode
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:26 +02:00
Avi Kivity
d0e533255d KVM: x86 emulator: allow repeat macro arguments to contain commas
Needed for repeating instructions with execution functions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:25 +02:00
Avi Kivity
73fba5f4fe KVM: x86 emulator: move decode tables downwards
So they can reference execution functions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:25 +02:00
Avi Kivity
dde7e6d12a KVM: x86 emulator: move x86_decode_insn() downwards
No code changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:24 +02:00
Avi Kivity
ef65c88912 KVM: x86 emulator: allow storing emulator execution function in decode tables
Instead of looking up the opcode twice (once for decode flags, once for
the big execution switch) look up both flags and function in the decode tables.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:22 +02:00
Avi Kivity
9aabc88fc8 KVM: x86 emulator: store x86_emulate_ops in emulation context
It doesn't ever change, so we don't need to pass it around everywhere.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:21 +02:00
Avi Kivity
ab85b12b1a KVM: x86 emulator: move ByteOp and Dst back to bits 0:3
Now that the group index no longer exists, the space is free.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:20 +02:00
Avi Kivity
3885d530b0 KVM: x86 emulator: drop support for old-style groups
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:19 +02:00
Avi Kivity
9f5d3220e3 KVM: x86 emulator: convert group 9 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:18 +02:00
Avi Kivity
2cb20bc8af KVM: x86 emulator: convert group 8 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:18 +02:00
Avi Kivity
2f3a9bc9eb KVM: x86 emulator: convert group 7 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:16 +02:00
Avi Kivity
b67f9f0741 KVM: x86 emulator: convert group 5 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:15 +02:00
Avi Kivity
591c9d20a3 KVM: x86 emulator: convert group 4 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:14 +02:00
Avi Kivity
ee70ea30ee KVM: x86 emulator: convert group 3 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:13 +02:00
Avi Kivity
99880c5cd5 KVM: x86 emulator: convert group 1A to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:12 +02:00
Avi Kivity
5b92b5faff KVM: x86 emulator: convert group 1 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:11 +02:00
Avi Kivity
120df8902d KVM: x86 emulator: allow specifying group directly in opcode
Instead of having a group number, store the group table pointer directly in
the opcode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:10 +02:00
Avi Kivity
793d5a8d6b KVM: x86 emulator: reserve group code 0
We'll be using that to distinguish between new-style and old-style groups.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:09 +02:00
Avi Kivity
42a1c52095 KVM: x86 emulator: move group tables to top
No code changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:08 +02:00
Avi Kivity
fd853310a1 KVM: x86 emulator: Add wrappers for easily defining opcodes
Once 'struct opcode' grows, its initializer will become more complicated.
Wrap the simple initializers in a D() macro, and replace the empty initializers
with an even simpler N macro.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:08 +02:00
Avi Kivity
d65b1dee40 KVM: x86 emulator: introduce 'struct opcode'
This will hold all the information known about the opcode.  Currently, this
is just the decode flags.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:07 +02:00
Avi Kivity
ea9ef04e19 KVM: x86 emulator: drop parentheses in repreat macros
The parenthese make is impossible to use the macros with initializers that
require braces.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:06 +02:00
Mohammed Gamal
62bd430e6d KVM: x86 emulator: Add IRET instruction
Ths patch adds IRET instruction (opcode 0xcf).
Currently, only IRET in real mode is emulated. Protected mode support is to be added later if needed.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Reviewed-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:05 +02:00
Joerg Roedel
7a190667bb KVM: SVM: Emulate next_rip svm feature
This patch implements the emulations of the svm next_rip
feature in the nested svm implementation in kvm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:04 +02:00
Joerg Roedel
3f6a9d1693 KVM: SVM: Sync efer back into nested vmcb
This patch fixes a bug in a nested hypervisor that heavily
switches between real-mode and long-mode. The problem is
fixed by syncing back efer into the guest vmcb on emulated
vmexit.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:03 +02:00
Xiao Guangrong
19ada5c4b6 KVM: MMU: remove valueless output message
After commit 53383eaad08d, the '*spte' has updated before call
rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so
remove this information from error message

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:02 +02:00
Avi Kivity
d359192fea KVM: VMX: Use host_gdt variable wherever we need the host gdt
Now that we have the host gdt conveniently stored in a variable, make use
of it instead of querying the cpu.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:01 +02:00
Avi Kivity
e071edd5ba KVM: x86 emulator: unify the two Group 3 variants
Use just one group table for byte (F6) and word (F7) opcodes.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:00 +02:00
Avi Kivity
dfe11481d8 KVM: x86 emulator: Allow LOCK prefix for NEG and NOT
Opcodes F6/2, F6/3, F7/2, F7/3.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:59 +02:00
Avi Kivity
4968ec4e26 KVM: x86 emulator: simplify Group 1 decoding
Move operand decoding to the opcode table, keep lock decoding in the group
table.  This allows us to get consolidate the four variants of Group 1 into one
group.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:58 +02:00
Avi Kivity
52811d7de5 KVM: x86 emulator: mix decode bits from opcode and group decode tables
Allow bits that are common to all members of a group to be specified in the
opcode table instead of the group table.  This allows some simplification
of the decode tables.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:58 +02:00
Avi Kivity
047a481809 KVM: x86 emulator: add Undefined decode flag
Add a decode flag to indicate the instruction is invalid.  Will come in useful
later, when we mix decode bits from the opcode and group table.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:57 +02:00
Avi Kivity
2ce495365f KVM: x86 emulator: Make group storage bits separate from operand bits
Currently group bits are stored in bits 0:7, where operand bits are stored.

Make group bits be 0:3, and move the existing bits 0:3 to 16:19, so we can
mix group and operand bits.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:55 +02:00
Avi Kivity
880a188378 KVM: x86 emulator: consolidate Jcc rel32 decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:55 +02:00
Avi Kivity
be8eacddbd KVM: x86 emulator: consolidate CMOVcc decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:53 +02:00
Avi Kivity
b6e6153885 KVM: x86 emulator: consolidate MOV reg, imm decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:52 +02:00
Avi Kivity
b3ab3405fe KVM: x86 emulator: consolidate Jcc rel8 decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:51 +02:00
Avi Kivity
3849186c38 KVM: x86 emulator: consolidate push/pop reg decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:49 +02:00
Avi Kivity
749358a6b4 KVM: x86 emulator: consolidate inc/dec reg decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:48 +02:00
Avi Kivity
83babbca46 KVM: x86 emulator: add macros for repetitive instructions
Some instructions are repetitive in the opcode space, add macros for
consolidating them.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:48 +02:00
Avi Kivity
91269b8f94 KVM: x86 emulator: fix handling for unemulated instructions
If an instruction is present in the decode tables but not in the execution
switch, it will be emulated as a NOP.  An example is IRET (0xcf).

Fix by adding default: labels to the execution switches.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:47 +02:00
Jiri Slaby
e4072a9a9d x86, printk: Get rid of <0> from stack output
The stack output currently looks like this:

 7fffffffffffffff 0000000a00000000 ffffffff81093341 0000000000000046
<0> ffff88003a545fd8 0000000000000000 0000000000000000 00007fffa39769c0
<0> ffff88003e403f58 ffffffff8102fc4c ffff88003e403f58 ffff88003e403f78

The superfluous <0> are caused by recent printk KERN_CONT
change. <*> is now ignored in printk unless some text follows
the level and even then it still has to be the first in the
format message.

Note that the log_lvl parameter is now completely ignored in
show_stack_log_lvl and the stack is dumped with the default
level (like for quite some time already). It behaves the same as
the rest of the dump, function traces are dumped in the very
same manner. Only Code and maybe some lines are printed with
EMERG level.

Unfortunately I see no way how to fix this conceptually to have
the whole oops/BUG/panic output with the same level, so this
removed only the superfluous characters for the time being.

Just for illustration:

<4>Process kworker/0:0 (pid: 0, threadinfo ffff88003c8a6000, task ffff88003c85c100)
<0>Stack:
<4> ffffffff818022c0 0000000a00000001 0000000000000001 0000000000000046
<4> ffff88003c8a7fd8 0000000000000001 ffff88003c8a7e58 0000000000000000
<4> ffff88003e503f48 ffffffff8102fc4c ffff88003e503f48 ffff88003e503f68
<0>Call Trace:
<0> <IRQ>
<4> [<ffffffff8102fc4c>] ? call_softirq+0x1c/0x30 ...
<0>Code: 00 01 00 00 65 8b 04 25 80 c5 00 00 c7 45 ...

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: jirislaby@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1287586131-16222-1-git-send-email-jslaby@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-23 20:03:03 +02:00
Thomas Gleixner
23f9b26715 x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings()
probe_br_irqs_gsi() is called right after ioapic_init_mappings() and
there are no other users. Move it into ioapic_init_mappings() so the
declaration can disappear and the function can become static.

Rename ioapic_init_mappings() to ioapic_and_gsi_init() to reflect that
change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <1287510389-8388-2-git-send-email-dirk.brandewie@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
2010-10-23 17:27:50 +02:00
Thomas Gleixner
5a7ae78fd4 x86: Allow platforms to force enable apic
Some embedded x86 platforms don't setup the APIC in the
BIOS/bootloader and would be forced to add "lapic" on the kernel
command line. That's a bit akward.

Split out the force enable code from detect_init_APIC() and allow
platform code to call it from the platform setup. That avoids the
command line parameter and possible replication of the MSR dance in
the force enable code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <1287510389-8388-1-git-send-email-dirk.brandewie@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
2010-10-23 17:27:43 +02:00
Linus Torvalds
02f36038c5 Merge branches 'softirq-for-linus', 'x86-debug-for-linus', 'x86-numa-for-linus', 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirqs: Make wakeup_softirqd static

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, asm: Restore parentheses around one pushl_cfi argument
  x86, asm: Fix ancient-GAS workaround
  x86, asm: Fix CFI macro invocations to deal with shortcomings in gas

* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

* 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: HPET force enable for CX700 / VIA Epia LT

* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, setup: Use string copy operation to optimze copy in kernel compression

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()

* 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.
2010-10-23 08:25:36 -07:00
Linus Torvalds
10f2a2b0f6 Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, mm: Add an initial page table for core bootstrapping
2010-10-22 20:37:50 -07:00
Linus Torvalds
8814011679 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kdb,debug_core: adjust master cpu switch logic against new debug_core locking
  debug_core: refactor locking for master/slave cpus
  x86,kgdb: remove unnecessary call to kgdb_correct_hw_break()
  debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter()
  kdb,kgdb: fix sparse fixups
  kdb: Fix oops in kdb_unregister
  kdb,ftdump: Remove reference to internal kdb include
  kdb: Allow kernel loadable modules to add kdb shell functions
  debug_core: stop rcu warnings on kernel resume
  debug_core: move all watch dog syncs to a single function
  x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35
2010-10-22 20:35:12 -07:00
Linus Torvalds
0fc0531e0a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: update comments to reflect that percpu allocations are always zero-filled
  percpu: Optimize __get_cpu_var()
  x86, percpu: Optimize this_cpu_ptr
  percpu: clear memory allocated with the km allocator
  percpu: fix build breakage on s390 and cleanup build configuration tests
  percpu: use percpu allocator on UP too
  percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
  vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

Fixed up trivial conflicts in include/linux/percpu.h
2010-10-22 17:31:36 -07:00
H. Peter Anvin
fd35fbcdd1 x86-64, asm: Use fxsaveq/fxrestorq in more places
Checkin d7acb92fea made use of fxsaveq
in fpu_fxsave() if the assembler supports it; this adds
fxsaveq/fxrstorq to fxrstor_checking() and fxsave_user() as well.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <AANLkTi=RKyHLNTq6iomZOXkc6Zw1j9iAgsq8388XmzwN@mail.gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-22 15:33:38 -07:00
Dongdong Deng
39a0715f5a x86,kgdb: remove unnecessary call to kgdb_correct_hw_break()
The kernel debug_core invokes hw breakpoint install and removal via
call backs.  The architecture specific kgdb stubs only need to
implement the call backs and not actually call the functions.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
2010-10-22 15:34:13 -05:00
Jason Wessel
91b152aa85 kdb,kgdb: fix sparse fixups
Fix the following sparse warnings:

kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static?
kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static?
kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces)
kgdb.c:652:26:    expected void const *ptr
kgdb.c:652:26:    got struct perf_event *[noderef] <asn:3>*pev

The one in kgdb.c required the (void * __force) because of the return
code from register_wide_hw_breakpoint looking like:

        return (void __percpu __force *)ERR_PTR(err);

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:12 -05:00
Jason Wessel
fad99fac26 x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35
HW breakpoints events stopped working correctly with kgdb as a result
of commit: 018cbffe68 (Merge commit
'v2.6.33' into perf/core), later commit:
ba773f7c51 (x86,kgdb: Fix hw breakpoint
regression) allowed breakpoints to propagate to the debugger core but
did not completely address the original regression in functionality
found in 2.6.35.

When the DR_STEP flag is set in dr6 along with any of the DR_TRAP
bits, the kgdb exception handler will enter once from the
hw_breakpoint API call back and again from the die notifier for
do_debug(), which causes the debugger to stop twice and also for the
kgdb regression tests to fail running under kvm with:

echo V2I1 > /sys/module/kgdbts/parameters/kgdbts

To address the problem, the kgdb overflow handler needs to implement
the same logic as the ptrace overflow handler call back with respect
to updating the virtual copy of dr6.  This will allow the kgdb
do_debug() die notifier to properly handle the exception and the
attached debugger, or kgdb test suite, will only receive a single
notification.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: x86@kernel.org
2010-10-22 15:34:10 -05:00
Stefano Stabellini
0e058e5277 xen: add a missing #include to arch/x86/pci/xen.c
Add missing #include <asm/io_apic.h> to arch/x86/pci/xen.c.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-10-22 21:26:02 +01:00
Stefano Stabellini
ff12849a7a xen: mask the MTRR feature from the cpuid
We don't want Linux to think that the cpu supports MTRRs when running
under Xen because MTRR operations could only be performed through
hypercalls.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:26:02 +01:00
Juan Quintela
4ec5387cc3 xen: add the direct mapping area for ISA bus access
add the direct mapping area for ISA bus access when running as initial
domain

Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:47 +01:00
Stefano Stabellini
801fd14a72 xen: use vcpu_ops to setup cpu masks
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:45 +01:00
Jeremy Fitzhardinge
98511f3532 xen: map a dummy page for local apic and ioapic in xen_set_fixmap
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:44 +01:00
Qing He
f731e3ef02 xen: remap MSIs into pirqs when running as initial domain
Implement xen_create_msi_irq to create an msi and remap it as pirq.
Use xen_create_msi_irq to implement an initial domain specific version
of setup_msi_irqs.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:44 +01:00
Jeremy Fitzhardinge
38aa66fcb7 xen: remap GSIs as pirqs when running as initial domain
Implement xen_register_gsi to setup the correct triggering and polarity
properties of a gsi.
Implement xen_register_pirq to register a particular gsi as pirq and
receive interrupts as events.
Call xen_setup_pirqs to register all the legacy ISA irqs as pirqs.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:43 +01:00
Stefano Stabellini
6b0661a5e6 xen: introduce XEN_DOM0 as a silent option
Add XEN_DOM0 to arch/x86/xen/Kconfig as a silent compile time option
that gets enabled when xen and basic x86, acpi and pci support are
selected.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:43 +01:00
Stefano Stabellini
809f9267bb xen: map MSIs into pirqs
Map MSIs into pirqs, writing 0 in the MSI vector data field and the pirq
number in the MSI destination id field.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:43 +01:00
Stefano Stabellini
3942b740e5 xen: support GSI -> pirq remapping in PV on HVM guests
Disable pcifront when running on HVM: it is meant to be used with pv
guests that don't have PCI bus.

Use acpi_register_gsi_xen_hvm to remap GSIs into pirqs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge
90f6881e64 xen: add xen hvm acpi_register_gsi variant
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge
2f065aef17 acpi: use indirect call to register gsi in different modes
Rather than using a tree of conditionals, use function pointer
for acpi_register_gsi.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-10-22 21:25:41 +01:00
Stefano Stabellini
42a1de56f3 xen: implement xen_hvm_register_pirq
xen_hvm_register_pirq allows the kernel to map a GSI into a Xen pirq and
receive the interrupt as an event channel from that point on.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:41 +01:00
Stefano Stabellini
67ba37293e Merge commit 'konrad/stable/xen-pcifront-0.8.2' into 2.6.36-rc8-initial-domain-v6 2010-10-22 21:24:06 +01:00
Ian Campbell
9e9a5fcb04 xen: use host E820 map for dom0
When running as initial domain, get the real physical memory map from
xen using the XENMEM_machine_memory_map hypercall and use it to setup
the e820 regions.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 13:19:19 -07:00
Ian Campbell
375b2a9ada xen: correctly rebuild mfn list list after migration.
Otherwise the second migration attempt fails because the mfn_list_list
still refers to all the old mfns.

We need to update the entires in both p2m_top_mfn and the mid_mfn
pages which p2m_top_mfn refers to.

In order to do this we need to keep track of the virtual addresses
mapping the p2m_mid_mfn pages since we cannot rely on
mfn_to_virt(p2m_top_mfn[idx]) since p2m_top_mfn[idx] will still
contain the old MFN after a migration, which may now belong to another
domain and hence have a different mapping in the m2p.

Therefore add and maintain a third top level page, p2m_top_mfn_p[],
which tracks the virtual addresses of the mfns contained in
p2m_top_mfn[].

We also need to update the content of the p2m_mid_missing_mfn page on
resume to refer to the page's new mfn.

p2m_missing does not need updating since the migration process takes
care of the leaf p2m pages for us.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:36 -07:00
Jeremy Fitzhardinge
3654581e47 xen: don't add extra_pages for RAM after mem_end
If an E820 region is entirely beyond mem_end, don't attempt to truncate
it and add the truncated pages to extra_pages, as they will be negative.

Also, make sure the extra memory region starts after all BIOS provided
E820 regions (and in the case of RAM regions, post-clipping).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:32 -07:00
Jeremy Fitzhardinge
41f2e4771a xen: add support for PAT
Convert Linux PAT entries into Xen ones when constructing ptes.  Linux
doesn't use _PAGE_PAT for ptes, so the only difference in the first 4
entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default)
use it for WT.

xen_pte_val does the inverse conversion.

We hard-code assumptions about Linux's current PAT layout, but a
warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems.
If necessary we could go to a more general table-based conversion between
Linux and Xen PAT entries.

hugetlbfs poses a problem at the moment, the x86 architecture uses the
same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending
on which pagetable level we're using.  At the moment this should be OK
so long as nobody tries to do a pte_val on a hugetlbfs pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:31 -07:00
Jeremy Fitzhardinge
2f7acb2085 xen: make sure xen_max_p2m_pfn is up to date
Keep xen_max_p2m_pfn up to date with the end of the extra memory
we're adding.  It is possible that it will be too high since memory
may be truncated by a "mem=" option on the kernel command line, but
that won't matter.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:30 -07:00
Jeremy Fitzhardinge
698bb8d14a xen: limit extra memory to a certain ratio of base
If extra memory is very much larger than the base memory size
then all of the base memory can be filled with structures reserved to
describe the extra memory, leaving no space for anything else.

Even at the maximum ratio there will be little space for anything else,
but this change is intended to at least allow the system to boot rather
than crash mysteriously.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge
b5b43ced7a xen: add extra pages for E820 RAM regions, even if beyond mem_end
If an entire E820 RAM region is beyond mem_end, still add its
pages to the extra area so that space can be used by the kernel.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge
36bc251b87 xen: make sure xen_extra_mem_start is beyond all non-RAM e820
If Xen gives us non-RAM E820 entries (dom0 only, typically), then
make sure the extra RAM region is beyond them.  It's OK for
the extra space to grow into E820 regions, however.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:28 -07:00
Jeremy Fitzhardinge
42ee1471e9 xen: implement "extra" memory to reserve space for pages not present at boot
When using the e820 map to get the initial pseudo-physical address space,
look for either Xen-provided memory which doesn't lie within an E820
region, or an E820 RAM region which extends beyond the Xen-provided
memory range.

Count these pages, and add them to a new "extra memory" range.  This range
has an E820 RAM range to describe it - so the kernel will allocate page
structures for it - but it is also marked reserved so that the kernel
will not attempt to use it.

The balloon driver can then add this range as a set of currently
ballooned-out pages, which can be used to extend the domain beyond its
original size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:27 -07:00
Ian Campbell
35ae11fd14 xen: Use host-provided E820 map
Rather than simply using a flat memory map from Xen, use its provided
E820 map.  This allows the domain builder to tell the domain to reserve
space for more pages than those initially provided at domain-build time.

It also allows the host to specify holes in the address space (for
PCI-passthrough, for example).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:27 -07:00
Jeremy Fitzhardinge
cfd8951e08 xen: don't map missing memory
When setting up a pte for a missing pfn (no matching mfn), just create
an empty pte rather than a junk mapping.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:26 -07:00
Jeremy Fitzhardinge
33a847502b xen: defer building p2m mfn structures until kernel is mapped
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit).  Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge
c3798062f1 xen: add return value to set_phys_to_machine()
set_phys_to_machine() can return false on failure, which means a memory
allocation failure for the p2m structure.  It can only fail if setting
the mfn for a pfn in previously unused address space.  It is guaranteed
to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating
the mfn for an existing pfn.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge
58e05027b5 xen: convert p2m to a 3 level tree
Make the p2m structure a 3 level tree which covers the full possible
physical space.

The p2m structure contains mappings from the domain's pfns to system-wide
mfns.  The structure has 3 levels and two roots.  The first root is for
the domain's own use, and is linked with virtual addresses.  The second
is all mfn references, and is used by Xen on save/restore to allow it to
update the p2m mapping for the domain.

At boot, the domain builder provides a simple flat p2m array for all the
initially present pages.  We construct the two levels above that using
the early_brk allocator.  After early boot time, set_phys_to_machine()
will allocate any missing levels using the normal kernel allocator
(at GFP_KERNEL, so it must be called in a normal blocking context).

Because the early_brk() API requires us to pre-reserve the maximum amount
of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
config option, but its only negative side-effect is to increase the
kernel's apparent bss size.  However, since all unused brk memory is
returned to the heap, there's no real downside to making it large.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:24 -07:00
Jeremy Fitzhardinge
bbbf61eff9 xen: make install_p2mtop_page() static
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge
1f2d9dd309 xen: set the actual extent of the mfn_list_list
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge
b7eb4ad391 xen: set shared_info->arch.max_pfn to max_p2m_pfn
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:22 -07:00
Jeremy Fitzhardinge
1e17fc7eff xen: remove noise about registering vcpu info
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:21 -07:00
Jeremy Fitzhardinge
764f0138b9 xen: allocate level1_ident_pgt
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:20 -07:00
Jeremy Fitzhardinge
f0991802bb xen: use early_brk for level2_kernel_pgt
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge
a2e8752987 xen: allocate p2m size based on actual max size
Allocate p2m tables based on the actual runtime maximum pfn rather than
the static config-time limit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge
a171ce6e7b xen: dynamically allocate p2m space
Use early brk mechanism to allocate p2m tables, to save memory when
booting non-Xen.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:18 -07:00
Jeremy Fitzhardinge
5e941c0939 x86: add RESERVE_BRK_ARRAY() helper
Useful when converting static arrays into boottime brk allocated objects.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:17 -07:00
Linus Torvalds
db08bf0877 Merge git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
* git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic/io.h: allow people to override individual funcs
  bitops: remove duplicated extern declarations
  bitops: make asm-generic/bitops/find.h more generic
  asm-generic: kdebug.h: Checkpatch cleanup
  asm-generic: fcntl: make exported headers use strict posix types
  asm-generic: cmpxchg does not handle non-long arguments
  asm-generic: make atomic_add_unless a function
2010-10-22 11:17:06 -07:00
Linus Torvalds
092e0e7e52 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
2010-10-22 10:52:56 -07:00
Linus Torvalds
91151240ed Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE
  x86: Move alloc_desk_mask variables inside ifdef
  x86-32: Align IRQ stacks properly
  x86: Remove CONFIG_4KSTACKS
  x86: Always use irq stacks

Fixed up trivial conflicts in include/linux/{irq.h, percpu-defs.h}
2010-10-22 08:54:21 -07:00
Linus Torvalds
211baf4ffc Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Hpet: Avoid the comparator readback penalty
2010-10-22 08:47:45 -07:00
Rakib Mullick
a69a0612c4 [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit.
This patch fixes the following warning. The function
longrun_cpu_init() is marked with __cpuinit which calls
longrun_get_policy() which is a __init function. So make
longrun_get_policy with __cpuinit.

WARNING: arch/x86/kernel/cpu/cpufreq/longrun.o(.cpuinit.text+0x4c5):
Section mismatch in reference from the function longrun_cpu_init() to
the function .init.text:longrun_get_policy()
The function __cpuinit longrun_cpu_init() references
a function __init longrun_get_policy().
If longrun_get_policy is only used by longrun_cpu_init then
annotate longrun_get_policy with a matching annotation.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-10-22 11:44:47 -04:00
Julia Lawall
b2a33c1728 [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type
In each case, the function has an unsigned return type, but returns a
negative constant to indicate an error condition.  Each function is only
called once.  For nforce2_detect_chipset, the result is only compared to 0,
and for longrun_determine_freqs, the result is stored in a variable of type
(signed) int.  Thus, for both functions, unsigned can be dropped from the
return type.

A sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@exists@
identifier f;
constant C;
@@

 unsigned f(...)
 { <+...
*  return -C;
 ...+> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-10-22 11:44:47 -04:00
Andi Kleen
46e387bbd8 Merge branch 'hwpoison-hugepages' into hwpoison
Conflicts:
	mm/memory-failure.c
2010-10-22 17:40:48 +02:00
Peter Zijlstra
96681fc3c9 perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
For performance reasons its best to use memory node local memory for
per-cpu buffers.

This logic comes from a much larger patch proposed by Stephane.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.514465326@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:26 +02:00
Peter Zijlstra
f80c9e304b perf, x86: Clean up reserve_ds_buffers() signature
Now that reserve_ds_buffers() never fails, change it to return
void and remove all code dealing with the error return.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.462621937@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:26 +02:00
Peter Zijlstra
6809b6ea73 perf, x86: Less disastrous PEBS/BTS buffer allocation failure
Currently PEBS/BTS buffers are allocated when we instantiate the first
event, when this fails everything fails.

This is a problem because esp. BTS tries to allocate a rather large
buffer (64K), which can easily fail.

This patch changes the logic such that when either buffer allocation
fails, we simply don't allow events that would use these facilities,
but continue functioning for all other events.

This logic comes from a much larger patch proposed by Stephane.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.354429461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:26 +02:00
Peter Zijlstra
5553be2620 perf, x86: Fixup the precise_ip computation
In case we don't have PEBS, the LBR fixup doesn't make sense.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.354429461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:25 +02:00
Peter Zijlstra
65af94baca perf, x86: Extract DS alloc/free functions
Again, mostly a cleanup to unclutter the reserve_ds_buffer() code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.304495776@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:25 +02:00
Peter Zijlstra
5ee25c8731 perf, x86: Extract PEBS/BTS allocation functions
Mostly a cleanup.. it reduces code indentation and makes the code flow
of reserve_ds_buffers() clearer.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.253453452@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:25 +02:00
Peter Zijlstra
b39f88acd7 perf, x86: Extract PEBS/BTS buffer free routines
So that we may grow additional call-sites..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.196793164@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:24 +02:00
Jan Beulich
07bd8516a2 x86, asm: Restore parentheses around one pushl_cfi argument
These were (intentionally) stripped by "fix CFI macro
invocations to deal with shortcomings in gas" to expose problems
with unexpected splitting of arguments by older gas also on
newer versions, but as it turns out there is at least one distro
(Ubuntu 6.06) where even not having *any* spaces in a macro
argument doesn't reliably prevent splitting into multiple
arguments.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4CC157DB020000780001E8A2@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 10:51:44 +02:00
Linus Torvalds
3044100e58 Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
  x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S
  xen: Cope with unmapped pages when initializing kernel pagetable
  memblock, bootmem: Round pfn properly for memory and reserved regions
  memblock: Annotate memblock functions with __init_memblock
  memblock: Allow memblock_init to be called early
  memblock/arm: Fix memblock_region_is_memory() typo
  x86, memblock: Remove __memblock_x86_find_in_range_size()
  memblock: Fix wraparound in find_region()
  x86-32, memblock: Make add_highpages honor early reserved ranges
  x86, memblock: Fix crashkernel allocation
  arm, memblock: Fix the sparsemem build
  memblock: Fix section mismatch warnings
  powerpc, memblock: Fix memblock API change fallout
  memblock, microblaze: Fix memblock API change fallout
  x86: Remove old bootmem code
  x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
  x86: Remove not used early_res code
  x86, memblock: Replace e820_/_early string with memblock_
  x86: Use memblock to replace early_res
  x86, memblock: Use memblock_debug to control debug message print out
  ...

Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile
2010-10-21 18:52:11 -07:00
Linus Torvalds
e36f561a2c Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags:
  Fix IRQ flag handling naming
  MIPS: Add missing #inclusions of <linux/irq.h>
  smc91x: Add missing #inclusion of <linux/irq.h>
  Drop a couple of unnecessary asm/system.h inclusions
  SH: Add missing consts to sys_execve() declaration
  Blackfin: Rename IRQ flags handling functions
  Blackfin: Add missing dep to asm/irqflags.h
  Blackfin: Rename DES PC2() symbol to avoid collision
  Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header
  Blackfin: Split PLL code from mach-specific cdef headers
2010-10-21 14:37:27 -07:00
Linus Torvalds
157b6ceb13 Merge branch 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, iommu: Update header comments with appropriate naming
  ia64, iommu: Add a dummy iommu_table.h file in IA64.
  x86, iommu: Fix IOMMU_INIT alignment rules
  x86, doc: Adding comments about .iommu_table and its neighbors.
  x86, iommu: Utilize the IOMMU_INIT macros functionality.
  x86, VT-d: Make Intel VT-d IOMMU use IOMMU_INIT_* macros.
  x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros.
  x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros.
  x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros.
  x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros.
  x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine.
  x86, iommu: Add proper dependency sort routine (and sanity check).
  x86, iommu: Make all IOMMU's detection routines return a value.
  x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure
2010-10-21 14:23:48 -07:00
Linus Torvalds
4a60cfa945 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
  apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
  apic, x86: Check if EILVT APIC registers are available (AMD only)
  x86: ioapic: Call free_irte only if interrupt remapping enabled
  arm: Use ARCH_IRQ_INIT_FLAGS
  genirq, ARM: Fix boot on ARM platforms
  genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
  x86: Switch sparse_irq allocations to GFP_KERNEL
  genirq: Switch sparse_irq allocator to GFP_KERNEL
  genirq: Make sparse_lock a mutex
  x86: lguest: Use new irq allocator
  genirq: Remove the now unused sparse irq leftovers
  genirq: Sanitize dynamic irq handling
  genirq: Remove arch_init_chip_data()
  x86: xen: Sanitise sparse_irq handling
  x86: Use sane enumeration
  x86: uv: Clean up the direct access to irq_desc
  x86: Make io_apic.c local functions static
  genirq: Remove irq_2_iommu
  x86: Speed up the irq_remapped check in hot pathes
  intr_remap: Simplify the code further
  ...

Fix up trivial conflicts in arch/x86/Kconfig
2010-10-21 14:11:46 -07:00
Linus Torvalds
5fe8321b88 Merge branch 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, x2apic: Simplify apic init in SMP and UP builds
  x86, intr-remap: Remove IRTE setup duplicate code
  x86, intr-remap: Set redirection hint in the IRTE
2010-10-21 13:54:05 -07:00
Linus Torvalds
709d9f54cc Merge branch 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI
  x86, vmware: Remove deprecated VMI kernel support

Fix up trivial #include conflict in arch/x86/kernel/smpboot.c
2010-10-21 13:53:24 -07:00
Linus Torvalds
cca8209ed9 Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: XO-1 uses/depends on PCI
  x86, olpc: Register XO-1 platform devices
  x86, olpc: Add XO-1 poweroff support
  x86, olpc: Don't retry EC commands forever
  x86, olpc: Rework BIOS signature check
  x86, olpc: Only enable PCI configuration type override on XO-1
2010-10-21 13:52:01 -07:00
Linus Torvalds
d77bdc423d Merge branch 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mtrr: Support mtrr lookup for range spanning across MTRR range
  x86, mtrr: Refactor MTRR type overlap check code
2010-10-21 13:51:41 -07:00
Linus Torvalds
87affd0b94 Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: sfi: Make local functions static
  x86, earlyprintk: Add hsu early console for Intel Medfield platform
  x86, earlyprintk: Add earlyprintk for Intel Moorestown platform
  x86: Add two helper macros for fixed address mapping
  x86, mrst: A function in a header file needs to be marked "inline"
2010-10-21 13:47:54 -07:00
Linus Torvalds
c3b86a2942 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, percpu: Correct the ordering of the percpu readmostly section
  x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G
  x86: Spread tlb flush vector between nodes
  percpu: Introduce a read-mostly percpu API
  x86, mm: Fix incorrect data type in vmalloc_sync_all()
  x86, mm: Hold mm->page_table_lock while doing vmalloc_sync
  x86, mm: Fix bogus whitespace in sync_global_pgds()
  x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation
  x86, mm: Add RESERVE_BRK_ARRAY() helper
  mm, x86: Saving vmcore with non-lazy freeing of vmas
  x86, kdump: Change copy_oldmem_page() to use cached addressing
  x86, mm: fix uninitialized addr in kernel_physical_mapping_init()
  x86, kmemcheck: Remove double test
  x86, mm: Make spurious_fault check explicitly check the PRESENT bit
  x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes
  x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions
  x86, mm: Avoid unnecessary TLB flush
2010-10-21 13:47:29 -07:00
Linus Torvalds
8d8d2e9ccd Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mem: Optimize memmove for small size and unaligned cases
  x86, mem: Optimize memcpy by avoiding memory false dependece
  x86, mem: Don't implement forward memmove() as memcpy()
2010-10-21 13:46:28 -07:00
Linus Torvalds
2a8b67fb72 Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line
  x86, hotplug: Move WBINVD back outside the play_dead loop
  x86, hotplug: Use mwait to offline a processor, fix the legacy case
  x86, mwait: Move mwait constants to a common header file
2010-10-21 13:45:38 -07:00
Linus Torvalds
b6f7e38dbb Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fpu: Merge fpu_save_init()
  x86-32, fpu: Rewrite fpu_save_init()
  x86, fpu: Remove PSHUFB_XMM5_* macros
  x86, fpu: Remove unnecessary ifdefs from i387 code.
  x86-32, fpu: Remove math_emulate stub
  x86-64, fpu: Simplify constraints for fxsave/fxtstor
  x86-64, fpu: Fix %cs value in convert_from_fxsr()
  x86-64, fpu: Disable preemption when using TS_USEDFPU
  x86, fpu: Merge __save_init_fpu()
  x86, fpu: Merge tolerant_fwait()
  x86, fpu: Merge fpu_init()
  x86: Use correct type for %cr4
  x86, xsave: Disable xsave in i387 emulation mode

Fixed up fxsaveq-induced conflict in arch/x86/include/asm/i387.h
2010-10-21 13:34:32 -07:00
Alok Kataria
76fac077db x86, kexec: Make sure to stop all CPUs before exiting the kernel
x86 smp_ops now has a new op, stop_other_cpus which takes a parameter
"wait" this allows the caller to specify if it wants to stop until all
the cpus have processed the stop IPI.  This is required specifically
for the kexec case where we should wait for all the cpus to be stopped
before starting the new kernel.  We now wait for the cpus to stop in
all cases except for panic/kdump where we expect things to be broken
and we are doing our best to make things work anyway.

This patch fixes a legitimate regression, which was introduced during
2.6.30, by commit id 4ef702c10b.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: <stable@kernel.org> v2.6.30-36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-21 13:30:44 -07:00
Linus Torvalds
214515b578 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Remove pr_<level> uses of KERN_<level>
  therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal'
  x86: Use {push,pop}{l,q}_cfi in more places
  i386: Add unwind directives to syscall ptregs stubs
  x86-64: Use symbolics instead of raw numbers in entry_64.S
  x86-64: Adjust frame type at paranoid_exit:
  x86-64: Fix unwind annotations in syscall stubs
2010-10-21 13:20:32 -07:00
Linus Torvalds
bf70030dc0 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cpu: Fix X86_FEATURE_NOPL
  x86, cpu: Re-run get_cpu_cap() after adjusting the CPUID level
2010-10-21 13:18:36 -07:00
Linus Torvalds
d60a2793ba Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Remove stale pmtimer_64.c
  x86, cleanups: Use clear_page/copy_page rather than memset/memcpy
  x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI
  x86, cleanup: Remove obsolete boot_cpu_id variable
2010-10-21 13:18:06 -07:00
Linus Torvalds
781c5a67f1 Merge branch 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, bios: Make the x86 early memory reservation a kernel option
  x86, bios: By default, reserve the low 64K for all BIOSes
2010-10-21 13:06:49 -07:00
Linus Torvalds
e990c77d06 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64, asm: If the assembler supports fxsave64, use it
  i386: Make kernel_execve() suitable for stack unwinding
2010-10-21 13:06:00 -07:00
Linus Torvalds
2f0384e5fc Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
  x86, amd: Use compute unit information to determine thread siblings
  x86, amd: Extract compute unit information for AMD CPUs
  x86, amd: Add support for CPUID topology extension of AMD CPUs
  x86, nmi: Support NMI watchdog on newer AMD CPU families
  x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
  x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
  x86, k8-gart: Decouple handling of garts and northbridges
  x86, cacheinfo: Fix dependency of AMD L3 CID
  x86, kvm: add new AMD SVM feature bits
  x86, cpu: Fix allowed CPUID bits for KVM guests
  x86, cpu: Update AMD CPUID feature bits
  x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
  x86, AMD: Remove needless CPU family check (for L3 cache info)
  x86, tsc: Remove CPU frequency calibration on AMD
2010-10-21 13:01:08 -07:00
Linus Torvalds
bc4016f481 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
  sched: Export account_system_vtime()
  sched: Call tick_check_idle before __irq_enter
  sched: Remove irq time from available CPU power
  sched: Do not account irq time to current task
  x86: Add IRQ_TIME_ACCOUNTING
  sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
  sched: Add a PF flag for ksoftirqd identification
  sched: Consolidate account_system_vtime extern declaration
  sched: Fix softirq time accounting
  sched: Drop group_capacity to 1 only if local group has extra capacity
  sched: Force balancing on newidle balance if local group has capacity
  sched: Set group_imb only a task can be pulled from the busiest cpu
  sched: Do not consider SCHED_IDLE tasks to be cache hot
  sched: Drop all load weight manipulation for RT tasks
  sched: Create special class for stop/migrate work
  sched: Unindent labels
  sched: Comment updates: fix default latency and granularity numbers
  tracing/sched: Add sched_pi_setprio tracepoint
  sched: Give CPU bound RT tasks preference
  sched: Try not to migrate higher priority RT tasks
  ...
2010-10-21 12:55:43 -07:00
Linus Torvalds
5d70f79b5e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits)
  tracing: Fix compile issue for trace_sched_wakeup.c
  [S390] hardirq: remove pointless header file includes
  [IA64] Move local_softirq_pending() definition
  perf, powerpc: Fix power_pmu_event_init to not use event->ctx
  ftrace: Remove recursion between recordmcount and scripts/mod/empty
  jump_label: Add COND_STMT(), reducer wrappery
  perf: Optimize sw events
  perf: Use jump_labels to optimize the scheduler hooks
  jump_label: Add atomic_t interface
  jump_label: Use more consistent naming
  perf, hw_breakpoint: Fix crash in hw_breakpoint creation
  perf: Find task before event alloc
  perf: Fix task refcount bugs
  perf: Fix group moving
  irq_work: Add generic hardirq context callbacks
  perf_events: Fix transaction recovery in group_sched_in()
  perf_events: Fix bogus AMD64 generic TLB events
  perf_events: Fix bogus context time tracking
  tracing: Remove parent recording in latency tracer graph options
  tracing: Use one prologue for the preempt irqs off tracer function tracers
  ...
2010-10-21 12:54:49 -07:00
Linus Torvalds
1053e6bba0 Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Update copyright headers
  x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend
  AGP: Warn when GATT memory cannot be set to UC
  x86, GART: Disable GART table walk probes
  x86, GART: Remove superfluous AMD64_GARTEN
2010-10-21 12:49:15 -07:00
Daniel Drake
260586d2b4 Add OLPC XO-1 rfkill driver
Add a software rfkill switch for the WLAN interface in the OLPC XO-1
laptop. It uses the OLPC embedded controller to cut/restore power to
the Marvell WLAN chip on the motherboard.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-10-21 10:10:44 -04:00
Konrad Rzeszutek Wilk
5bba6c56dc X86/PCI: Remove the dependency on isapnp_disable.
This looks to be vestigial dependency that had never been used even
in the original code base (2.6.18) from which this driver
was up-ported. Without this fix, with the CONFIG_ISAPNP, we get this
compile failure:

arch/x86/pci/xen.c: In function 'pci_xen_init':
arch/x86/pci/xen.c:138: error: 'isapnp_disable' undeclared (first use in this function)
arch/x86/pci/xen.c:138: error: (Each undeclared identifier is reported only once
arch/x86/pci/xen.c:138: error: for each function it appears in.)

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-21 09:36:07 -04:00
Ian Campbell
de1ef2065c xen/privcmd: move remap_domain_mfn_range() to core xen code and export.
This allows xenfs to be built as a module, previously it required flush_tlb_all
and arbitrary_virt_to_machine to be exported.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-20 16:22:34 -07:00
Jeremy Fitzhardinge
1246ae0bb9 xen: add variable hypercall caller
Allow non-constant hypercall to be called, for privcmd.

[ Impact: make arbitrary hypercalls; needed for privcmd ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-20 16:22:27 -07:00
Jeremy Fitzhardinge
eba3ff8b99 xen: add xen_set_domain_pte()
Add xen_set_domain_pte() to allow setting a pte mapping a page from
another domain.  The common case is to map from DOMID_IO, the pseudo
domain which owns all IO pages, but will also be used in the privcmd
interface to map other domain pages.

[ Impact: new Xen-internal API for cross-domain mappings ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-20 16:22:27 -07:00
FUJITA Tomonori
66f2b06154 x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G
Set CONFIG_ARCH_DMA_ADDR_T_64BIT when we set dma_addr_t to 64 bits in
<asm/types.h>; this allows Kconfig decisions based on this property.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <201010202255.o9KMtZXu009370@imap1.linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 16:02:42 -07:00
Shaohua Li
9329672021 x86: Spread tlb flush vector between nodes
Currently flush tlb vector allocation is based on below equation:
	sender = smp_processor_id() % 8
This isn't optimal, CPUs from different node can have the same vector, this
causes a lot of lock contention. Instead, we can assign the same vectors to
CPUs from the same node, while different node has different vectors. This has
below advantages:
a. if there is lock contention, the lock contention is between CPUs from one
node. This should be much cheaper than the contention between nodes.
b. completely avoid lock contention between nodes. This especially benefits
kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity
to specific node.

In my test, this could reduce > 20% CPU overhead in extreme case.The test
machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd
to the first CPU of the node. I run a workload with 4 sequential mmap file
read thread. The files are empty sparse file. This workload will trigger a
lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the
extreme tlb flush lock contention because otherwise kswapd keeps migrating
between CPUs of a node and I can't get stable result. Sure in real workload,
we can't always see so big tlb flush lock contention, but it's possible.

[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:44:42 -07:00
Borislav Petkov
b40827fa72 x86-32, mm: Add an initial page table for core bootstrapping
This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs - we have
initial_page_table for that.

Signed-off-by: Borislav Petkov <bp@alien8.de>
LKML-Reference: <20101020070526.GA9588@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:23:55 -07:00
H. Peter Anvin
d25e6b0b32 Merge branch 'x86/cleanups' into x86/trampoline 2010-10-20 14:22:45 -07:00
H. Peter Anvin
e44dea35cc Merge branch 'x86/vmware' into x86/trampoline 2010-10-20 13:18:17 -07:00
Borislav Petkov
f01f7c56a1 x86, mm: Fix incorrect data type in vmalloc_sync_all()
arch/x86/mm/fault.c: In function 'vmalloc_sync_all':
arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast

introduced by 617d34d9e5.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 12:54:04 -07:00
Robert Richter
27afdf2008 apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
We want the BIOS to setup the EILVT APIC registers. The offsets
were hardcoded and BIOS settings were overwritten by the OS.
Now, the subsystems for MCE threshold and IBS determine the LVT
offset from the registers the BIOS has setup. If the BIOS setup
is buggy on a family 10h system, a workaround enables IBS. If
the OS determines an invalid register setup, a "[Firmware Bug]:
" error message is reported.

We need this change also for upcomming cpu families.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:42:13 +02:00
Robert Richter
a68c439b19 apic, x86: Check if EILVT APIC registers are available (AMD only)
This patch implements checks for the availability of LVT entries
(APIC500-530) and reserves it if used. The check becomes
necessary since we want to let the BIOS provide the LVT offsets.
 The offsets should be determined by the subsystems using it
like those for MCE threshold or IBS.  On K8 only offset 0
(APIC500) and MCE interrupts are supported. Beginning with
family 10h at least 4 offsets are available.

Since offsets must be consistent for all cores, we keep track of
the LVT offsets in software and reserve the offset for the same
vector also to be used on other cores. An offset is freed by
setting the entry to APIC_EILVT_MASKED.

If the BIOS is right, there should be no conflicts. Otherwise a
"[Firmware Bug]: ..." error message is generated. However, if
software does not properly determines the offsets, it is not
necessarily a BIOS bug.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:42:13 +02:00
Ingo Molnar
14d4962dc8 Merge branch 'linus' into irq/core
Merge reason: update to almost-final-.36

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:38:59 +02:00
Jan Beulich
3234282f33 x86, asm: Fix CFI macro invocations to deal with shortcomings in gas
gas prior to (perhaps) 2.16.90 has problems with passing non-
parenthesized expressions containing spaces to macros. Spaces, however,
get inserted by cpp between any macro expanding to a number and a
subsequent + or -. For the +, current x86 gas then removes the space
again (future gas may not do so), but for the - the space gets retained
and is then considered a separator between macro arguments.

Fix the respective definitions for both the - and + cases, so that they
neither contain spaces nor make cpp insert any (the latter by adding
seemingly redundant parentheses).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 14:28:02 -07:00
Jeremy Fitzhardinge
617d34d9e5 x86, mm: Hold mm->page_table_lock while doing vmalloc_sync
Take mm->page_table_lock while syncing the vmalloc region.  This prevents
a race with the Xen pagetable pin/unpin code, which expects that the
page_table_lock is already held.  If this race occurs, then Xen can see
an inconsistent page type (a page can either be read/write or a pagetable
page, and pin/unpin converts it between them), which will cause either
the pin or the set_p[gm]d to fail; either will crash the kernel.

vmalloc_sync_all() should be called rarely, so this extra use of
page_table_lock should not interfere with its normal users.

The mm pointer is stashed in the pgd page's index field, as that won't
be otherwise used for pgds.

Reported-by: Ian Campbell <ian.cambell@eu.citrix.com>
Originally-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4CB88A4C.1080305@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge
44235dcde4 x86, mm: Fix bogus whitespace in sync_global_pgds()
Whitespace cleanup only.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 13:56:03 -07:00
Avi Kivity
9581d442b9 KVM: Fix fs/gs reload oops with invalid ldt
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.

Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.

This is CVE-2010-3698.

KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-19 14:21:45 -02:00
Yinghai Lu
9717967c4b x86: ioapic: Call free_irte only if interrupt remapping enabled
On a system that support intr-rempping when booting with "intremap=off"

[  177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[  177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0
...
[  178.173326] Call Trace:
[  178.173574]  [<ffffffff810515b4>] destroy_irq+0x3a/0x75
[  178.192934]  [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10
[  178.193418]  [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f
[  178.213021]  [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb

Call free_irte only when interrupt remapping is enabled.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CBCB274.7010108@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-19 09:25:33 +02:00
Venkatesh Pallipadi
e82b8e4ea4 x86: Add IRQ_TIME_ACCOUNTING
This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it
when TSC is enabled.

This change just enables fine grained irq time accounting, isn't used yet.
Following patches use it for different purposes.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18 20:52:25 +02:00
Peter Zijlstra
e360adbe29 irq_work: Add generic hardirq context callbacks
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18 19:58:50 +02:00
Stephane Eranian
ba0cef3d14 perf_events: Fix bogus AMD64 generic TLB events
PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x7 (to count all possibilities).

PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x3 (to count all possibilities).

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: <stable@kernel.org> # as far back as it applies
LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18 19:58:48 +02:00
Konrad Rzeszutek Wilk
74226b8c8a xen/pci: Request ACS when Xen-SWIOTLB is activated.
It used to done in the Xen startup code but that is not really
appropiate.

[v2: Update Kconfig with PCI requirement]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-18 10:49:38 -04:00
Alex Nixon
b5401a96b5 xen/x86/PCI: Add support for the Xen PCI subsystem
The frontend stub lives in arch/x86/pci/xen.c, alongside other
sub-arch PCI init code (e.g. olpc.c).

It provides a mechanism for Xen PCI frontend to setup/destroy
legacy interrupts, MSI/MSI-X, and PCI configuration operations.

[ Impact: add core of Xen PCI support ]
[ v2: Removed the IOMMU code and only focusing on PCI.]
[ v3: removed usage of pci_scan_all_fns as that does not exist]
[ v4: introduced pci_xen value to fix compile warnings]
[ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs]
[ v7: added Acked-by]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Qing He <qing.he@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
2010-10-18 10:49:35 -04:00
Stefano Stabellini
294ee6f89c x86: Introduce x86_msi_ops
Introduce an x86 specific indirect mechanism to setup MSIs.
The MSI setup functions become function pointers in an x86_msi_ops
struct, that defaults to the implementation in io_apic.c and msi.c.

[v2: Use HAVE_DEFAULT_* knobs]
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-18 10:49:34 -04:00
Jeremy Fitzhardinge
5ee01f49c9 x86/PCI: make sure _PAGE_IOMAP it set on pci mappings
When mapping pci space via /sys or /proc, make sure we're really
doing a hardware mapping by setting _PAGE_IOMAP.

[ Impact: bugfix; make PCI mappings map the right pages ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: x86@kernel.org
2010-10-18 10:49:31 -04:00
Alex Nixon
44de3395a4 x86/PCI: Clean up pci_cache_line_size
Separate out x86 cache_line_size initialisation code into its own
function (so it can be shared by Xen later in this patch series)

[ Impact: cleanup ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: x86@kernel.org
2010-10-18 10:49:30 -04:00
Jeremy Fitzhardinge
7b586d7185 x86/io_apic: add get_nr_irqs_gsi()
Impact: new interface to get max GSI

Add get_nr_irqs_gsi() to return nr_irqs_gsi.  Xen will use this to
determine how many irqs it needs to reserve for hardware irqs.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-18 10:40:30 -04:00
Jeremy Fitzhardinge
d8e0420603 xen: define BIOVEC_PHYS_MERGEABLE()
Impact: allow Xen control of bio merging

When running in Xen domain with device access, we need to make sure
the block subsystem doesn't merge requests across pages which aren't
machine physically contiguous.  To do this, we define our own
BIOVEC_PHYS_MERGEABLE.  When CONFIG_XEN isn't enabled, or we're not
running in a Xen domain, this has identical behaviour to the normal
implementation.  When running under Xen, we also make sure the
underlying machine pages are the same or adjacent.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-18 10:40:28 -04:00
Alex Nixon
23ace955c2 xen: Don't disable the I/O space
If a guest domain wants to access PCI devices through the frontend
driver (coming later in the patch series), it will need access to the
I/O space.

[ Impact: Allow for domU IO access, preparing for pci passthrough ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-18 10:40:27 -04:00
Justin P. Mattock
50a23e6eec Update broken web addresses in arch directory.
The patch below updates broken web addresses in the arch directory.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:21 +02:00
Bjorn Helgaas
1ca98fa652 x86/PCI: MMCONFIG: fix region end calculation
The end of an MMCONFIG region depends on the ending bus number, not on the
number of buses the region covers.  We previously computed the wrong ending
address whenever the starting bus number was non-zero, e.g.,:

  MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000)
  MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe1ffffff] (base 0xe0000000)

The correct regions are:

  MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000)
  MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe3ffffff] (base 0xe0000000)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-17 20:03:07 -07:00
Seth Heasley
cb04e95bdd PCI: update Intel chipset names and defines
This patch updates the defines for Intel devices in
include/linux/pci_ids.h, referenced in arch/x86/pci/irq.c and
drivers/i2c/busses/i2c-i801.c, reflecting approved legal branding, and
using fuller code-names for products under development.

Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-17 20:03:04 -07:00
Seth Heasley
25143fd127 x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs
This patch adds the LPC Controller DeviceIDs for the Intel Patsburg PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-15 13:09:52 -07:00
Daniel Drake
80e7b19ae1 PCI: OLPC: Only enable PCI configuration type override on XO-1
This configuration type override is for XO-1 only and must not happen
on XO-1.5.

Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-15 13:09:51 -07:00
Thomas Gleixner
40ffa93791 x86: Remove stale pmtimer_64.c
This file is unused since the apic unification in 2.6.29, but nobody
noticed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-15 21:18:59 +02:00
Thomas Gleixner
940b3c7b19 x86: sfi: Make local functions static
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Len Brown <lenb@kernel.org>
2010-10-15 19:37:39 +02:00
Arnd Bergmann
6038f373a3 llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-15 15:53:27 +02:00
Robert Richter
b47fad3bfb oprofile, x86: Add support for IBS periodic op counter extension
The count value for IBS op sampling has been extended by 7 bits. The
feature is reflected in bit 6 (OpCntExt) of the IBS capability
register (CPUID Fn8000_001B_EAX).

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:43 +02:00
Robert Richter
25da695047 oprofile, x86: Add support for IBS branch target address reporting
This patch adds support for IBS branch target address reporting. A new
MSR (MSRC001_103B IBS Branch Target Address) has been added that
provides the logical address in canonical form for the branch
target. The size of the IBS sample that is transferred to the userland
has been increased.

For backward compatibility, the userland daemon must explicit enable
the feature by writing to the oprofilefs file

 ibs_op/branch_target

After enabling branch target address reporting, the userland daemon
must handle the extended size of the IBS sample.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:42 +02:00
Robert Richter
53b39e9480 oprofile, x86: Introduce struct ibs_state
This patch introduces struct ibs_state that will extended by additinal
members in follow-on patches.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:41 +02:00
Robert Richter
fc889aa23f oprofile, x86: Remove duplicate check for IBS_CAPS_OPCNT
Since oprofile is setting up ibs_op/dispatched_ops in the fs only if
the feature is available, its corresponding variable
ibs_config.dispatched_ops is only set, if the feature is
available. Thus the check is duplicate and can be removed.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:41 +02:00
Robert Richter
4ac945f002 oprofile, x86: Check IBS capability bits 1 and 2
There are IBS CPUID feature flags in CPUID Fn8000_001B to detect if
the cpu supports IBS fetch sampling (FetchSam) and/or IBS execution
sampling (OpSam). This patch adds checks if the both features are
available.

Spec:

 http://support.amd.com/us/Processor_TechDocs/31116.pdf

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:40 +02:00
Robert Richter
e63414740e oprofile, x86: Add support for AMD family 14h
This patch adds support for AMD family 14h (Ontario/Zacate) cpus.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:39 +02:00
Robert Richter
3acbf0849b oprofile, x86: Add support for AMD family 12h
This patch adds support for AMD family 12h (Llano) cpus.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:39 +02:00
Robert Richter
6268464b37 Merge remote branch 'tip/perf/core' into oprofile/core
Conflicts:
	arch/arm/oprofile/common.c
	kernel/perf_event.c
2010-10-15 12:45:00 +02:00
Randy Dunlap
9e9006e909 x86, olpc: XO-1 uses/depends on PCI
olpc-xo1 uses pci_*() interfaces so it should depend on PCI.

Otherwise we get build failure like:

 arch/x86/kernel/olpc-xo1.c:65: error: implicit declaration of function 'pci_enable_device_io'
 arch/x86/kernel/olpc-xo1.c:71: error: implicit declaration of function 'pci_request_region'
 arch/x86/kernel/olpc-xo1.c:80: error: implicit declaration of function 'pci_release_region'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Daniel Drake <dsd@laptop.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20101014101313.adf7eb2a.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-15 09:41:38 +02:00
Ingo Molnar
0fdf13606b Merge branch 'tip/perf/recordmcount-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-10-15 06:12:28 +02:00
Steven Rostedt
cf4db2597a ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT
The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-14 23:32:44 -04:00
Ingo Molnar
d9d572a9c0 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-10-15 05:12:45 +02:00
Steven Rostedt
72441cb1fd ftrace/x86: Add support for C version of recordmcount
This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.

After verifying this works, other archs can add:

 HAVE_C_MCOUNT_RECORD

in its Kconfig and it will use the C version of recordmcount
instead of the perl version.

Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-14 16:52:41 -04:00
Frederic Weisbecker
ebc8827f75 x86: Barf when vmalloc and kmemcheck faults happen in NMI
In x86, faults exit by executing the iret instruction, which then
reenables NMIs if we faulted in NMI context. Then if a fault
happens in NMI, another NMI can nest after the fault exits.

But we don't yet support nested NMIs because we have only one NMI
stack. To prevent from that, check that vmalloc and kmemcheck
faults don't happen in this context. Most of the other kernel faults
in NMIs can be more easily spotted by finding explicit
copy_from,to_user() calls on review.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2010-10-14 20:43:36 +02:00
Linus Torvalds
0eead9ab41 Don't dump task struct in a.out core-dumps
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code).  Just remove it.

Also do the access_ok() check on dump_write().  It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...

[ I suspect that we should possibly do "vfs_write()" instead of
  calling ->write directly.  That also does the whole fsnotify and write
  statistics thing, which may or may not be a good idea. ]

And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)

Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 10:57:40 -07:00
Jeremy Fitzhardinge
67e87f0a1c x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S
head_64.S maps up to 512 MiB, but that is not necessarity true for
other entry paths, such as Xen.

Thus, co-locate the setting of max_pfn_mapped with the code to
actually set up the page tables in head_64.S.  The 32-bit code is
already so co-located.  (The Xen code already sets max_pfn_mapped
correctly for its own use case.)

-v2:

 Yinghai fixed the following bug in this patch:

 |
 | max_pfn_mapped is in .bss section, so we need to set that
 | after bss get cleared. Without that we crash on bootup.
 |
 | That is safe because Xen does not call x86_64_start_kernel().
 |

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Fixed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4CB6AB24.9020504@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 09:06:49 +02:00
Masami Hiramatsu
3cba11d32b kconfig/x86: Add HAVE_TEXT_POKE_SMP config for stop_machine dependency
Since the text_poke_smp() definately depends on actual
stop_machine() on smp, add that dependency to Kconfig.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20101014031042.4100.90877.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 08:55:29 +02:00
Masami Hiramatsu
3caa37519c x86: Use __stop_machine() in text_poke_smp()
Use __stop_machine() in text_poke_smp() because the caller
must get online_cpus before calling text_poke_smp(), but
stop_machine() do it again. We don't need it.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20101014031036.4100.83989.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 08:55:28 +02:00
Randy Dunlap
03f1a17cd5 x86/vsmp: Eliminate kconfig dependency warning
Fix kconfig dependency warning to satisfy dependencies:

warning: (X86_VSMP && X86_64 && PCI && X86_EXTENDED_PLATFORM ||
XEN && PARAVIRT_GUEST && (X86_64 || X86_32 && X86_PAE && !X86_VISWS) && X86_CMPXCHG && X86_TSC || KVM_CLOCK && PARAVIRT_GUEST || KVM_GUEST && PARAVIRT_GUEST || LGUEST_GUEST && PARAVIRT_GUEST && X86_32) selects PARAVIRT which has unmet direct dependencies (PARAVIRT_GUEST)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
LKML-Reference: <20101013210023.9a033222.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 07:04:30 +02:00
Linus Torvalds
509d4486bd Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: For each node, register the memory blocks actually used
  x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
  x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
2010-10-13 16:34:23 -07:00
Jeremy Fitzhardinge
fef5ba7979 xen: Cope with unmapped pages when initializing kernel pagetable
Xen requires that all pages containing pagetable entries to be mapped
read-only.  If pages used for the initial pagetable are already mapped
then we can change the mapping to RO.  However, if they are initially
unmapped, we need to make sure that when they are later mapped, they
are also mapped RO.

We do this by knowing that the kernel pagetable memory is pre-allocated
in the range e820_table_start - e820_table_end, so any pfn within this
range should be mapped read-only.  However, the pagetable setup code
early_ioremaps the pages to write their entries, so we must make sure
that mappings created in the early_ioremap fixmap area are mapped RW.
(Those mappings are removed before the pages are presented to Xen
as pagetable pages.)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4CB63A80.8060702@goop.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-13 16:07:13 -07:00
H. Peter Anvin
d7acb92fea x86-64, asm: If the assembler supports fxsave64, use it
Kbuild allows for us to probe for the existence of specific constructs
in the assembler, use them to find out if we can use fxsave64 and
permit the compiler to generate better code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-13 16:00:29 -07:00
Daniel Drake
447b1d43de x86, olpc: Register XO-1 platform devices
The upcoming XO-1 rfkill driver (for drivers/platform/x86) will register
itself with the name "xo1-rfkill", and the already-merged XO-1 poweroff
code uses name "olpc-xo1"

Add the necessary mechanics so that these devices are properly
initialized on XO-1 laptops.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20101013181042.90C8F9D401B@zog.reactivated.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-13 11:51:25 -07:00
Ingo Molnar
3d8a1a6a8a Merge branch 'amd-iommu/2.6.37' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2010-10-13 15:44:24 +02:00
Joerg Roedel
5d0d71569e x86/amd-iommu: Update copyright headers
This patch updates the copyright headers in all source files
of the AMD IOMMU driver.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-10-13 11:13:21 +02:00
Matthew Garrett
5bcd757f93 x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend
AMD's reference BIOS code had a bug that could result in the
firmware failing to reenable the iommu on resume. It
transpires that this causes certain less than desirable
behaviour when it comes to PCI accesses, to whit them ending
up somewhere near Bristol when the more desirable outcome
was Edinburgh. Sadness ensues, perhaps along with filesystem
corruption.  Let's make sure that it gets turned back on,
and that we restore its configuration so decisions it makes
bear some resemblance to those made by reasonable people
rather than crack-addled lemurs who spent all your DMA on
Thunderbird.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-10-13 11:11:46 +02:00
Daniel Drake
bf1ebf0079 x86, olpc: Add XO-1 poweroff support
Add a pm_power_off handler for the OLPC XO-1 laptop.

The driver can be built modular and follows the behaviour of the
APM driver, setting pm_power_off to NULL on unload. However, the
ability to unload the module will probably be removed (with a simple
__module_get(THIS_MODULE)) if/when XO-1 suspend/resume support is
added to this file at a later date.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20101010094032.9AE669D401B@zog.reactivated.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-12 17:31:15 -07:00
Thomas Gleixner
2ee3906598 x86: Switch sparse_irq allocations to GFP_KERNEL
No callers from atomic context (except boot) anymore.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:46 +02:00
Thomas Gleixner
c2f31c37b7 x86: lguest: Use new irq allocator
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2010-10-12 16:53:45 +02:00
Thomas Gleixner
ad9f43340f x86: Use sane enumeration
Instead of looping through all interrupts, use the bitmap lookup to
find the next.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:44 +02:00
Thomas Gleixner
48b2650196 x86: uv: Clean up the direct access to irq_desc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:43 +02:00
Thomas Gleixner
1a8ce7ff68 x86: Make io_apic.c local functions static
No users outside of io_apic.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:43 +02:00
Thomas Gleixner
1a0730d664 x86: Speed up the irq_remapped check in hot pathes
irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check
based on irq_cfg instead of going through a lookup function. That's
especially interesting in the eoi_ioapic_irq() hotpath.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:42 +02:00
Thomas Gleixner
423f085952 x86: Embedd irq_2_iommu into irq_cfg
That interrupt remapping code is x86 specific and tied to the io_apic
code. No need for separate allocator functions in the interrupt
remapping code. This allows to simplify the code and irq_2_iommu is
small (13 bytes on 64bit) so it's not a real problem even if interrupt
remapping is runtime disabled. If it's compile time disabled the
impact is zero.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:41 +02:00
Thomas Gleixner
bc5fdf9f3a x86: io_apic: Remove the now unused sparse_irq arch_* functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner
fbc6bff04a x86: ioapic: Cleanup sparse irq code
Switch over to the new allocator and remove all the magic which was
caused by the unability to destroy irq descriptors. Get rid of the
create_irq_nr() loop for sparse and non sparse irq.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Yinghai Lu
fe6dab4e79 x86: Don't setup ioapic irq for sci twice
The sparseirq rework triggered a warning in the iommu code, which was
caused by setting up ioapic for ACPI irq 9 twice. This function is
solely to handle interrupts which are on a secondary ioapic and
outside the legacy irq range.

Replace the sparse irq_to_desc check with a non ifdeffed version.

[ tglx: Moved it before the ioapic sparse conversion and simplified
  	the inverse logic ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB00122.3030301@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner
f981a3dc19 x86: io_apic: Prepare alloc/free_irq_cfg()
Rename the grossly misnamed get_one_free_irq_cfg() to alloc_irq_cfg().
Add a (not yet used) irq number argument to free_irq_cfg()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner
08c33db6d0 x86: Implement new allocator functions
Implement new allocator functions which make use of the core changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner
6e2fff50a5 x86: ioapic: Cleanup get_one_free_irq_cfg()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:39 +02:00
Thomas Gleixner
7e495529b6 x86: ioapic: Cleanup some more
Cleanup after the irq_chip conversion a bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:39 +02:00
Thomas Gleixner
be5b7bf738 x86: Convert ht set_affinity to new chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:39 +02:00
Thomas Gleixner
0e09ddf2d7 x86: Cleanup hpet affinity setting
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:39 +02:00
Thomas Gleixner
fe52b2d259 x86: Convert dmar affinity setting to new chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
2010-10-12 16:53:39 +02:00
Thomas Gleixner
b5d1c46579 x86: Convert remapped msi to new chip.irq_set_affinity function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:38 +02:00
Thomas Gleixner
f19f5ecc92 x86: Convert remapped ioapic affinity setting to new irq chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2010-10-12 16:53:38 +02:00
Thomas Gleixner
5346b2a78f x86: Convert msi affinity setting to new chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:38 +02:00
Thomas Gleixner
f7e909eae4 x86: Prepare the affinity common functions for taking struct irq_data *
While at it rename it to sensible function names and fix the return
value from unsigned to int for __ioapic_set_affinity (set_desc_affinity).
Returning -1 in a function returning unsigned int is somewhat strange.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:38 +02:00
Thomas Gleixner
60c69948e5 x86: ioapic: Clean up the direct access to irq_desc
Most of it is useless pseudo optimization.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:38 +02:00
Thomas Gleixner
e9f7ac664b ht: Convert to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:37 +02:00
Thomas Gleixner
5c2837fbaa dmar: Convert to new irq chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <dwmw2@infradead.org>
2010-10-12 16:53:37 +02:00
Thomas Gleixner
d0fbca8f93 x86: ioapic/hpet: Convert to new chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:37 +02:00
Thomas Gleixner
90297c5fe7 x86: ioapic: Convert mask to new irq_chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:37 +02:00
Thomas Gleixner
61a38ce3f5 x86: io_apic: Convert startup to new irq_chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:37 +02:00
Thomas Gleixner
dd5f15e5cf x86: Cleanup io_apic
Sanitize functions. Remove irq_desc pointer magic.
Preparatory patch for further cleanups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:36 +02:00
Thomas Gleixner
d4eba29770 x86: Cleanup access to irq_data
Fixup the open coded access to 
      irq_desc->[handler_data|chip_data|msi-desc]

Use the macros and inline functions for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:36 +02:00
Thomas Gleixner
4305df947c x86: i8259: Convert to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:36 +02:00
Thomas Gleixner
020dd984d7 x86: Cleanup visws interrupt handling
Remove the open coded access to irq_desc and convert to the new irq
chip functions. Change the mask function of piix4_virtual_irq_type so
we can use the generic irq handling function for the virtual interrupt
instead of open coding it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:35 +02:00
Thomas Gleixner
fe25c7fc2e x86: lguest: Convert to new irq chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2010-10-12 16:53:35 +02:00
Thomas Gleixner
a5ef2e7040 x86: Sanitize apb timer interrupt handling
Disable the interrupt in CPU_DEAD where it belongs. Remove the
open coded irq_desc manipulation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
2010-10-12 16:53:35 +02:00
Thomas Gleixner
a3c08e5d80 x86: Convert irq_chip access to new functions
Before moving the irq chips to the new functions, fixup direct callers.

The cpu offline irq fixup code needs to become generic and archs need
to honour the "force" flag as an indicator, but that's for later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:35 +02:00
Thomas Gleixner
011d578fda x86: Remove useless reinitialization of irq descriptors
The descriptors are already initialized in exactly this way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:34 +02:00
Thomas Gleixner
39431acb1a pci: Cleanup the irq_desc mess in msi
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:34 +02:00
Thomas Gleixner
1c9db52534 pci: Convert msi to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
2010-10-12 16:53:34 +02:00
Thomas Gleixner
7c5f13519a Merge branch 'x86/urgent' of into irq/sparseirq
Reason: Pull in the latest io_apic bugfixes

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-12 16:41:26 +02:00
Thomas Gleixner
5e62feabcc Merge branch 'x86/cleanups' into irq/sparseirq
Reason: Avoid conflicts with removal of boot_cpu_id

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-12 16:40:42 +02:00
Thomas Gleixner
8ffcfa4e2d Merge branch 'x86/x2apic' into irq/sparseirq
Reason: Avoid conflicts with the x2apic modifications

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-12 16:39:53 +02:00
Thomas Gleixner
b683de2b3c genirq: Query arch for number of early descriptors
sparse irq sets up NR_IRQS_LEGACY irq descriptors and archs then go
ahead and allocate more.

Use the unused return value of arch_probe_nr_irqs() to let the
architecture return the number of early allocations. Fix up all users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:39:08 +02:00
Michal Marek
239060b93b Merge branch 'kbuild/rc-fixes' into kbuild/kconfig
We need to revert the temporary hack in 71ebc01, hence the merge.
2010-10-12 15:09:06 +02:00
Zhang Rui
dab5fff14d acpi-cpufreq: fix a memleak when unloading driver
We didn't free per_cpu(acfreq_data, cpu)->freq_table
when acpi_freq driver is unloaded.

Resulting in the following messages in /sys/kernel/debug/kmemleak:

unreferenced object 0xf6450e80 (size 64):
  comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s)
  hex dump (first 32 bytes):
    00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00  ......$.......$.
    02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00  .....j.......5..
  backtrace:
    [<c123ba97>] kmemleak_alloc+0x27/0x50
    [<c109f96f>] __kmalloc+0xcf/0x110
    [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq]
    [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0
    [<c11920b7>] sysdev_driver_register+0x97/0x110
    [<c11cce56>] cpufreq_register_driver+0x86/0x140
    [<f9dad080>] 0xf9dad080
    [<c1001130>] do_one_initcall+0x30/0x160
    [<c10626e9>] sys_init_module+0x99/0x1e0
    [<c1002d97>] sysenter_do_call+0x12/0x26
    [<ffffffff>] 0xffffffff

https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21

Tested-by: Toralf Forster <toralf.foerster@gmx.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-12 00:58:28 -04:00
H. Peter Anvin
8e4029ee35 Merge branch 'x86/urgent' into core/memblock
Reason for merge:

Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree.

Resolved Conflicts:
	arch/x86/mm/srat_64.c

Originally-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 17:05:11 -07:00
Nikanth Karthikesan
50f2d7f682 x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
commit d9c2d5ac6a "x86, numa: Use near(er)
online node instead of roundrobin for NUMA" changed NUMA initialization on
Intel to choose the nearest online node or first node.  Fake NUMA would be
better of with round-robin initialization, instead of the all CPUS on
first node.  Change the choice of first node, back to round-robin.

For testing NUMA kernel behaviour without cpusets and NUMA aware
applications, it would be better to have cpus in different nodes, rather
than all in a single node.  With cpusets migration of tasks scenarios
cannot not be tested.

I guess having it round-robin shouldn't affect the use cases for all cpus
on the first node.

The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to
be the case, which was changed by commit d9c2d5ac6.  It changed from
roundrobin to nearer or first node.  And I couldn't find any reason for
this change in its changelog.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2010-10-11 16:16:56 -07:00
Jeremy Fitzhardinge
236260b90d memblock: Allow memblock_init to be called early
The Xen setup code needs to call memblock_x86_reserve_range() very early,
so allow it to initialize the memblock subsystem before doing so.  The
second memblock_init() is ignored.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <4CACFDAD.3090900@goop.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 15:59:01 -07:00
Yinghai Lu
73cf624d02 x86, numa: For each node, register the memory blocks actually used
Russ reported SGI UV is broken recently. He said:

| The SRAT table shows that memory range is spread over two nodes.
|
| SRAT: Node 0 PXM 0 100000000-800000000
| SRAT: Node 1 PXM 1 800000000-1000000000
| SRAT: Node 0 PXM 0 1000000000-1080000000
|
|Previously, the kernel early_node_map[] would show three entries
|with the proper node.
|
|[    0.000000]     0: 0x00100000 -> 0x00800000
|[    0.000000]     1: 0x00800000 -> 0x01000000
|[    0.000000]     0: 0x01000000 -> 0x01080000
|
|The problem is recent community kernel early_node_map[] shows
|only two entries with the node 0 entry overlapping the node 1
|entry.
|
|    0: 0x00100000 -> 0x01080000
|    1: 0x00800000 -> 0x01000000

After looking at the changelog, Found out that it has been broken for a while by
following commit

|commit 8716273cae
|Author: David Rientjes <rientjes@google.com>
|Date:   Fri Sep 25 15:20:04 2009 -0700
|
|    x86: Export srat physical topology

Before that commit, register_active_regions() is called for every SRAT memory
entry right away.

Use nodememblk_range[] instead of nodes[] in order to make sure we
capture the actual memory blocks registered with each node.  nodes[]
contains an extended range which spans all memory regions associated
with a node, but that does not mean that all the memory in between are
included.

Reported-by: Russ Anderson <rja@sgi.com>
Tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB27BDF.5000800@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 15:26:15 -07:00
Zachary Amsden
47008cd887 KVM: x86: Move TSC reset out of vmcb_init
The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC.  Instead, just set TSC to zero once at VCPU creation time.

Why the separate patch?  So git-bisect is your friend.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-11 12:36:07 +02:00
Zachary Amsden
58877679fd KVM: x86: Fix SVM VMCB reset
On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-11 12:36:07 +02:00
Borislav Petkov
6dcbfe4f0b x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
This fixes possible cases of not collecting valid error info in
the MCE error thresholding groups on F10h hardware.

The current code contains a subtle problem of checking only the
Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
thresholding group) in its first iteration and breaking out if
the bit is cleared.

But (!), this MSR contains an offset value, BlkPtr[31:24], which
points to the remaining MSRs in this thresholding group which
might contain valid information too. But if we bail out only
after we checked the valid bit in the first MSR and not the
block pointer too, we miss that other information.

The thing is, MC4_MISC0[BlkPtr] is not predicated on
MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
prior to iterating over the MCI_MISCj thresholding group,
irrespective of the MC4_MISC0[Valid] setting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-11 11:04:36 +02:00
Akinobu Mita
708ff2a009 bitops: make asm-generic/bitops/find.h more generic
asm-generic/bitops/find.h has the extern declarations of find_next_bit()
and find_next_zero_bit() and the macro definitions of find_first_bit()
and find_first_zero_bit(). It is only usable by the architectures which
enables CONFIG_GENERIC_FIND_NEXT_BIT and disables
CONFIG_GENERIC_FIND_FIRST_BIT.

x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and
CONFIG_GENERIC_FIND_FIRST_BIT. These architectures cannot include
asm-generic/bitops/find.h in their asm/bitops.h. So ifdefed extern
declarations of find_first_bit and find_first_zero_bit() are put in
linux/bitops.h.

This makes asm-generic/bitops/find.h usable by these architectures
and use it. Also this change is needed for the forthcoming duplicated
extern declarations cleanup.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Metcalf <cmetcalf@tilera.com>
2010-10-09 21:51:44 +02:00
Konrad Rzeszutek Wilk
6e96366933 x86, iommu: Update header comments with appropriate naming
The header comments diverged a bit from the implementation. Lets
re-sync them.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1286564028-2352-3-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-08 13:11:21 -07:00
Ingo Molnar
7cd2541cf2 Merge commit 'v2.6.36-rc7' into perf/core
Conflicts:
	arch/x86/kernel/module.c

Merge reason: Resolve the conflict, pick up fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:46:27 +02:00
Jin Dongming
b62be8ea9d x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
When the feature PTS is not supported by CPU, the sysfile
package_power_limit_count for package should not be
generated.

This patch is used for fixing missing { and }.

The patch is not complete as there are other error handling
problems in this function - but that can wait until the
merge window.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@initel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Brown Len <len.brown@intel.com>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:29:20 +02:00
Paul Fox
286e5b97eb x86, olpc: Don't retry EC commands forever
Avoids a potential infinite loop.

It was observed once, during an EC hacking/debugging
session - not in regular operation.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: dilinger@queued.net
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:06:09 +02:00
Feng Tang
4d033556f1 x86, earlyprintk: Add hsu early console for Intel Medfield platform
Intel Medfield platform has a high speed UART device, which
could act as a early console. To enable early printk of HSU
console, simply add "earlyprintk=hsu" in kernel command line.

Currently we put the code in the early_printk_mrst.c as it is
also for Intel MID platforms like the mrst early console

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: greg@kroah.com
LKML-Reference: <1284361736-23011-5-git-send-email-feng.tang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:01:47 +02:00
Feng Tang
c20b5c3318 x86, earlyprintk: Add earlyprintk for Intel Moorestown platform
Intel Moorestown platform has a spi-uart device(Maxim3110),
which connects to a Designware spi core controller. This patch
will add early console function based on it.

As it will be used long before Linux spi subsystem get
initialised, we simply directly manipulate the spi controller's
register to acheive the early console func. This is safe as it
will be disabled when devices subsytem get initialised.

To use it, user need enable CONFIG_X86_MRST_EARLY_PRINTK in
kenrel config and add "earlyprintk=mrst" in kernel command line.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: greg@kroah.com
LKML-Reference: <1284361736-23011-4-git-send-email-feng.tang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:01:47 +02:00
Feng Tang
5a47c7dae8 x86: Add two helper macros for fixed address mapping
Sometimes fixmap will be used to map an physical address which
is not PAGE align, so to use it we need first map it and then
add the address offset to the mapped fixed address. These 2 new
helpers are suggested by Ingo Molnar to make the process
simpler.

For a physicall address like "phys", a directly usable virtual
address can be get by
	virt = (void *)set_fixmap_offset(fixed_idx, phys);
or
	virt = (void *)set_fixmap_offset_nocache(fixed_idx, phys);
(depends on whether the physical address is cachable or not).

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: alan@linux.intel.com
Cc: greg@kroah.com
Cc: x86@kernel.org
LKML-Reference: <1284361736-23011-3-git-send-email-feng.tang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:01:46 +02:00
Andi Kleen
f672b49b07 x86: HWPOISON: Report correct address granuality for huge hwpoison faults
An earlier patch fixed the hwpoison fault handling to encode the
huge page size in the fault code of the page fault handler.

This is needed to report this information in SIGBUS to user space.

This is a straight forward patch to pass this information
through to the signal handling in the x86 specific fault.c

Cc: x86@kernel.org
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:46 +02:00
Ingo Molnar
153db80f8c Merge commit 'v2.6.36-rc7' into core/memblock
Merge reason: Update from -rc3 to -rc7.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 09:15:00 +02:00
Zhao Yakui
68f4d5a00a x86, setup: Use string copy operation to optimze copy in kernel compression
The kernel decompression code parses the ELF header and then copies
the segment to the corresponding destination.  Currently it uses slow
byte-copy code.  This patch makes it use the string copy operations
instead.

In the test the copy performance can be improved very significantly after using
the string copy operation mechanism.
        1. The copy time can be reduced from 150ms to 20ms on one Atom machine
	2. The copy time can be reduced about 80% on another machine
		The time is reduced from 7ms to 1.5ms when using 32-bit kernel.
		The time is reduced from 10ms to 2ms when using 64-bit kernel.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
LKML-Reference: <1286502453-7043-1-git-send-email-yakui.zhao@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-07 21:23:09 -07:00
H. Peter Anvin
55572b293b x86, mrst: A function in a header file needs to be marked "inline"
A function in a header file needs to be explicitly marked "inline", or
gcc will complain if it is not used.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: <stable@kernel.org> v2.6.36
LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>
2010-10-07 16:45:18 -07:00
Namhyung Kim
a416e9e1dd x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation
On 32-bit non-PAE system, cast to 'phys_addr_t' truncates value
before subtraction. Subtracting before cast produce same result
but remove following warnings from sparse:

 arch/x86/include/asm/pgtable_types.h:255:38: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable_types.h:270:38: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable.h:127:32: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable.h:132:32: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable.h:344:31: warning: cast truncates bits from constant value (100000000 becomes 0)

64-bit or PAE machines will not be affected by this change.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
LKML-Reference: <1285770588-14065-1-git-send-email-namhyung@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-07 16:36:17 -07:00
David Howells
df9ee29270 Fix IRQ flag handling naming
Fix the IRQ flag handling naming.  In linux/irqflags.h under one configuration,
it maps:

	local_irq_enable() -> raw_local_irq_enable()
	local_irq_disable() -> raw_local_irq_disable()
	local_irq_save() -> raw_local_irq_save()
	...

and under the other configuration, it maps:

	raw_local_irq_enable() -> local_irq_enable()
	raw_local_irq_disable() -> local_irq_disable()
	raw_local_irq_save() -> local_irq_save()
	...

This is quite confusing.  There should be one set of names expected of the
arch, and this should be wrapped to give another set of names that are expected
by users of this facility.

Change this to have the arch provide:

	flags = arch_local_save_flags()
	flags = arch_local_irq_save()
	arch_local_irq_restore(flags)
	arch_local_irq_disable()
	arch_local_irq_enable()
	arch_irqs_disabled_flags(flags)
	arch_irqs_disabled()
	arch_safe_halt()

Then linux/irqflags.h wraps these to provide:

	raw_local_save_flags(flags)
	raw_local_irq_save(flags)
	raw_local_irq_restore(flags)
	raw_local_irq_disable()
	raw_local_irq_enable()
	raw_irqs_disabled_flags(flags)
	raw_irqs_disabled()
	raw_safe_halt()

with type checking on the flags 'arguments', and then wraps those to provide:

	local_save_flags(flags)
	local_irq_save(flags)
	local_irq_restore(flags)
	local_irq_disable()
	local_irq_enable()
	irqs_disabled_flags(flags)
	irqs_disabled()
	safe_halt()

with tracing included if enabled.

The arch functions can now all be inline functions rather than some of them
having to be macros.

Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile]
Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze]
Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR]
Acked-by: Tony Luck <tony.luck@intel.com> [IA-64]
Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R]
Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC]
Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390]
Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score]
Acked-by: Matt Fleming <matt@console-pimps.org> [SH]
Acked-by: David S. Miller <davem@davemloft.net> [Sparc]
Acked-by: Chris Zankel <chris@zankel.net> [Xtensa]
Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha]
Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300]
Cc: starvik@axis.com [CRIS]
Cc: jesper.nilsson@axis.com [CRIS]
Cc: linux-cris-kernel@axis.com
2010-10-07 14:08:55 +01:00
Linus Torvalds
34984f54b7 Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: do not initialize PV timers on HVM if !xen_have_vector_callback
  xen: do not set xenstored_ready before xenbus_probe on hvm
2010-10-06 09:51:28 -07:00
Jeremy Fitzhardinge
161b0275e2 x86, mm: Add RESERVE_BRK_ARRAY() helper
This is useful when converting static arrays into boot-time brk
allocated objects.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4C805EEA.1080205@goop.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 22:16:54 -07:00
Yinghai Lu
16c36f743b x86, memblock: Remove __memblock_x86_find_in_range_size()
Fold it into memblock_x86_find_in_range(), and change bad_addr_size()
to check_reserve_memblock().

So whole memblock_x86_find_in_range_size() code is more readable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CAA4DEC.4000401@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 21:45:43 -07:00
Yinghai Lu
1d931264af x86-32, memblock: Make add_highpages honor early reserved ranges
Originally the only early reserved range that is overlapped with high
pages is "KVA RAM", but we already do remove that from the active ranges.

However, It turns out Xen could have that kind of overlapping to support memory
ballooning.x

So we need to make add_highpage_with_active_regions() to subtract
memblock reserved just like low ram; this is the proper design anyway.

In this patch, refactering get_freel_all_memory_range() to make it can
be used by add_highpage_with_active_regions().  Also we don't need to
remove "KVA RAM" from active ranges.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABB183.1040607@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 21:44:35 -07:00
Yinghai Lu
9f4c13964b x86, memblock: Fix crashkernel allocation
Cai Qian found crashkernel is broken with the x86 memblock changes.

1. crashkernel=128M@32M always reported that range is used, even if
   the first kernel is small and does not usethat range

2. we always got following report when using "kexec -p"
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that generic memblock_find_in_range() will try to
allocate from the top of the range, whereas the kexec code was written
assuming that allocation was always near the bottom and that it could
blindly extend memory upward.  Unfortunately the kexec code doesn't
have a system for requesting the range that it really needs, so this
is subject to probabilistic failures.

This patch hacks around the problem by limiting the target range
heuristically to below the traditional bzImage max range.  This number
is arbitrary and not always correct, and a much better result would be
obtained by having kexec communicate this number based on the kernel
header information and any appropriate command line options.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABAF2A.5090501@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 21:43:14 -07:00
Linus Torvalds
39c12be86a Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf trace scripting: Fix extern struct definitions
  perf ui hist browser: Fix segfault on 'a' for annotate
  perf tools: Fix build breakage
  perf, x86: Handle in flight NMIs on P4 platform
  oprofile, ARM: Release resources on failure
  oprofile: Add Support for Intel CPU Family 6 / Model 29
2010-10-05 11:57:37 -07:00
Linus Torvalds
5336377d62 modules: Fix module_bug_list list corruption race
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.

However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling.  That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.

Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.

So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.

Future fixups:
 - move the module list handling code into kernel/module.c where it
   belongs.
 - get rid of 'module_bug_list' and just use the regular list of modules
   (called 'modules' - imagine that) that we already create and maintain
   for other reasons.

Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-05 11:29:27 -07:00
Stefano Stabellini
31e7e931cd xen: do not initialize PV timers on HVM if !xen_have_vector_callback
if !xen_have_vector_callback do not initialize PV timer unconditionally
because we still don't know how many cpus are available and if there is
more than one we won't be able to receive the timer interrupts on
cpu > 0.

This patch fixes an hang at boot when Xen does not support vector
callbacks and the guest has multiple vcpus.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2010-10-05 13:39:23 +01:00
Andi Kleen
c62f981f93 perf, gcc-4.6: Fix set but unused variable
Just dead code I believe.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: andi@firstfloor.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05 09:48:07 +02:00
Ingo Molnar
00e8976200 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/util/ui/browsers/hists.c

Merge reason: fix the conflict and merge in changes for dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05 09:47:14 +02:00
Borislav Petkov
366d4a43b1 x86, cpu: Fix X86_FEATURE_NOPL
ba0593bf55 cleared the aforementioned
cpuid bit only on 32-bit due to various problems with Virtual PC. This
somehow got lost during the 32- + 64-bit merge so restore the feature
bit on 64-bit. For that, set it explicitly for non-constant arguments of
cpu_has(). Update comment for future reference.

Signed-off-by: Borislav Petkov <bp@alien8.de>
LKML-Reference: <20101004073127.GA20305@liondog.tnic>
Cc: Ryan O'Neill <ryan@innosecc.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-04 11:22:24 -07:00
Linus Torvalds
5a4bbd01c8 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc
  [CPUFREQ] acpi-cpufreq: add missing __percpu markup
2010-10-04 11:14:21 -07:00
Thomas Gleixner
3bb9808e99 x86: Use genirq Kconfig
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100927121843.314600915@linutronix.de>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-04 11:01:15 +02:00
Andreas Herrmann
5c80cc78de x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
AMD CPU family 0x15 still supports GART for compatibility reasons.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930124316.GG20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann
d4fbe4f035 x86, amd: Use compute unit information to determine thread siblings
This information is vital for different load balancing policies.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930124156.GF20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann
6057b4d331 x86, amd: Extract compute unit information for AMD CPUs
Get compute unit information from CPUID Fn8000_001E_EBX.
(See AMD CPUID Specification - publication # 25481, revision 2.34,
September 2010.)

Note that each core on a compute unit still has a core_id of its own.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123857.GE20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann
23588c38a8 x86, amd: Add support for CPUID topology extension of AMD CPUs
Node information (ID, number of internal nodes) is provided via
CPUID Fn8000_001e_ECX.

See AMD CPUID Specification (Publication # 25481, Revision 2.34,
September 2010).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123628.GD20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann
420b13b60a x86, nmi: Support NMI watchdog on newer AMD CPU families
CPU families 0x12, 0x14 and 0x15 support this functionality.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123357.GC20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann
3fdbf004c1 x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
Instead of adapting the CPU family check in amd_special_default_mtrr()
for each new CPU family assume that all new AMD CPUs support the
necessary bits in SYS_CFG MSR.

Tom2Enabled is architectural (defined in APM Vol.2).
Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT.
In pre K8-NPT BKDG this bit is reserved (read as zero).

W/o this adaption Linux would unnecessarily complain about bad MTRR
settings on every new AMD CPU family, e.g.

[    0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM.

Cc: stable@kernel.org # .32.x, .35.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123235.GB20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:31 -07:00
H. Peter Anvin
86ffb08519 Merge remote branch 'origin/x86/cpu' into x86/amd-nb 2010-10-01 16:18:11 -07:00
Linus Torvalds
f4a3330d76 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hpet: Fix bogus error check in hpet_assign_irq()
  x86, irq: Plug memory leak in sparse irq
  x86, cpu: After uncapping CPUID, re-run CPU feature detection
2010-10-01 15:02:41 -07:00
Robert Richter
5140434d5f oprofile, x86: Simplify init/exit functions
Now, that we only call the exit function if init succeeds with commit:

 979048e oprofile: don't call arch exit code from init code on failure

we can simplify the x86 init/exit functions too. Variable using_nmi
becomes obsolete.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-01 17:05:47 +02:00
Jiri Olsa
f6dedecc37 oprofile, x86: Adding backtrace dump for 32bit process in compat mode
This patch implements the oprofile backtrace  generation for 32 bit
applications running in the 64bit environment (compat mode).

With this change it's possible to get backtrace for 32bits applications
under the 64bits environment using oprofile's callgraph options.

opcontrol --setup -c ...
opreport -l -cg ...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-01 16:07:18 +02:00
Jiri Olsa
40c6b3cb64 oprofile, x86: Using struct stack_frame for 64bit processes dump
Removing unnecessary struct frame_head and replacing it with
struct stack_frame.

The struct stack_frame is already defined and used in other places
in kernel, so there's no reason to define new structure.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-01 16:07:09 +02:00
Thomas Gleixner
0219896228 x86, hpet: Fix bogus error check in hpet_assign_irq()
create_irq() returns -1 if the interrupt allocation failed, but the
code checks for irq == 0.

Use create_irq_nr() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-30 15:57:35 -07:00
Thomas Gleixner
1cf180c94e x86, irq: Plug memory leak in sparse irq
free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this
triggers a use after free caused by the fact that copying struct
irq_cfg is done with memcpy, which copies the pointer not the cpumask.

Fix both places.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-30 15:57:35 -07:00
Pekka Enberg
3682930623 [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc
If acpi_evaluate_object() function call doesn't fail, we must kfree()
output.buffer before returning from pcc_cpufreq_do_osc().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-09-30 16:14:23 -04:00
Namhyung Kim
86cf147494 [CPUFREQ] acpi-cpufreq: add missing __percpu markup
acpi_perf_data is a percpu pointer but was missing __percpu markup.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-09-30 16:14:22 -04:00
Cyrill Gorcunov
03e22198d2 perf, x86: Handle in flight NMIs on P4 platform
Stephane reported we've forgot to guard the P4 platform
against spurious in-flight performance IRQs. Fix it.

This fixes potential spurious 'dazed and confused' NMI
messages.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Robert Richter <robert.richter@amd.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-30 09:17:59 +02:00
Dan Carpenter
b365a85c68 x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()
The original code didn't check that the value returned from
snprintf() was less than the size of the buffer.  Although it
didn't cause a runtime bug in this case, it makes the static
checkers complain.

Andrew Morton suggested a dynamically sized buffer would be
cleaner.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
LKML-Reference: <20100929083118.GA6376@bicker>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-30 09:11:27 +02:00
Namhyung Kim
bd126b23a2 ACPI: add missing __percpu markup in arch/x86/kernel/acpi/cstate.c
cpu_cstate_entry is a percpu pointer
but was missing __percpu markup.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-28 21:38:20 -04:00
H. Peter Anvin
d900329e20 x86, cpu: After uncapping CPUID, re-run CPU feature detection
After uncapping the CPUID level, we need to also re-run the CPU
feature detection code.

This resolves kernel bugzilla 16322.

Reported-by: boris64 <bugzilla.kernel.org@boris64.net>
Cc: <stable@kernel.org> v2.6.29..2.6.35
LKML-Reference: <tip-@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-28 16:33:14 -07:00
Linus Torvalds
050026feae Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile
2010-09-27 21:19:27 -07:00
Linus Torvalds
6a6aa2b7e4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Fix rounding-bug in __unmap_single
  x86/amd-iommu: Work around S3 BIOS bug
  x86/amd-iommu: Set iommu configuration flags in enable-loop
  x86, setup: Fix earlyprintk=serial,0x3f8,115200
  x86, setup: Fix earlyprintk=serial,ttyS0,115200
2010-09-27 12:22:21 -07:00
Linus Torvalds
f0619343ce Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Catch spurious interrupts after disabling counters
  tracing/x86: Don't use mcount in kvmclock.c
  tracing/x86: Don't use mcount in pvclock.c
2010-09-27 12:21:48 -07:00
Ingo Molnar
c7a27aa465 Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2010-09-27 09:48:44 +02:00
Alexander Chumachenko
c9e2fbd909 x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile
While debugging bit_spin_lock() hang, it was tracked down to gcc-4.4
misoptimization of non-inlined constant_test_bit() due to non-volatile
addr when 'const volatile unsigned long *addr' cast to 'unsigned long *'
with subsequent unconditional jump to pause (and not to the test) leading
to hang.

Compiling with gcc-4.3 or disabling CONFIG_OPTIMIZE_INLINING yields inlined
constant_test_bit() and correct jump, thus working around the kernel bug.

Other arches than asm-x86 may implement this slightly differently;
2.6.29 mitigates the misoptimization by changing the function prototype
(commit c4295fbb60) but probably fixing the issue
itself is better.

Signed-off-by: Alexander Chumachenko <ledest@gmail.com>
Signed-off-by: Michael Shigorin <mike@osdn.org.ua>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-09-26 22:43:07 -07:00
Ma Ling
3b4b682bec x86, mem: Optimize memmove for small size and unaligned cases
movs instruction will combine data to accelerate moving data,
however we need to concern two cases about it.

1. movs instruction need long lantency to startup,
   so here we use general mov instruction to copy data.
2. movs instruction is not good for unaligned case,
   even if src offset is 0x10, dest offset is 0x0,
   we avoid and handle the case by general mov instruction.

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <1284664360-6138-1-git-send-email-ling.ma@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-09-24 18:57:11 -07:00
Jan Beulich
a46590533a x86/hwmon: fix initialization of coretemp
Using cpuid_eax() to determine feature availability on other than
the current CPU is invalid. And feature availability should also be
checked in the hotplug code path.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2010-09-24 11:44:19 -07:00
Robert Richter
63e6be6d98 perf, x86: Catch spurious interrupts after disabling counters
Some cpus still deliver spurious interrupts after disabling a
counter. This caused 'undelivered NMI' messages. This patch
fixes this. Introduced by:

  4177c42: perf, x86: Try to handle unknown nmis with an enabled PMU

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: gorcunov@gmail.com <gorcunov@gmail.com>
Cc: fweisbec@gmail.com <fweisbec@gmail.com>
Cc: ying.huang@intel.com <ying.huang@intel.com>
Cc: ming.m.lin@intel.com <ming.m.lin@intel.com>
Cc: yinghai@kernel.org <yinghai@kernel.org>
Cc: andi@firstfloor.org <andi@firstfloor.org>
Cc: eranian@google.com <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100915162034.GO13563@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-24 12:21:41 +02:00
Ingo Molnar
7329cf0201 Merge branch 'amd-iommu/2.6.36' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-09-24 11:19:53 +02:00
Ingo Molnar
a5a2bad55d Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-09-24 09:12:05 +02:00
Daniel Drake
3e3c486012 x86, olpc: Rework BIOS signature check
The XO-1.5 laptop is not currently detected as an OLPC machine because
it fails this XO-1-centric check.

Now that we have OLPC OFW support in the kernel, a more sensible
check is to see if we found OFW during boot and check the architecture
property.

Also remove a now-meaningless codepath, as we're always going to have
OFW support with OLPC.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20100923162846.D8D409D401B@zog.reactivated.net>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-23 11:15:00 -07:00
Daniel Drake
76fb657017 x86, olpc: Only enable PCI configuration type override on XO-1
This configuration type override is for XO-1 only and must not happen
on XO-1.5.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20100923162805.0F6549D401B@zog.reactivated.net>
Cc: Andres Solomon <dilinger@queued.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-23 11:14:18 -07:00
Bart Oldeman
6554287b1d x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.
Impact: fix kernel bug such as:
BUG: scheduling while atomic: dosemu.bin/19680/0x00000004
See also Ubuntu bug 455067 at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/455067

Commits 4915a35e35
("Use preempt_conditional_sti/cli in do_int3, like on x86_64.")
and 3d2a71a596
("x86, traps: converge do_debug handlers")
started disabling preemption in int1 and int3 handlers on i386.
The problem with vm86 is that the call to handle_vm86_trap() may jump
straight to entry_32.S and never returns so preempt is never enabled
again, and there is an imbalance in the preempt count.

Commit be716615fe ("x86, vm86:
fix preemption bug"), which was later (accidentally?) reverted by commit
08d68323d1 ("hw-breakpoints: modifying
generic debug exception to use thread-specific debug registers")
fixed the problem for debug exceptions but not for breakpoints.

There are three solutions to this problem.

1. Reenable preemption before calling handle_vm86_trap(). This
was the approach that was later reverted.

2. Do not disable preemption for i386 in breakpoint and debug handlers.
This was the situation before October 2008. As far as I understand
preemption only needs to be disabled on x86_64 because a seperate stack is
used, but it's nice to have things work the same way on
i386 and x86_64.

3. Let handle_vm86_trap() return instead of jumping to assembly code.
By setting a flag in _TIF_WORK_MASK, either TIF_IRET or TIF_NOTIFY_RESUME,
the code in entry_32.S is instructed to return to 32 bit mode from
V86 mode. The logic in entry_32.S was already present to handle signals.
(I chose TIF_IRET because it's slightly more efficient in
do_notify_resume() in signal.c, but in fact TIF_IRET can probably be
replaced by TIF_NOTIFY_RESUME everywhere.)

I'm submitting approach 3, because I believe it is the most elegant
and prevents future confusion. Still, an obvious
preempt_conditional_cli(regs); is necessary in traps.c to correct the
bug.

[ hpa: This is technically a regression, but because:
  1. the regression is so old,
  2. the patch seems relatively high risk, justifying more testing, and
  3. we're late in the 2.6.36-rc cycle,

  I'm queuing it up for the 2.6.37 merge window.  It might, however,
  justify as a -stable backport at a latter time, hence Cc: stable. ]

Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net>
LKML-Reference: <alpine.DEB.2.00.1009231312330.4732@localhost.localdomain>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-23 11:07:49 -07:00
Joerg Roedel
04e0463e08 x86/amd-iommu: Fix rounding-bug in __unmap_single
In the __unmap_single function the dma_addr is rounded down
to a page boundary before the dma pages are unmapped. The
address is later also used to flush the TLB entries for that
mapping. But without the offset into the dma page the amount
of pages to flush might be miscalculated in the TLB flushing
path. This patch fixes this bug by using the original
address to flush the TLB.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-09-23 16:26:20 +02:00
Joerg Roedel
4c894f47bb x86/amd-iommu: Work around S3 BIOS bug
This patch adds a workaround for an IOMMU BIOS problem to
the AMD IOMMU driver. The result of the bug is that the
IOMMU does not execute commands anymore when the system
comes out of the S3 state resulting in system failure. The
bug in the BIOS is that is does not restore certain hardware
specific registers correctly. This workaround reads out the
contents of these registers at boot time and restores them
on resume from S3. The workaround is limited to the specific
IOMMU chipset where this problem occurs.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-09-23 16:26:03 +02:00
Joerg Roedel
e9bf519711 x86/amd-iommu: Set iommu configuration flags in enable-loop
This patch moves the setting of the configuration and
feature flags out out the acpi table parsing path and moves
it into the iommu-enable path. This is needed to reliably
fix resume-from-s3.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-09-23 16:24:50 +02:00
Steven Rostedt
46eb3b64dd jump label/x86/sparc64: Remove !CC_OPTIMIZE_FOR_SIZE config conditions
The !CC_OPTIMIZE_FOR_SIZE was added to enable the jump label functionality
because Jason noticed that the gcc option would not optimize the labels
and may even hurt performance.

But this is a gcc problem not a kernel one. Removing this condition should
add motivation to the gcc developers to actually fix it.

Cc: Jason Baron <jbaron@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 23:10:23 -04:00
Steven Rostedt
258af47479 tracing/x86: Don't use mcount in kvmclock.c
The guest can use the paravirt clock in kvmclock.c which is used
by sched_clock(), which in turn is used by the tracing mechanism
for timestamps, which leads to infinite recursion.

Disable mcount/tracing for kvmclock.o.

Cc: stable@kernel.org
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 23:01:19 -04:00
Jeremy Fitzhardinge
9ecd4e1689 tracing/x86: Don't use mcount in pvclock.c
When using a paravirt clock, pvclock.c can be used by sched_clock(),
which in turn is used by the tracing mechanism for timestamps,
which leads to infinite recursion.

Disable mcount/tracing for pvclock.o.

Cc: stable@kernel.org
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4C9A9A3F.4040201@goop.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 23:00:50 -04:00
Jan Beulich
234bb549ee x86, cleanups: Use clear_page/copy_page rather than memset/memcpy
When operating on whole pages, use clear_page() and copy_page() in
favor of memset() and memcpy(); after all that's what they are
intended for.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-22 15:36:49 -07:00
Steven Rostedt
95fccd465e jump label: Remove duplicate structure for x86
The structure in the x86 jump label code uses the typedef jump_label_t,
which is defined by the #ifdef arch type. The structure does not need
to be duplicated there.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 17:37:43 -04:00
Jason Baron
d9f5ab7b1c jump label: x86 support
add x86 support for jump label. I'm keeping this patch separate so its clear
to arch maintainers what was required for x86 support this new feature.
Hopefully, it wouldn't be too painful for other archs.

Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <f838f49f40fbea0254036194be66dc48b598dcea.1284733808.git.jbaron@redhat.com>

[ cleaned up some formatting ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 16:33:03 -04:00
Jason Baron
4c3ef6d793 jump label: Add jump_label_text_reserved() to reserve jump points
Add a jump_label_text_reserved(void *start, void *end), so that other
pieces of code that want to modify kernel text, can first verify that
jump label has not reserved the instruction.

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <06236663a3a7b1c1f13576bb9eccb6d9c17b7bfe.1284733808.git.jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 16:30:46 -04:00
Jason Baron
bf5438fca2 jump label: Base patch for jump label
base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
assembly gcc mechanism, we can now branch to labels from an 'asm goto'
statment. This allows us to create a 'no-op' fastpath, which can subsequently
be patched with a jump to the slowpath code. This is useful for code which
might be rarely used, but which we'd like to be able to call, if needed.
Tracepoints are the current usecase that these are being implemented for.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com>

[ cleaned up some formating ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 16:29:41 -04:00
Ingo Molnar
90edf27fb8 Merge branch 'linus' into perf/core
Conflicts:
	kernel/hw_breakpoint.c

Merge reason: resolve the conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-22 18:45:01 +02:00
Linus Torvalds
87ac6fa26e Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hw breakpoints: Fix pid namespace bug
  x86: Fix instruction breakpoint encoding
  oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540)
  kprobes: Fix Kconfig dependency
2010-09-21 13:21:42 -07:00
Yinghai Lu
74b3c444a9 x86, setup: Fix earlyprintk=serial,0x3f8,115200
earlyprintk can take and I/O port, so we need to handle this case in
the setup code too, otherwise 0x3f8 will be treated as a baud rate.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C7B05A6.4010801@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-21 10:18:33 -07:00
Yinghai Lu
83d9f65bda x86, setup: Fix earlyprintk=serial,ttyS0,115200
Torsten reported that there is garbage output,
after commit 8fee13a48e (x86,
setup: enable early console output from the decompressor)

It turns out we missed the offset for that case.

Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C7B0578.8090807@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-21 10:18:14 -07:00
Ingo Molnar
7ed569206e Merge commit 'v2.6.36-rc5' into perf/core
Merge reason: Pick up the latest fixes in -rc5.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-21 13:55:11 +02:00
Jiri Olsa
bb7ab785ad oprofile: Add Support for Intel CPU Family 6 / Model 29
This patch adds CPU type detection for dunnington processor (Family 6
/ Model 29) to be identified as core 2 family cpu type (wikipedia
source).

I tested oprofile on Intel(R) Xeon(R) CPU E7440 reporting itself as
model 29, and it runs without an issue.

Spec:

 http://www.intel.com/Assets/en_US/PDF/specupdate/320336.pdf

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-09-21 12:22:48 +02:00
Rusty Russell
9b6efcd2e2 lguest: update comments to reflect LHCALL_LOAD_GDT_ENTRY.
We used to have a hypercall which reloaded the entire GDT, then we
switched to one which loaded a single entry (to match the IDT code).

Some comments were not updated, so fix them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported by: Eviatar Khen <eviatarkhen@gmail.com>
2010-09-21 10:54:02 +09:30
H. Peter Anvin
c2b9ff24a0 x86, cpu: Re-run get_cpu_cap() after adjusting the CPUID level
At least on Intel, adjusting the max CPUID level can expose new CPUID
features, so we need to re-run get_cpu_cap() after changing the CPUID
level.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-20 18:03:28 -07:00
H. Peter Anvin
ce5f68246b x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line
When we're using MWAIT for play_dead, explicitly CLFLUSH the cache
line before executing MONITOR.  This is a potential workaround for the
Xeon 7400 erratum AAI65 after having a spurious wakeup and returning
around the loop.  "Potential" here because it is not certain that that
erratum could actually trigger; however, the CLFLUSH should be
harmless.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.kernel.org>
Cc: Len Brown <lenb@kernel.org>
2010-09-20 15:51:59 -07:00
Jason Baron
fa6f2cc770 jump label: Make text_poke_early() globally visible
Make text_poke_early available outside of alternative.c. The jump label
patchset wants to make use of it in order to set up the optimal no-op
sequences at run-time.

Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <04cfddf2ba77bcabfc3e524f1849d871d6a1cf9d.1284733808.git.jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-20 18:19:51 -04:00
Jason Baron
f49aa44856 jump label: Make dynamic no-op selection available outside of ftrace
Move Steve's code for finding the best 5-byte no-op from ftrace.c to
alternative.c. The idea is that other consumers (in this case jump label)
want to make use of that code.

Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <96259ae74172dcac99c0020c249743c523a92e18.1284733808.git.jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-20 18:19:39 -04:00
Andreas Herrmann
23ac4ae827 x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.

Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-20 14:22:58 -07:00
Arnaud Lacombe
838a2e55e6 kbuild: migrate all arch to the kconfig mainmenu upgrade
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Michal Marek <mmarek@suse.cz>
2010-09-19 22:54:11 -04:00
Thomas Gleixner
995bd3bb5c x86: Hpet: Avoid the comparator readback penalty
Due to the overly intelligent design of HPETs, we need to workaround
the problem that the compare value which we write is already behind
the actual counter value at the point where the value hits the real
compare register. This happens for two reasons:

1) We read out the counter, add the delta and write the result to the
   compare register. When a NMI or SMI hits between the read out and
   the write then the counter can be ahead of the event already

2) The write to the compare register is delayed by up to two HPET
   cycles in certain chipsets.

We worked around this by reading back the compare register to make
sure that the written value has hit the hardware. For certain ICH9+
chipsets this can require two readouts, as the first one can return
the previous compare register value. That's bad performance wise for
the normal case where the event is far enough in the future.

As we already know that the write can be delayed by up to two cycles
we can avoid the read back of the compare register completely if we
make the decision whether the delta has elapsed already or not based
on the following calculation:

  cmp = event - actual_count;

If cmp is less than 8 HPET clock cycles, then we decide that the event
has happened already and return -ETIME. That covers the above #1 and
#2 problems which would cause a wait for HPET wraparound (~306
seconds).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nix <nix@esperi.org.uk>
Tested-by: Artur Skawina <art.08.09@gmail.com>
Cc: Damien Wyart <damien.wyart@free.fr>
Tested-by: John Drescher <drescherjm@gmail.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6>
2010-09-18 12:09:13 +02:00
H. Peter Anvin
a68e5c94f7 x86, hotplug: Move WBINVD back outside the play_dead loop
On processors with hyperthreading, when only one thread is offlined
the other thread can cause a spurious wakeup on the idled thread.  We
do not want to re-WBINVD when that happens.

Ideally, we should simply skip WBINVD unless we're the last thread on
a particular core to shut down, but there might be similar issues
elsewhere in the system.

Thus, revert to previous behavior of only WBINVD outside the loop.
Partly as a result, remove the mb()'s around it: they are not
necessary since wbinvd() is a serializing instruction, but they were
intended to make sure the compiler didn't do any funny loop
optimizations.

Reported-by: Asit Mallick <asit.k.mallick@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
LKML-Reference: <tip-ea53069231f9317062910d6e772cca4ce93de8c8@git.kernel.org>
2010-09-17 17:10:23 -07:00
H. Peter Anvin
ea53069231 x86, hotplug: Use mwait to offline a processor, fix the legacy case
The code in native_play_dead() has a number of problems:

1. We should use MWAIT when available, to put ourselves into a deeper
   sleep state.
2. We use the existence of CLFLUSH to determine if WBINVD is safe, but
   that is totally bogus -- WBINVD is 486+, whereas CLFLUSH is a much
   later addition.
3. We should do WBINVD inside the loop, just in case of something like
   setting an A bit on page tables.  Pointed out by Arjan van de Ven.

This code is based in part of a previous patch by Venki Pallipadi, but
unlike that patch this one keeps all the detection code local instead
of pre-caching a bunch of information.  We're shutting down the CPU;
there is absolutely no hurry.

This patch moves all the code to C and deletes the global
wbinvd_halt() which is broken anyway.

Originally-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
LKML-Reference: <20090522232230.162239000@intel.com>
2010-09-17 15:39:11 -07:00
H. Peter Anvin
bc83cccc76 x86, mwait: Move mwait constants to a common header file
We have MWAIT constants spread across three different .c files, for no
good reason.  Move them all into a common header file.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <tip-*@git.kernel.org>
2010-09-17 15:36:40 -07:00
Andreas Herrmann
900f9ac9f1 x86, k8-gart: Decouple handling of garts and northbridges
So far we only provide num_k8_northbridges. This is required in
different areas (e.g. L3 cache index disable, GART). But not all AMD
CPUs provide a GART. Thus it is useful to split off the GART handling
from the generic caching of AMD northbridge misc devices.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160254.GC4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-17 13:26:21 -07:00
Andreas Herrmann
3518dd14ca x86, cacheinfo: Fix dependency of AMD L3 CID
L3 cache index disable code uses PCI accesses to AMD northbridge functions.
Currently the code is #ifdef CONFIG_CPU_SUP_AMD.
But it should be #if (defined(CONFIG_CPU_SUP_AMD) && defined(CONFIG_PCI))
which in the end is a dependency to K8_NB.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160744.GF4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-17 13:25:56 -07:00
Cliff Wickman
3ee48b6af4 mm, x86: Saving vmcore with non-lazy freeing of vmas
During the reading of /proc/vmcore the kernel is doing
ioremap()/iounmap() repeatedly. And the buildup of un-flushed
vm_area_struct's is causing a great deal of overhead. (rb_next()
is chewing up most of that time).

This solution is to provide function set_iounmap_nonlazy(). It
causes a subsequent call to iounmap() to immediately purge the
vma area (with try_purge_vmap_area_lazy()).

With this patch we have seen the time for writing a 250MB
compressed dump drop from 71 seconds to 44 seconds.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kexec@lists.infradead.org
Cc: <stable@kernel.org>
LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-17 09:11:56 +02:00
Linus Torvalds
a5b617368c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hpet: Work around hardware stupidity
  x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
  x86, cpufeature: Suppress compiler warning with gcc 3.x
  x86, UV: Fix initialization of max_pnode
2010-09-16 19:38:08 -07:00
Frederic Weisbecker
89e45aac42 x86: Fix instruction breakpoint encoding
Lengths and types of breakpoints are encoded in a half byte
into CPU registers. However when we extract these values
and store them, we add a high half byte part to them: 0x40 to the
length and 0x80 to the type.
When that gets reloaded to the CPU registers, the high part
is masked.

While making the instruction breakpoints available for perf,
I zapped that high part on instruction breakpoint encoding
and that broke the arch -> generic translation used by ptrace
instruction breakpoints. Writing dr7 to set an inst breakpoint
was then failing.

There is no apparent reason for these high parts so we could get
rid of them altogether. That's an invasive change though so let's
do that later and for now fix the problem by restoring that inst
breakpoint high part encoding in this sole patch.

Reported-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
2010-09-17 03:24:13 +02:00
Patrick Simmons
c33f543d32 oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540)
This patch adds CPU type detection for the Intel Celeron 540, which is
part of the Core 2 family according to Wikipedia; the family and ID pair
is absent from the Volume 3B table referenced in the source code
comments.  I have tested this patch on an Intel Celeron 540 machine
reporting itself as Family 6 Model 22, and OProfile runs on the machine
without issue.

Spec:

 http://download.intel.com/design/mobile/SPECUPDT/317667.pdf

Signed-off-by: Patrick Simmons <linuxrocks123@netscape.net>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-09-16 12:35:56 +02:00
Suresh Siddha
fa47f7e528 x86, x2apic: Simplify apic init in SMP and UP builds
Move enable_IR_x2apic() inside the default_setup_apic_routing(),
and for SMP platforms, move the default_setup_apic_routing() after
smp_sanity_check(). This cleans up the code that tries to avoid multiple
calls to default_setup_apic_routing() when smp_sanity_check() fails (which
goes through the APIC_init_uniprocessor() path).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100827181049.173087246@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-15 17:37:10 -07:00
Suresh Siddha
62a92f4c69 x86, intr-remap: Remove IRTE setup duplicate code
Remove IRTE setup duplicate code with prepare_irte().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100827181049.095067319@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-15 17:36:45 -07:00
Suresh Siddha
75e3cfbed6 x86, intr-remap: Set redirection hint in the IRTE
Currently the redirection hint in the interrupt-remapping table entry
is set to 0, which means the remapped interrupt is directed to the
processors listed in the destination. So in logical flat mode
in the presence of intr-remapping, this results in a single
interrupt multi-casted to multiple cpu's as specified by the destination
bit mask. But what we really want is to send that interrupt to one of the cpus
based on the lowest priority delivery mode.

Set the redirection hint in the IRTE to '1' to indicate that we want
the remapped interrupt to be directed to only one of the processors
listed in the destination.

This fixes the issue of same interrupt getting delivered to multiple cpu's
in the logical flat mode in the presence of interrupt-remapping. While
there is no functional issue observed with this behavior, this will
impact performance of such configurations (<=8 cpu's using logical flat
mode in the presence of interrupt-remapping)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100827181049.013051492@sbsiddha-MOBL3.sc.intel.com>
Cc: Weidong Han <weidong.han@intel.com>
Cc: <stable@kernel.org> # [v2.6.32+]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-15 17:36:37 -07:00
Udo van den Heuvel
892df7f81c x86: HPET force enable for CX700 / VIA Epia LT
Allow using HPET with the hpet=force command line option on VIA EPIA
CX700 systems.

Signed-off-by: Udo van den Heuvel <udovdh@xs4all.nl>
Cc: Robert Hancock <hancockrwd@gmail.com>
LKML-Reference:  <4C8F04DC.5060303@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-09-15 16:27:04 +02:00
Namhyung Kim
6abded71d7 kprobes: Remove __dummy_buf
Remove __dummy_buf which is needed for kallsyms_lookup only.
use kallsysm_lookup_size_offset instead.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-5-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-15 10:44:02 +02:00
Namhyung Kim
6376b22975 kprobes: Make functions static
Make following (internal) functions static to make sparse
happier :-)

 * get_optimized_kprobe: only called from static functions
 * kretprobe_table_unlock: _lock function is static
 * kprobes_optinsn_template_holder: never called but holding asm code

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-4-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-15 10:44:01 +02:00
Ingo Molnar
3aabae7d9d Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-09-15 10:27:31 +02:00
Roland McGrath
eefdca043e x86-64, compat: Retruncate rax after ia32 syscall entry tracing
In commit d4d6715, we reopened an old hole for a 64-bit ptracer touching a
32-bit tracee in system call entry.  A %rax value set via ptrace at the
entry tracing stop gets used whole as a 32-bit syscall number, while we
only check the low 32 bits for validity.

Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
in addition to testing the full 64 bits as has already been added.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-14 16:08:47 -07:00
H. Peter Anvin
36d001c70d x86-64, compat: Test %rax for the syscall number, not %eax
On 64 bits, we always, by necessity, jump through the system call
table via %rax.  For 32-bit system calls, in theory the system call
number is stored in %eax, and the code was testing %eax for a valid
system call number.  At one point we loaded the stored value back from
the stack to enforce zero-extension, but that was removed in checkin
d4d6715016.  An actual 32-bit process
will not be able to introduce a non-zero-extended number, but it can
happen via ptrace.

Instead of re-introducing the zero-extension, test what we are
actually going to use, i.e. %rax.  This only adds a handful of REX
prefixes to the code.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2010-09-14 16:08:46 -07:00
H. Peter Anvin
c41d68a513 compat: Make compat_alloc_user_space() incorporate the access_ok()
compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@kernel.org>
2010-09-14 16:08:45 -07:00
Thomas Gleixner
54ff7e595d x86: hpet: Work around hardware stupidity
This more or less reverts commits 08be979 (x86: Force HPET
readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
read back to affected ATI chipsets) to the status of commit 8da854c
(x86, hpet: Erratum workaround for read after write of HPET
comparator).

The delta to commit 8da854c is mostly comments and the change from
WARN_ONCE to printk_once as we know the call path of this function
already.

This needs really in depth explanation:

First of all the HPET design is a complete failure. Having a counter
compare register which generates an interrupt on matching values
forces the software to do at least one superfluous readback of the
counter register.

While it is nice in theory to program "absolute" time events it is
practically useless because the timer runs at some absurd frequency
which can never be matched to real world units. So we are forced to
calculate a relative delta and this forces a readout of the actual
counter value, adding the delta and programming the compare
register. When the delta is small enough we run into the danger that
we program a compare value which is already in the past. Due to the
compare for equal nature of HPET we need to read back the counter
value after writing the compare rehgister (btw. this is necessary for
absolute timeouts as well) to make sure that we did not miss the timer
event. We try to work around that by setting the minimum delta to a
value which is larger than the theoretical time which elapses between
the counter readout and the compare register write, but that's only
true in theory. A NMI or SMI which hits between the readout and the
write can easily push us beyond that limit. This would result in
waiting for the next HPET timer interrupt until the 32bit wraparound
of the counter happens which takes about 306 seconds.

So we designed the next event function to look like:

   match = read_cnt() + delta;
   write_compare_ref(match);
   return read_cnt() < match ? 0 : -ETIME;

At some point we got into trouble with certain ATI chipsets. Even the
above "safe" procedure failed. The reason was that the write to the
compare register was delayed probably for performance reasons. The
theory was that they wanted to avoid the synchronization of the write
with the HPET clock, which is understandable. So the write does not
hit the compare register directly instead it goes to some intermediate
register which is copied to the real compare register in sync with the
HPET clock. That opens another window for hitting the dreaded "wait
for a wraparound" problem.

To work around that "optimization" we added a read back of the compare
register which either enforced the update of the just written value or
just delayed the readout of the counter enough to avoid the issue. We
unfortunately never got any affirmative info from ATI/AMD about this.

One thing is sure, that we nuked the performance "optimization" that
way completely and I'm pretty sure that the result is worse than
before some HW folks came up with those.

Just for paranoia reasons I added a check whether the read back
compare register value was the same as the value we wrote right
before. That paranoia check triggered a couple of years after it was
added on an Intel ICH9 chipset. Venki added a workaround (commit
8da854c) which was reading the compare register twice when the first
check failed. We considered this to be a penalty in general and
restricted the readback (thus the wasted CPU cycles) to the known to
be affected ATI chipsets.

This turned out to be a utterly wrong decision. 2.6.35 testers
experienced massive problems and finally one of them bisected it down
to commit 30a564be which spured some further investigation.

Finally we got confirmation that the write to the compare register can
be delayed by up to two HPET clock cycles which explains the problems
nicely. All we can do about this is to go back to Venki's initial
workaround in a slightly modified version.

Just for the record I need to say, that all of this could have been
avoided if hardware designers and of course the HPET committee would
have thought about the consequences for a split second. It's out of my
comprehension why designing a working timer is so hard. There are two
ways to achieve it:

 1) Use a counter wrap around aware compare_reg <= counter_reg
    implementation instead of the easy compare_reg == counter_reg

    Downsides:

	- It needs more silicon.

	- It needs a readout of the counter to apply a relative
	  timeout. This is necessary as the counter does not run in
	  any useful (and adjustable) frequency and there is no
	  guarantee that the counter which is used for timer events is
	  the same which is used for reading the actual time (and
	  therefor for calculating the delta)

    Upsides:

	- None

  2) Use a simple down counter for relative timer events

    Downsides:

	- Absolute timeouts are not possible, which is not a problem
	  at all in the context of an OS and the expected
	  max. latencies/jitter (also see Downsides of #1)

   Upsides:

	- It needs less or equal silicon.

	- It works ALWAYS

	- It is way faster than a compare register based solution (One
	  write versus one write plus at least one and up to four
	  reads)

I would not be so grumpy about all of this, if I would not have been
ignored for many years when pointing out these flaws to various
hardware folks. I really hate timers (at least those which seem to be
designed by janitors).

Though finally we got a reasonable explanation plus a solution and I
want to thank all the folks involved in chasing it down and providing
valuable input to this.

Bisected-by: Nix <nix@esperi.org.uk>
Reported-by: Artur Skawina <art.08.09@gmail.com>
Reported-by: Damien Wyart <damien.wyart@free.fr>
Reported-by: John Drescher <drescherjm@gmail.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: stable@kernel.org
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-09-15 00:55:13 +02:00
basile@opensource.dyc.edu
08c2b394b9 x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
The arch/x86/Makefile uses scripts/gcc-x86_$(BITS)-has-stack-protector.sh
to check if cc1 supports -fstack-protector.  When -fPIE is passed to cc1,
these scripts fail causing stack protection to be disabled even when it
is available.

This fix is similar to commit c47efe5548

Reported-by: Kai Dietrich <mail@cleeus.de>
Signed-off-by: Magnus Granberg <zorry@gentoo.org>
LKML-Reference: <20100913101319.748A1148E216@opensource.dyc.edu>
Signed-off-by: Anthony G. Basile <basile@opensource.dyc.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-13 15:53:16 -07:00
Tetsuo Handa
2fd818642a x86, cpufeature: Suppress compiler warning with gcc 3.x
Gcc 3.x generates a warning

  arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has':
  arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraints

on each file.
But static_cpu_has() for gcc 3.x does not need __static_cpu_has().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
LKML-Reference: <201008300127.o7U1RC6Z044051@www262.sakura.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-13 14:48:41 -07:00
Stephane Eranian
b0b2072df3 perf_events: Fix BTS interrupt handling to avoid being dazed by NMI (v2)
Fix a bug introduced with commit de725de and the change in the
meaning of the return value of intel_pmu_handle_irq(). With the
current code, when you are using the BTS, you get 'dazed by NMI'
each time the BTS buffer fills up.

BTS does interrupt on the PMU vector, thus NMI. You need to take
this into account in the return value of the function.

This version fixes initial patch which was missing changes to
perf_event_intel_ds.c.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
LKML-Reference: <4c8a1686.aae9d80a.5aa4.5e35@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-13 08:43:40 +02:00
Joe Perches
d0ed0c3266 x86: Remove pr_<level> uses of KERN_<level>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <trivial@kernel.org>
LKML-Reference: <d40c60f4b036a26db8492848695bdafaa3b42791.1284267142.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-12 09:32:31 +02:00
Peter Zijlstra
5ee5e97ee9 x86, tsc: Fix a preemption leak in restore_sched_clock_state()
A real life genuine preemption leak..

Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10 18:17:45 -07:00
Venkatesh Pallipadi
351e5a703a x86, mtrr: Support mtrr lookup for range spanning across MTRR range
mtrr_type_lookup [start:end] looked up the resultant MTRR type for that
range, based on fixed and all variable MTRR ranges. It did check for multiple
MTRR var ranges overlapping [start:end] and returned the net type.

However, if the [start:end] range spanned across any var MTRR range,
mtrr_type_lookup would return an error return of 0xFE. This was based on
typical usage of mtrr_type_lookup in PAT mapping, where region being
mapped would not normally span across MTRR ranges and also trying
to keep the code simple.

Mark recently reported the problem with this limitation. When there are
two continguous MTRR's of type "writeback" and if there is a memory mapping
over a region starting in one MTRR range and ending in another MTRR range,
such mapping will fallback to "uncached" due to the above limitation.

Change below adds support for such lookups spanning multiple MTRR ranges.
We now have a wrapper mtrr_type_lookup that dynamically splits such a region
into smaller chunks that fit within one MTRR range and does a
__mtrr_type_lookup on it and combine the results later.

Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1284159350-19841-3-git-send-email-venki@google.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-10 16:11:20 -07:00
Venkatesh Pallipadi
a7f07cfbaa x86, mtrr: Refactor MTRR type overlap check code
Move the MTRR type overlap check into a new function. No functional change in
this patch. Just making it easier to add multiple region overlap check in
the following patch.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1284159350-19841-2-git-send-email-venki@google.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-10 16:11:10 -07:00
Jack Steiner
36ac4b987b x86, UV: Fix initialization of max_pnode
Fix calculation of "max_pnode" for systems where the the highest
blade has neither cpus or memory. (And, yes, although rare this
does occur).

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100910150808.GA19802@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-10 17:15:49 +02:00
Linus Torvalds
be6200aac9 Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Perform hardware_enable in CPU_STARTING callback
  KVM: i8259: fix migration
  KVM: fix i8259 oops when no vcpus are online
  KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
2010-09-10 08:02:45 -07:00
Brian Gerst
db7829c6cc x86, percpu: Optimize this_cpu_ptr
Allow arches to implement __this_cpu_ptr, and provide an x86 version.

Before:
	movq $foo, %rax
	movq %gs:this_cpu_off, %rdx
	addq %rdx, %rax

After:
	movq $foo, %rax
	addq %gs:this_cpu_off, %rax

The benefit is doing it in one less instruction and not clobbering
a temporary register.

tj: * Beefed up the comment a bit and renamed in-macro temp variable
      to match neighboring macros.

    * Folded fix for const pointer case found in linux-next.

    * Fixed sparse notation.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-09-10 10:56:47 +02:00
Brian Gerst
b2b57fe053 x86, fpu: Merge fpu_save_init()
Make 64-bit use the 32-bit version of fpu_save_init().  Remove
unused clear_fpu_state().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-13-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:36 -07:00
Brian Gerst
58a992b9cb x86-32, fpu: Rewrite fpu_save_init()
Rewrite fpu_save_init() to prepare for merging with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-12-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:31 -07:00
Brian Gerst
eec73f813a x86, fpu: Remove PSHUFB_XMM5_* macros
The PSHUFB_XMM5_* macros are no longer used.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-11-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:25 -07:00
Brian Gerst
8eb91a577d x86, fpu: Remove unnecessary ifdefs from i387 code.
Remove ifdefs for code that the compiler can optimize away on 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-10-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:18 -07:00
Brian Gerst
a334fe43d8 x86-32, fpu: Remove math_emulate stub
check_fpu() in bugs.c halts boot if no FPU is found and math emulation
isn't enabled.  Therefore this stub will never be used.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-9-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:11 -07:00
Brian Gerst
820241356d x86-64, fpu: Simplify constraints for fxsave/fxtstor
Use the "R" constraint (legacy register) instead of listing all the
possible registers.  Clean up the comments as well.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-8-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:06 -07:00
Brian Gerst
10c11f3049 x86-64, fpu: Fix %cs value in convert_from_fxsr()
While %ds still contains the userspace selector, %cs is KERNEL_CS at
this point.  Always get %cs from pt_regs even for the current task.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:58 -07:00
Brian Gerst
a4d4fbc773 x86-64, fpu: Disable preemption when using TS_USEDFPU
Consolidates code and fixes the below race for 64-bit.

commit 9fa2f37bfeb798728241cc4a19578ce6e4258f25
Author: torvalds <torvalds>
Date:   Tue Sep 2 07:37:25 2003 +0000

    Be a lot more careful about TS_USEDFPU and preemption

    We had some races where we testecd (or set) TS_USEDFPU together
    with sequences that depended on the setting (like clearing or
    setting the TS flag in %cr0) and we could be preempted in between,
    which screws up the FPU state, since preemption will itself change
    USEDFPU and the TS flag.

    This makes it a lot more explicit: the "internal" low-level FPU
    functions ("__xxxx_fpu()") all require preemption to be disabled,
    and the exported "real" functions will make sure that is the case.

    One case - in __switch_to() - was switched to the non-preempt-safe
    internal version, since the scheduler itself has already disabled
    preemption.

    BKrev: 3f5448b5WRiQuyzAlbajs3qoQjSobw

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:45 -07:00
Brian Gerst
bfd946cb89 x86, fpu: Merge __save_init_fpu()
__save_init_fpu() is identical for 32-bit and 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:30 -07:00
Brian Gerst
51115d4d45 x86, fpu: Merge tolerant_fwait()
Commit e2e75c91 merged the math exception handler, allowing both 32-bit
and 64-bit to handle math exceptions from kernel mode.  Switch to using
the 64-bit version of tolerant_fwait() without fnclex, which simply
ignores the exception if one is still pending from userspace.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:25 -07:00
Brian Gerst
6ac8bac268 x86, fpu: Merge fpu_init()
Make fpu_init() handle 32-bit setup.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:20 -07:00
Brian Gerst
2df7a6e9e8 x86: Use correct type for %cr4
%cr4 is 64-bit in 64-bit mode (although the upper 32-bits are currently reserved).
Use unsigned long for the temporary variable to get the right size.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:13 -07:00
Peter Zijlstra
15ac9a395a perf: Remove the sysfs bits
Neither the overcommit nor the reservation sysfs parameter were
actually working, remove them as they'll only get in the way.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:31 +02:00
Peter Zijlstra
a4eaf7f146 perf: Rework the PMU methods
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.

The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.

This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).

It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).

The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:

 1) We disable the counter:
    a) the pmu has per-counter enable bits, we flip that
    b) we program a NOP event, preserving the counter state

 2) We store the counter state and ignore all read/overflow events

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:30 +02:00
Peter Zijlstra
33696fc0d1 perf: Per PMU disable
Changes perf_disable() into perf_pmu_disable().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:29 +02:00
Peter Zijlstra
24cd7f54a0 perf: Reduce perf_disable() usage
Since the current perf_disable() usage is only an optimization,
remove it for now. This eases the removal of the __weak
hw_perf_enable() interface.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:29 +02:00
Peter Zijlstra
b0a873ebbf perf: Register PMU implementations
Simple registration interface for struct pmu, this provides the
infrastructure for removing all the weak functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:28 +02:00
Peter Zijlstra
51b0fe3954 perf: Deconstify struct pmu
sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:27 +02:00
Ingo Molnar
2aa61274ef Merge branch 'perf/urgent' into perf/core
Merge reason: Pick up pending fixes before applying dependent new changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:40:08 +02:00
Cliff Wickman
37a2f9f30a x86, kdump: Change copy_oldmem_page() to use cached addressing
The copy of /proc/vmcore to a user buffer proceeds much faster
if the kernel addresses memory as cached.

With this patch we have seen an increase in transfer rate from
less than 15MB/s to 80-460MB/s, depending on size of the
transfer. This makes a big difference in time needed to save a
system dump.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Cc: <stable@kernel.org> # as far back as it would apply
LKML-Reference: <E1OtMLz-0001yp-Ia@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 09:46:23 +02:00
Andre Przywara
aeb9c7d618 x86, kvm: add new AMD SVM feature bits
The recently updated CPUID specification names new SVM feature bits.
Add them to the list of reported features.

Signed-off-by: Andre Przywara <andre.przywara@amd,com>
LKML-Reference: <1283778860-26843-5-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:34:15 -07:00
Andre Przywara
6d886fd042 x86, cpu: Fix allowed CPUID bits for KVM guests
The AMD extensions to AVX (FMA4, XOP) work on the same YMM register set
as AVX, so they are safe for guests to use, as long as AVX itself
is allowed. Add F16C and AES on the way for the same reasons.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-4-git-send-email-andre.przywara@amd.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:34:15 -07:00
Andre Przywara
33ed82fb6c x86, cpu: Update AMD CPUID feature bits
AMD's public CPUID specification has been updated and some bits have
got names. Add them to properly describe new CPU features.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-3-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:34:15 -07:00
Andre Przywara
7ef8aa72ab x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
The AMD SSE5 feature set as-it has been replaced by some extensions
to the AVX instruction set. Thus the bit formerly advertised as SSE5
is re-used for one of these extensions (XOP).
Although this changes the /proc/cpuinfo output, it is not user visible, as
there are no CPUs (yet) having this feature.
To avoid confusion this should be added to the stable series, too.

Cc: stable@kernel.org [.32.x .34.x, .35.x]
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:32:55 -07:00
Linus Torvalds
1faa6ec8cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
2010-09-08 11:14:10 -07:00
Linus Torvalds
899edae615 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Try to handle unknown nmis with an enabled PMU
  perf, x86: Fix handle_irq return values
  perf, x86: Fix accidentally ack'ing a second event on intel perf counter
  oprofile, x86: fix init_sysfs() function stub
  lockup_detector: Sync touch_*_watchdog back to old semantics
  tracing: Fix a race in function profile
  oprofile, x86: fix init_sysfs error handling
  perf_events: Fix time tracking for events with pid != -1 and cpu != -1
  perf: Initialize callchains roots's childen hits
  oprofile: fix crash when accessing freed task structs
2010-09-08 11:13:16 -07:00
Gleb Natapov
eebb5f31b8 KVM: i8259: fix migration
Top of kvm_kpic_state structure should have the same memory layout as
kvm_pic_state since it is copied by memcpy.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-09-08 14:50:58 -03:00
Avi Kivity
ae0635b358 KVM: fix i8259 oops when no vcpus are online
If there are no vcpus, found will be NULL.  Check before doing anything with
it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-09-08 14:50:56 -03:00
Avi Kivity
16518d5ada KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b
operands are 64-bit.

Fix by adding val64 and orig_val64 union members to struct operand, and
using them where needed.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-09-08 14:50:55 -03:00
Christian Dietrich
0f1cf415f0 x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI
The ACPI/X86_IO_ACPI ifdef isn't necessary at this point,
because it is checked in an outer ifdef level already and has no
effect here.

Cleanup only, no functional effect.

Signed-off-by: Christian Dietrich <qy03fugy@stud.informatik.uni-erlangen.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: vamos-dev@i4.informatik.uni-erlangen.de
LKML-Reference: <d4376e6d79b8dc0f89a4b3ce4a880904a7b93ead.1283782701.git.qy03fugy@stud.informatik.uni-erlangen.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-08 08:14:02 +02:00
Linus Torvalds
d56557af19 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: bus speed strings should be const
  PCI hotplug: Fix build with CONFIG_ACPI unset
  PCI: PCIe: Remove the port driver module exit routine
  PCI: PCIe: Move PCIe PME code to the pcie directory
  PCI: PCIe: Disable PCIe port services during port initialization
  PCI: PCIe: Ask BIOS for control of all native services at once
  ACPI/PCI: Negotiate _OSC control bits before requesting them
  ACPI/PCI: Do not preserve _OSC control bits returned by a query
  ACPI/PCI: Make acpi_pci_query_osc() return control bits
  ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
  PCI: PCIe: Introduce commad line switch for disabling port services
  PCI: PCIe AER: Introduce pci_aer_available()
  x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
  PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
2010-09-07 16:00:17 -07:00
Alexander van Heukelum
fe8e0c25ca x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE
The irq stacks, located in the percpu-area, need to be
THREAD_SIZE aligned. Add the infrastucture to align percpu
variables to larger-than-pagesize amounts within the percpu
area, and use it to specify the alignment for the irq stacks.
Also align the percpu area itself to THREAD_SIZE.

This should make irq stacks work with 8K THREAD_SIZE.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Tejun Heo <tj@kernel.org>
Cc: hch@lst.de
LKML-Reference: <1283799222.15941.1393621887@webmail.messagingengine.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-07 05:07:00 +02:00
Jin Dongming
592091c0e2 therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal'
In unexpected_thermal_interrupt(), "LVT TMR interrupt" is used
in error message.

I don't think TMR is a suitable abbreviation for thermal.
  1.TMR has been used in IA32 Architectures Software Developer's
    Manual, and is the abbreviation for Trigger Mode Register.
  2.There is not an standard abbreviation "TMR" defined for thermal
    in IA32 Architectures Software Developer's Manual.
  3.Though we could understand it as Thermal Monitor Register, it is
    easy to be misunderstood as a *TIMER* interrupt also.

I think this patch will fix it.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Cc: Brown Len <len.brown@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <4C7C492D.5020704@np.css.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 20:26:50 +02:00
Andreas Herrmann
1389298f7d x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
kobject_add_internal failed for threshold_bank2 with -EEXIST,
don't try to register things with the same name in the same
directory:

  Pid: 1, comm: swapper Tainted: G        W  2.6.31 #1
  Call Trace:
  [<ffffffff81161b07>] ? kobject_add_internal+0x156/0x180
  [<ffffffff81161cc0>] ? kobject_add+0x66/0x6b
  [<ffffffff81161793>] ? kobject_init+0x42/0x82
  [<ffffffff81161cf9>] ? kobject_create_and_add+0x34/0x63
  [<ffffffff81393963>] ? threshold_create_bank+0x14f/0x259
  [<ffffffff8139310a>] ? mce_create_device+0x8d/0x1b8
  [<ffffffff81646497>] ? threshold_init_device+0x3f/0x80
  [<ffffffff81646458>] ? threshold_init_device+0x0/0x80
  [<ffffffff81009050>] ? do_one_initcall+0x4f/0x143
  [<ffffffff816413a0>] ? kernel_init+0x14c/0x1a2
  [<ffffffff8100c8da>] ? child_rip+0xa/0x20
  [<ffffffff81641254>] ? kernel_init+0x0/0x1a2
  [<ffffffff8100c8d0>] ? child_rip+0x0/0x20
  kobject_create_and_add: kobject_add error: -17

(Probably the for_each_cpu loop should be entirely removed.)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100827092006.GB5348@loge.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:35:49 +02:00
Andreas Herrmann
d9fadd7ba9 x86, AMD: Remove needless CPU family check (for L3 cache info)
Old 32-bit AMD CPUs (all w/o L3 cache) should always return 0
for cpuid_edx(0x80000006).

For unknown reason the 32-bit implementation differed from the
64-bit implementation. See commit 67cddd9479 ("i386: Add L3 cache
support to AMD CPUID4 emulation"). The current check is the
result of the x86 merge.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20100902133710.GA5449@loge.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:33:48 +02:00
Borislav Petkov
260133ab65 x86, GART: Disable GART table walk probes
Current code tramples over bit F3x90[6] which can be used to
disable GART table walk probes. However, this bit should be set
for performance reasons (speed up GART table walks). We are
allowed to do that since we put GART tables in UC memory later
anyway. Make it so.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1283531981-7495-3-git-send-email-bp@amd64.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:28:34 +02:00
Borislav Petkov
57ab43e331 x86, GART: Remove superfluous AMD64_GARTEN
There is a GARTEN so use that and drop the duplicate.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1283531981-7495-2-git-send-email-bp@amd64.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:28:34 +02:00
Francisco Jerez
cc1a8e5233 x86: Fix the address space annotations of iomap_atomic_prot_pfn()
This patch fixes the sparse warnings when the return pointer of
iomap_atomic_prot_pfn() is used as an argument of iowrite32()
and friends.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:26:14 +02:00
Wu Fengguang
1c5f50ee34 x86, mm: fix uninitialized addr in kernel_physical_mapping_init()
This re-adds the lost chunk in commit 9b861528a8.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haicheng Li <haicheng.li@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100903090407.GA19771@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 11:40:11 +02:00
Jan Beulich
7fe977dab3 i386: Make kernel_execve() suitable for stack unwinding
The explicit saving and restoring of %ebx was confusing stack
unwind data consumers, and it is plain unnecessary to do this
within the asm(), since that was only introduced for PIC user
mode consumers of the original _syscall3() macro this was
derived from.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <4C7FBC660200007800013F95@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:16:02 +02:00
Jan Beulich
df5d1874ce x86: Use {push,pop}{l,q}_cfi in more places
... plus additionally introduce {push,pop}f{l,q}_cfi. All in the
hope that the code becomes better readable this way (it gets
quite a bit smaller in any case).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:11 +02:00
Jan Beulich
a34107b557 i386: Add unwind directives to syscall ptregs stubs
When these stubs are actual functions (i.e. having a return
instruction) and have stack manipulation instructions in them,
they should also be annotated to allow unwinding through them.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBCF00200007800013F99@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:10 +02:00
Jan Beulich
b1cccb1bb0 x86-64: Use symbolics instead of raw numbers in entry_64.S
... making the code a little less fragile.

Also use pushq_cfi instead of raw CFI annotations in two more
places, and add two missing annotations after stack pointer
adjustments which got modified here anyway.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBACF0200007800013F6A@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:10 +02:00
Jan Beulich
1f130a783a x86-64: Adjust frame type at paranoid_exit:
As this isn't an exception or interrupt entry point, it doesn't
have any of the hardware provide frame layouts active.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:10 +02:00
Jan Beulich
e6b04b6b5a x86-64: Fix unwind annotations in syscall stubs
With the return address removed from the stack, these should
really refer to their caller's register state.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBA3D0200007800013F61@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:09 +02:00
Robert Richter
4177c42a63 perf, x86: Try to handle unknown nmis with an enabled PMU
When the PMU is enabled it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty
and daze the CPU.

The solution to avoid an 'unknown nmi' massage in this case was
simply to stop the nmi handler chain when the PMU is enabled by
stating the nmi was handled. This has the drawback that a) we
can not detect unknown nmis anymore, and b) subsequent nmi
handlers are not called.

This patch addresses this. Now, we check this unknown NMI if it
could be a PMU back-to-back NMI. Otherwise we pass it and let
the kernel handle the unknown nmi.

This is a debug log:

 cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
 cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
 cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
 cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
 cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
 cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
 cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
 cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
 cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
 cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
 cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
 cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
 cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
 cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
 cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
 cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
 Uhhuh. NMI received for unknown reason 00 on CPU 6.
 Do you have a strange power saving mode enabled?
 Dazed and confused, but trying to continue

Deltas:

 nmi #32334 340186
 nmi #32335 1327704
 nmi #32336 1819      <<<< back-to-back nmi [1]
 nmi #32337 85961
 nmi #32338 284507
 nmi #32339 1578809
 nmi #32340 217616
 nmi #32341 1798      <<<< back-to-back nmi [2]
 nmi #32342 240913
 nmi #32343 1512809
 nmi #32344 116672
 nmi #32345 412453
 nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
 nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one
 counter nmi #32348 1773      <<<< 3rd nmi (back-to-back)
 handling no counter! [3]

For  back-to-back nmi detection there are the following rules:

The PMU nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2]
above).

There is another case if there are two subsequent back-to-back
nmis [3]. The 2nd is detected as back-to-back because the first
handled more than one counter. If the second handles one counter
and the 3rd handles nothing, we drop the 3rd nmi because it
could be a back-to-back nmi.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:05:18 +02:00
Peter Zijlstra
de725dec9d perf, x86: Fix handle_irq return values
Now that we rely on the number of handled overflows, ensure all
handle_irq implementations actually return the right number.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: robert.richter@amd.com
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:05:18 +02:00
Don Zickus
2e556b5b32 perf, x86: Fix accidentally ack'ing a second event on intel perf counter
During testing of a patch to stop having the perf subsytem
swallow nmis, it was uncovered that Nehalem boxes were randomly
getting unknown nmis when using the perf tool.

Moving the ack'ing of the PMI closer to when we get the status
allows the hardware to properly re-set the PMU bit signaling
another PMI was triggered during the processing of the first
PMI.  This allows the new logic for dealing with the
shortcomings of multiple PMIs to handle the extra NMI by
'eat'ing it later.

Now one can wonder why are we getting a second PMI when we
disable all the PMUs in the begining of the NMI handler to
prevent such a case, for that I do not know.  But I know the fix
below helps deal with this quirk.

Tested on multiple Nehalems where the problem was occuring.
With the patch, the code now loops a second time to handle the
second PMI (whereas before it was not).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: robert.richter@amd.com
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:05:17 +02:00
Ingo Molnar
b4c69d45c4 Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2010-09-01 22:31:07 +02:00
Robert Richter
269f45c250 oprofile, x86: fix init_sysfs() function stub
The use of the return value of init_sysfs() with commit

 10f0412 oprofile, x86: fix init_sysfs error handling

discovered the following build error for !CONFIG_PM:

 .../linux/arch/x86/oprofile/nmi_int.c: In function ‘op_nmi_init’:
 .../linux/arch/x86/oprofile/nmi_int.c:784: error: expected expression before ‘do’
 make[2]: *** [arch/x86/oprofile/nmi_int.o] Error 1
 make[1]: *** [arch/x86/oprofile] Error 2

This patch fixes this.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-09-01 21:23:01 +02:00
Cyrill Gorcunov
c9cf4a019c perf, x86, Pentium4: Add RAW events verification
Implements verification of

- Bits of ESCR EventMask field (meaningful bits in field are hardware
  predefined and others bits should be set to zero)

- INSTR_COMPLETED event (it is available on predefined cpu model only)

- Thread shared events (they should be guarded by "perf_event_paranoid"
  sysctl due to security reason). The side effect of this action is
  that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100825182334.GB14874@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01 08:26:56 +02:00
Robert Richter
10f0412f57 oprofile, x86: fix init_sysfs error handling
On failure init_sysfs() might not properly free resources. The error
code of the function is not checked. And, when reinitializing the exit
function might be called twice. This patch fixes all this.

Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-08-31 10:26:26 +02:00
Ingo Molnar
daab7fc734 Merge commit 'v2.6.36-rc3' into x86/memblock
Conflicts:
	arch/x86/kernel/trampoline.c
	mm/memblock.c

Merge reason: Resolve the conflicts, update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31 09:45:46 +02:00
Konrad Rzeszutek Wilk
7ac41ccf47 x86, iommu: Fix IOMMU_INIT alignment rules
This boot crash was observed:

 DMA-API: preallocated 32768 debug entries
 DMA-API: debugging enabled by kernel config
 BUG: unable to handle kernel paging request at 19da8955
 IP: [<f4ffffff>] 0xf4ffffff
 *pde = 00000000

The crux of the failure was that even if we did not use any
of the .iommu_table section, the linker would still insert it
in the vmlinux file. This patch fixes that and also fixes the
runtime crash where we would try to access the array.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1283191802-25086-1-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31 08:06:10 +02:00
Julia Lawall
9fbaf49c7f x86, kmemcheck: Remove double test
The opcodes 0x2e and 0x3e are tested for in the first Group 2
line as well.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@expression@
expression E;
@@

(
* E
  || ... || E
|
* E
  && ... && E
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
LKML-Reference: <1283010066-20935-5-git-send-email-julia@diku.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-30 09:19:28 +02:00
Konrad Rzeszutek Wilk
6f44d0337c x86, doc: Adding comments about .iommu_table and its neighbors.
Updating the linker section with comments about .iommu_table and
some other ones that I know of.

CC: Sam Ravnborg <sam@ravnborg.org>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282933173-19960-1-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 18:14:31 -07:00
Yinghai Lu
774ea0bcb2 x86: Remove old bootmem code
Requested by Ingo, Thomas and HPA.

The old bootmem code is no longer necessary, and the transition is
complete.  Remove it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:14:37 -07:00
Yinghai Lu
6f2a75369e x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
memblock_memory_size() will return memory size in memblock.memory.region.
memblock_free_memory_size() will return free memory size in memblock.memory.region.

So We can get exact reseved size in specified range.

Set the size right after initmem_init(), because later bootmem API will
get area above 16M. (except some fallback).

Later after we remove the bootmem, We could call that just before paging_init().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:54 -07:00
Yinghai Lu
a587d2daeb x86: Remove not used early_res code
and some functions in e820.c that are not used anymore

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:51 -07:00
Yinghai Lu
a9ce6bc151 x86, memblock: Replace e820_/_early string with memblock_
1.include linux/memblock.h directly. so later could reduce e820.h reference.
2 this patch is done by sed scripts mainly

-v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:47 -07:00
Yinghai Lu
72d7c3b33c x86: Use memblock to replace early_res
1. replace find_e820_area with memblock_find_in_range
2. replace reserve_early with memblock_x86_reserve_range
3. replace free_early with memblock_x86_free_range.
4. NO_BOOTMEM will switch to use memblock too.
5. use _e820, _early wrap in the patch, in following patch, will
   replace them all
6. because memblock_x86_free_range support partial free, we can remove some special care
7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill()
   so adjust some calling later in setup.c::setup_arch()
   -- corruption_check and mptable_update

-v2: Move reserve_brk() early
    Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range()
    that could happen We have more then 128 RAM entry in E820 tables, and
    memblock_x86_fill() could use memblock_find_in_range() to find a new place for
    memblock.memory.region array.
    and We don't need to use extend_brk() after fill_memblock_area()
    So move reserve_brk() early before fill_memblock_area().
-v3: Move find_smp_config early
    To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable
    in right place.
-v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in
    memblock.reserved already..
    use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later.
-v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit
    active_region for 32bit does include high pages
    need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped()
-v6: Use current_limit instead
-v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
-v8: Set memblock_can_resize early to handle EFI with more RAM entries
-v9: update after kmemleak changes in mainline

Suggested-by: David S. Miller <davem@davemloft.net>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:12:29 -07:00
Yinghai Lu
301ff3e88e x86, memblock: Use memblock_debug to control debug message print out
Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is
specified

will also print ther name when reserve_memblock_area/free_memblock_area are called.

-v2: according to Ingo, put " if (memblock_debug) " in one place

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:35 -07:00
Yinghai Lu
e82d42be24 x86, memblock: Add memblock_x86_memory_in_range()
It will return memory size in specified range according to memblock.memory.region

Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to
__memblock_x86_memory_in_range().

-v2: Ben want _in_range in the name instead of size

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:30 -07:00
Yinghai Lu
b52c17ce85 x86, memblock: Add memblock_x86_free_memory_in_range()
It will return free memory size in specified range.

We can not use memory_size - reserved_size here, because some reserved area
may not be in the scope of memblock.memory.region.

Use memblock.memory.region subtracting memblock.reserved.region to get free range array.
then count size of all free ranges.

-v2: Ben insist on using _in_range

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:16 -07:00
Yinghai Lu
6bcc8176d0 x86, memblock: Add memblock_x86_find_in_range_node()
It can be used to find NODE_DATA for numa.

Need to make sure early_node_map[] is filled before it is called, otherwise
it will fallback to memblock_find_in_range(), with node range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:57 -07:00
Yinghai Lu
88ba088c18 x86, memblock: Add memblock_x86_register_active_regions() and memblock_x86_hole_size()
memblock_x86_register_active_regions() will be used to fill early_node_map,
the result will be memblock.memory.region AND numa data

memblock_x86_hole_size will be used to find hole size on memblock.memory.region
with specified range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:52 -07:00
Yinghai Lu
4d5cf86ce1 x86, memblock: Add get_free_all_memory_range()
get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by
free_all_memory_core_early().

It will use early_node_map aka active ranges subtract memblock.reserved to
get all free range, and those ranges will convert to slab pages.

-v4: increase range size

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:48 -07:00
Yinghai Lu
9dc5d569c1 x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_range
They are wrappers for core versions, which take start/end/name instead
of base/size.  This will make x86 conversion eaasier.

could add more debug print out

-v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael
      Ellerman and Ben
     change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman
-v3: call check_and_double after reserve/free, so could avoid to use
      find_memblock_area. Suggested by Michael Ellerman

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:41 -07:00
Yinghai Lu
27de794365 x86, memblock: Add memblock_x86_to_bootmem()
memblock_x86_to_bootmem() will reserve memblock.reserved.region in
bootmem after bootmem is set up.

We can use it to with all arches that support memblock later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:21 -07:00
Yinghai Lu
f88eff74aa bootmem, x86: Add weak version of reserve_bootmem_generic
It will be used memblock_x86_to_bootmem converting

It is an wrapper for reserve_bootmem, and x86 64bit is using special one.

Also clean up that version for x86_64. We don't need to take care of numa
path for that, bootmem can handle it how

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:13 -07:00
Yinghai Lu
fb74fb6db9 x86, memblock: Add memblock_x86_find_in_range_size()
size is returned according free range.
Will be used to find free ranges for early_memtest and memory corruption check

Do not mess it up with lib/memblock.c yet.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:06 -07:00
Frederic Weisbecker
98ee74a75c Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/util/callchain.h

Merge reason:
	Fix a non-trivial conflict with latest fixes
2010-08-27 02:30:07 +02:00
Shaohua Li
660a293ea9 x86, mm: Make spurious_fault check explicitly check the PRESENT bit
pte_present() returns true even present bit isn't set but _PAGE_PROTNONE
(global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have
global bit set but present bit clear. This patch makes we could catch
free pages access with CONFIG_DEBUG_PAGEALLOC enabled.

[ hpa: added a comment in the code as a warning to janitors ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1280217988.32400.75.camel@sli10-desk.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 16:00:21 -07:00
Konrad Rzeszutek Wilk
ee1f284f38 x86, iommu: Utilize the IOMMU_INIT macros functionality.
We remove all of the sub-platform detection/init routines and instead
use on the .iommu_table array of structs to call the .early_init if
.detect returned a positive value. Also we can stop detecting other
IOMMUs if the IOMMU used the _FINISH type macro. During the
'pci_iommu_init' stage, we call .init for the second-stage
initialization if it was defined. Currently only SWIOTLB has this
defined and it used to de-allocate the SWIOTLB if the other detected
IOMMUs have deemed it unnecessary to use SWIOTLB.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-11-git-send-email-konrad.wilk@oracle.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:52 -07:00
Konrad Rzeszutek Wilk
22e6daf41b x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

               [null]
                 |
       [pci_xen_swiotlb_detect]
                 |
       [pci_swiotlb_detect_override]
                 |
       [pci_swiotlb_detect_4gb]
                 |
         +-------+--------+
        /                  \
[detect_calgary]    [gart_iommu_hole_init]
                            |
                    [amd_iommu_detect]

Meaning that 'amd_iommu_detect' will be called after
'gart_iommu_hole_init'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-9-git-send-email-konrad.wilk@oracle.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Joerg Roedel <joerg.roedel@amd.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:30 -07:00
Konrad Rzeszutek Wilk
d2aa232f3d x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

     [pci_xen_swiotlb_detect]
         |
     [pci_swiotlb_detect_override]
         |
     [pci_swiotlb_detect_4gb]
         |
      [detect_calgary]

Meaning that 'detect_calgary' is going to be called after
'pci_swiotlb_detect'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-8-git-send-email-konrad.wilk@oracle.com>
CC: Muli Ben-Yehuda <muli@il.ibm.com>
CC: "Jon D. Mason" <jdmason@kudzu.us>
CC: "Darrick J. Wong" <djwong@us.ibm.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:15 -07:00
Konrad Rzeszutek Wilk
5cb3a26793 x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

               [null]
                 |
       [pci_xen_swiotlb_detect]
                 |
       [pci_swiotlb_detect_override]
                 |
       [pci_swiotlb_detect_4gb]

In other words, we set 'pci_xen_swiotlb_detect' to be
the first detection to be run during start.

CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-7-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:06 -07:00
Konrad Rzeszutek Wilk
c116c5457c x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

       [pci_xen_swiotlb_detect]
                 |
       [pci_swiotlb_detect_override]
                 |
       [pci_swiotlb_detect_4gb]

And set the SWIOTLB IOMMU_INIT to utilize 'pci_swiotlb_init'
for .init and 'pci_swiotlb_late_init' for .late_init.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-6-git-send-email-konrad.wilk@oracle.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:37 -07:00
Konrad Rzeszutek Wilk
efa631c26d x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine.
In 'pci_swiotlb_detect' we used to do two different things:
 a). If user provided 'iommu=soft' or 'swiotlb=force' we
     would set swiotlb=1 and return 1 (and forcing pci-dma.c
     to call pci_swiotlb_init() immediately).
 b). If 4GB or more would be detected and if user did not specify
     iommu=off, we would set 'swiotlb=1' and return whatever 'a)'
     figured out.

We simplify this by splitting a) and b) in two different routines.

CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-5-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:29 -07:00
Konrad Rzeszutek Wilk
5bef80a4b8 x86, iommu: Add proper dependency sort routine (and sanity check).
We are using a very simple sort routine which sorts the .iommu_table
array in the order of dependencies. Specifically each structure
of iommu_table_entry has a field 'depend' which contains the function
pointer to the IOMMU that MUST be run before us. We sort the array
of structures so that the struct iommu_table_entry with no
'depend' field are first, and then the subsequent ones are the
ones for which the 'depend' function has been already invoked
(in other words, precede us).

Using the kernel's version 'sort', which is a mergeheap is
feasible, but would require making the comparison operator
scan recursivly the array to satisfy the "heapify" process: setting the
levels properly. The end result would much more complex than it should
be an it is just much simpler to utilize this simple sort routine.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-4-git-send-email-konrad.wilk@oracle.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:19 -07:00
Konrad Rzeszutek Wilk
480125ba49 x86, iommu: Make all IOMMU's detection routines return a value.
We return 1 if the IOMMU has been detected. Zero or an error number
if we failed to find it. This is in preperation of using the IOMMU_INIT
so that we can detect whether an IOMMU is present. I have not
tested this for regression on Calgary, nor on AMD Vi chipsets as
I don't have that hardware.

CC: Muli Ben-Yehuda <muli@il.ibm.com>
CC: "Jon D. Mason" <jdmason@kudzu.us>
CC: "Darrick J. Wong" <djwong@us.ibm.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: David Woodhouse <David.Woodhouse@intel.com>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Yinghai Lu <yinghai@kernel.org>
CC: Joerg Roedel <joerg.roedel@amd.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-3-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:13 -07:00
Konrad Rzeszutek Wilk
0444ad93ea x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure
This patch set adds a mechanism to "modularize" the IOMMUs we have
on X86. Currently the count of IOMMUs is up to six and they have a complex
relationship that requires careful execution order. 'pci_iommu_alloc'
does that today, but most folks are unhappy with how it does it.
This patch set addresses this and also paves a mechanism to jettison
unused IOMMUs during run-time. For details that sparked this, please
refer to: http://lkml.org/lkml/2010/8/2/282

The first solution that comes to mind is to convert wholesale
the IOMMU detection routines to be called during initcall
time frame. Unfortunately that misses the dependency relationship
that some of the IOMMUs have (for example: for AMD-Vi IOMMU to work,
GART detection MUST run first, and before all of that SWIOTLB MUST run).

The second solution would be to introduce a registration call wherein
the IOMMU would provide its detection/init routines and as well on what
MUST run before it. That would work, except that the 'pci_iommu_alloc'
which would run through this list, is called during mem_init. This means we
don't have any memory allocator, and it is so early that we haven't yet
started running through the initcall_t list.

This solution borrows concepts from the 2nd idea and from how
MODULE_INIT works. A macro is provided that each IOMMU uses to define
it's detect function and early_init (before the memory allocate is
active), and as well what other IOMMU MUST run before us.  Since most IOMMUs
depend on having SWIOTLB run first ("pci_swiotlb_detect") a convenience macro
to depends on that is also provided.

This macro is similar in design to MODULE_PARAM macro wherein
we setup a .iommu_table section in which we populate it with the values
that match a struct iommu_table_entry. During bootup we will sort
through the array so that the IOMMUs that MUST run before us are first
elements in the array. And then we just iterate through them calling the
detection routine and if appropiate, the init routines.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:12:53 -07:00
Haicheng Li
9b861528a8 x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes
When memory hotplug-adding happens for a large enough area
that a new PGD entry is needed for the direct mapping, the PGDs
of other processes would not get updated. This leads to some CPUs
oopsing like below when they have to access the unmapped areas.

[ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534]
[ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc
dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc
lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib
i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore
[ 1139.243229] irq event stamp: 8538759
[ 1139.243230] hardirqs last  enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30
[ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146
[ 1139.243240] softirqs last  enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146
[ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34
[ 1139.243249] CPU 0:
[ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc
dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc
lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib
i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore
[ 1139.243284] Pid: 6534, comm: bash Tainted: G   M       2.6.32-haicheng-cpuhp #7 QSSC-S4R
[ 1139.243287] RIP: 0010:[<ffffffff810ace35>]  [<ffffffff810ace35>] alloc_arraycache+0x35/0x69
[ 1139.243292] RSP: 0018:ffff8802799f9d78  EFLAGS: 00010286
[ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000
[ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010
[ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000
[ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0
[ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001
[ 1139.243308] FS:  00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
[ 1139.243311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0
[ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1139.243321] Call Trace:
[ 1139.243324]  [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69
[ 1139.243328]  [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a
[ 1139.243333]  [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b
[ 1139.243337]  [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb
[ 1139.243340]  [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152
[ 1139.243344]  [<ffffffff813388ce>] ? store_online+0x4d/0x75
[ 1139.243348]  [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d
[ 1139.243351]  [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121
[ 1139.243355]  [<ffffffff810b539d>] ? vfs_write+0xae/0x14a
[ 1139.243358]  [<ffffffff810b587f>] ? sys_write+0x47/0x6f
[ 1139.243362]  [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b

This patch makes sure to always replicate new direct mapping PGD entries
to the PGDs of all processes, as well as ensures corresponding vmemmap
mapping gets synced.

V1: initial code by Andi Kleen.
V2: fix several issues found in testing.
V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all().

[ hpa: changed pgd_change from int to bool ]

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4C6E4FD8.6080100@linux.intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 14:02:33 -07:00
Haicheng Li
6afb5157b9 x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions
No behavior change.

Move some of vmalloc_sync_all() code into a new function
sync_global_pgds() that will be useful for memory hotplug.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4C6E4ECD.1090607@linux.intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 14:02:29 -07:00
H. Peter Anvin
9ea77bdb39 x86, bios: Make the x86 early memory reservation a kernel option
Add a kernel command-line option so the x86 early memory reservation
size can be adjusted at runtime instead of only at compile time.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <tip-d0cd7425fab774a480cce17c2f649984312d0b55@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25 17:10:49 -07:00
Borislav Petkov
acf01734b1 x86, tsc: Remove CPU frequency calibration on AMD
6b37f5a20c introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, however, the TSC increment
rate is denoted by MSRC001_0015[24] and when this bit is set (which
should be done by the BIOS) the TSC increments with the P0 frequency
so the calibration is not needed and booting can be a couple of mcecs
faster on those machines.

Besides, there should be virtually no machines out there which don't
have this bit set, therefore this calibration can be safely removed. It
is a shaky hack anyway since it assumes implicitly that the core is in
P0 when BIOS hands off to the OS, which might not always be the case.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100825162823.GE26438@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25 13:32:52 -07:00
Linus Torvalds
d4348c6789 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag
  tracing/trace_stack: Fix stack trace on ppc64
2010-08-25 10:50:07 -07:00
Linus Torvalds
5e686019df Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
  sched: Fix rq->clock synchronization when migrating tasks
2010-08-25 08:40:56 -07:00
Lin Ming
8d33091992 perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag
If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens
on every event, which can generate a flood of NMIs. Clear it.

Reported-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25 15:15:33 +02:00
Ingo Molnar
7de5d895b2 Merge branch 'linus' into perf/core
Merge reason: pick up perf fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25 13:10:00 +02:00
Lin Ming
04fba67163 perf: Remove unused variable
This fixes the following build warning introduced by the
callchain rework:

  arch/x86/kernel/cpu/perf_event.c:1574: warning: ‘perf_callchain_entry_nmi’ defined but not used

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1282718949.16443.75.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25 13:09:41 +02:00
Hugh Dickins
b7d4608977 x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
laptop: whereas cloning from kernel to low mappings pgd range does need
to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.

Signed-off-by: Hugh Dickins <hughd@google.com>
LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-24 23:05:17 -07:00
H. Peter Anvin
d0cd7425fa x86, bios: By default, reserve the low 64K for all BIOSes
The laundry list of BIOSes that need the low 64K reserved is getting
very long, so make it the default across all BIOSes.  This also allows
the code to be simplified and unified with the reservation code for
the first 4K.

This resolves kernel bugzilla 16661 and who knows what else...

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
2010-08-24 17:32:04 -07:00
Linus Torvalds
c05e1e23b8 Merge branch 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6
* 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6:
  xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield
  xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
  xen: pvhvm: allow user to request no emulated device unplug
2010-08-23 18:29:18 -07:00
Alok Kataria
b0f4c062fb x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI
VMI was the only user of the alloc_pmd_clone hook, given that VMI
is now removed we can also remove this hook.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23 17:09:44 -07:00
Alok Kataria
9863c90f68 x86, vmware: Remove deprecated VMI kernel support
With the recent innovations in CPU hardware acceleration technologies
from Intel and AMD, VMware ran a few experiments to compare these
techniques to guest paravirtualization technique on VMware's platform.
These hardware assisted virtualization techniques have outperformed the
performance benefits provided by VMI in most of the workloads. VMware
expects that these hardware features will be ubiquitous in a couple of
years, as a result, VMware has started a phased retirement of this
feature from the hypervisor.

Please note that VMI has always been an optimization and non-VMI kernels
still work fine on VMware's platform.
Latest versions of VMware's product which support VMI are,
Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
releases for these products will continue supporting VMI.

For more details about VMI retirement take a look at this,
http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html

This feature removal was scheduled for 2.6.37 back in September 2009.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23 15:18:50 -07:00
Ma Ling
59daa706fb x86, mem: Optimize memcpy by avoiding memory false dependece
All read operations after allocation stage can run speculatively,
all write operation will run in program order, and if addresses are
different read may run before older write operation, otherwise wait
until write commit. However CPU don't check each address bit,
so read could fail to recognize different address even they
are in different page.For example if rsi is 0xf004, rdi is 0xe008,
in following operation there will generate big performance latency.
1. movq (%rsi),	%rax
2. movq %rax,	(%rdi)
3. movq 8(%rsi), %rax
4. movq %rax,	8(%rdi)

If %rsi and rdi were in really the same meory page, there are TRUE
read-after-write dependence because instruction 2 write 0x008 and
instruction 3 read 0x00c, the two address are overlap partially.
Actually there are in different page and no any issues,
but without checking each address bit CPU could think they are
in the same page, and instruction 3 have to wait for instruction 2
to write data into cache from write buffer, then load data from cache,
the cost time read spent is equal to mfence instruction. We may avoid it by
tuning operation sequence as follow.

1. movq 8(%rsi), %rax
2. movq %rax,	8(%rdi)
3. movq (%rsi),	%rax
4. movq %rax,	(%rdi)

Instruction 3 read 0x004, instruction 2 write address 0x010, no any
dependence.  At last on Core2 we gain 1.83x speedup compared with
original instruction sequence.  In this patch we first handle small
size(less 20bytes), then jump to different copy mode. Based on our
micro-benchmark small bytes from 1 to 127 bytes, we got up to 2X
improvement, and up to 1.5X improvement for 1024 bytes on Corei7.  (We
use our micro-benchmark, and will do further test according to your
requirment)

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23 14:56:41 -07:00
Ma, Ling
fdf4289679 x86, mem: Don't implement forward memmove() as memcpy()
memmove() allow source and destination address to be overlap, but
there is no such limitation for memcpy().  Therefore, explicitly
implement memmove() in both the forwards and backward directions, to
give us the ability to optimize memcpy().

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23 14:14:27 -07:00
Shaohua Li
61c77326d1 x86, mm: Avoid unnecessary TLB flush
In x86, access and dirty bits are set automatically by CPU when CPU accesses
memory. When we go into the code path of below flush_tlb_fix_spurious_fault(),
we already set dirty bit for pte and don't need flush tlb. This might mean
tlb entry in some CPUs hasn't dirty bit set, but this doesn't matter. When
the CPUs do page write, they will automatically check the bit and no software
involved.

On the other hand, flush tlb in below position is harmful. Test creates CPU
number of threads, each thread writes to a same but random address in same vma
range and we measure the total time. Under a 4 socket system, original time is
1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is
20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for
tlb flush.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrea Archangeli <aarcange@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23 10:04:57 -07:00
Ian Campbell
1dc7ce99b0 xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
It is not immediately clear what this option causes to become
ignored. The actual meaning is that it is not necessary to unplug the
emulated devices to safely use the PV ones, even if the platform does
not support the unplug protocol. (pressumably the user will only add
this option if they have ensured that their domain configuration is
safe).

I think xen_emul_unplug=unnecessary better captures this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-23 11:59:29 +01:00
Ian Campbell
c93a4dfb31 xen: pvhvm: allow user to request no emulated device unplug
this allows the user to disable pvhvm and revert to emulated devices
in case of a system misconfiguration (e.g. initramfs with only
emulated drivers in it).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-23 11:59:28 +01:00
Linus Torvalds
3dc8d7f07e Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PIT: free irq source id in handling error path
  KVM: destroy workqueue on kvm_create_pit() failures
  KVM: fix poison overwritten caused by using wrong xstate size
2010-08-22 11:27:36 -07:00
Samuel Thibault
ddb0c5a689 Replace Configure with Enable in description of MAXSMP
The "Configure" word tends to make user believe they have to say 'yes'
to be able to choose the number of procs/nodes.  "Enable" should be
unambiguous enough.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21 12:38:58 -07:00
Sergey Senozhatsky
51e3c1b558 x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
Fix BUG: using smp_processor_id() in preemptible thermal_throttle_add_dev.
We know the cpu number when calling thermal_throttle_add_dev, so we can
remove smp_processor_id call in thermal_throttle_add_dev by supplying
the cpu number as argument.

This should resolve kernel bugzilla 16615/16629.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
LKML-Reference: <20100820073634.GB5209@swordfish.minsk.epam.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-20 19:56:00 -07:00
Linus Torvalds
36423a5ed5 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, apic: Fix apic=debug boot crash
  x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
  x86-32: Fix dummy trampoline-related inline stubs
  x86-32: Separate 1:1 pagetables from swapper_pg_dir
  x86, cpu: Fix regression in AMD errata checking code
2010-08-20 14:25:08 -07:00
Peter Zijlstra
ed80526166 perf: Remove superfluous return values from perf_callchain_*()
Fixes these build warnings introduced by the callchain
rework:

 arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_kernel’:
 arch/x86/kernel/cpu/perf_event.c:1646: warning: ‘return’ with a value, in function returning void
 arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_user’:
 arch/x86/kernel/cpu/perf_event.c:1699: warning: ‘return’ with a value, in function returning void
 arch/x86/kernel/cpu/perf_event.c: At top level:
 arch/x86/kernel/cpu/perf_event.c:1607: warning: ‘perf_callchain_entry_nmi’ defined but not used

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20 14:59:39 +02:00
Suresh Siddha
cd7240c0b9 x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
TSC's get reset after suspend/resume (even on cpu's with invariant TSC
which runs at a constant rate across ACPI P-, C- and T-states). And in
some systems BIOS seem to reinit TSC to arbitrary large value (still
sync'd across cpu's) during resume.

This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less
than rq->age_stamp (introduced in 2.6.32). This leads to a big value
returned by scale_rt_power() and the resulting big group power set by the
update_group_power() is causing improper load balancing between busy and
idle cpu's after suspend/resume.

This resulted in multi-threaded workloads (like kernel-compilation) go
slower after suspend/resume cycle on core i5 laptops.

Fix this by recomputing cyc2ns_offset's during resume, so that
sched_clock() continues from the point where it was left off during
suspend.

Reported-by: Florian Pritz <flo@xssn.at>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org> # [v2.6.32+]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20 14:59:02 +02:00
Daniel Kiper
05e407603e x86, apic: Fix apic=debug boot crash
Fix a boot crash when apic=debug is used and the APIC is
not properly initialized.

This issue appears during Xen Dom0 kernel boot but the
fix is generic and the crash could occur on real hardware
as well.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: jeremy@goop.org
Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x
LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20 10:18:28 +02:00
Borislav Petkov
d7c53c9e82 x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

[ v2.1: fix !HOTPLUG_CPU build ]

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-19 14:47:43 -07:00
Linus Torvalds
b3ea36b7a2 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kprobes/x86: Fix the return address of multiple kretprobes
  perf tools: Fix build error on read only source.
  perf, x86: Fix Intel-nhm PMU programming errata workaround
2010-08-19 09:06:49 -07:00
KUMANO Syuhei
737480a0d5 kprobes/x86: Fix the return address of multiple kretprobes
Fix the return address of subsequent kretprobes when multiple
kretprobes are set on the same function.

For example:

 # cd /sys/kernel/debug/tracing
 # echo "r:event1 sys_symlink" > kprobe_events
 # echo "r:event2 sys_symlink" >> kprobe_events
 # echo 1 > events/kprobes/enable
 # ln -s /tmp/foo /tmp/bar

(without this patch)

 # cat trace
              ln-897   [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink)
              ln-897   [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)

(with this patch)

 # cat trace
              ln-740   [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink)
              ln-740   [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)

Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LKML-Reference: <1281853084.3254.11.camel@camp10-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-19 12:49:56 +02:00
Ingo Molnar
c8710ad389 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-08-19 12:48:09 +02:00
Frederic Weisbecker
927c7a9e92 perf: Fix race in callchains
Now that software events don't have interrupt disabled anymore in
the event path, callchains can nest on any context. So seperating
nmi and others contexts in two buffers has become racy.

Fix this by providing one buffer per nesting level. Given the size
of the callchain entries (2040 bytes * 4), we now need to allocate
them dynamically.

v2: Fixed put_callchain_entry call after recursion.
    Fix the type of the recursion, it must be an array.

v3: Use a manual pr cpu allocation (temporary solution until NMIs
    can safely access vmalloc'ed memory).
    Do a better separation between callchain reference tracking and
    allocation. Make the "put" path lockless for non-release cases.

v4: Protect the callchain buffers with rcu.

v5: Do the cpu buffers allocations node affine.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David Miller <davem@davemloft.net>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:32:31 +02:00
Frederic Weisbecker
f72c1a931e perf: Factorize callchain context handling
Store the kernel and user contexts from the generic layer instead
of archs, this gathers some repetitive code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:32:11 +02:00
Frederic Weisbecker
56962b4449 perf: Generalize some arch callchain code
- Most archs use one callchain buffer per cpu, except x86 that needs
  to deal with NMIs. Provide a default perf_callchain_buffer()
  implementation that x86 overrides.

- Centralize all the kernel/user regs handling and invoke new arch
  handlers from there: perf_callchain_user() / perf_callchain_kernel()
  That avoid all the user_mode(), current->mm checks and so...

- Invert some parameters in perf_callchain_*() helpers: entry to the
  left, regs to the right, following the traditional (dst, src).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:30:59 +02:00
Frederic Weisbecker
70791ce9ba perf: Generalize callchain_store()
callchain_store() is the same on every archs, inline it in
perf_event.h and rename it to perf_callchain_store() to avoid
any collision.

This removes repetitive code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:30:11 +02:00
Frederic Weisbecker
c1a65932fd perf: Drop unappropriate tests on arch callchains
Drop the TASK_RUNNING test on user tasks for callchains as
this check doesn't seem to make any sense.

Also remove the tests for !current that is not supposed to
happen and current->pid as this should be handled at the
generic level, with exclude_idle attribute.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:29:35 +02:00
H. Peter Anvin
8848a91068 x86-32: Fix dummy trampoline-related inline stubs
Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
2010-08-18 12:42:24 -07:00
Joerg Roedel
fd89a13792 x86-32: Separate 1:1 pagetables from swapper_pg_dir
This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-18 09:17:20 -07:00
Hans Rosenfeld
07a7795ca2 x86, cpu: Fix regression in AMD errata checking code
A bug in the family-model-stepping matching code caused the presence of
errata to go undetected when OSVW was not used. This causes hangs on
some K8 systems because the E400 workaround is not enabled.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-18 09:16:28 -07:00
Zhang, Yanmin
351af0725e perf, x86: Fix Intel-nhm PMU programming errata workaround
Fix the Errata AAK100/AAP53/BD53 workaround, the officialy documented
workaround we implemented in:

 11164cd: perf, x86: Add Nehelem PMU programming errata workaround

doesn't actually work fully and causes a stuck PMU state
under load and non-functioning perf profiling.

A functional workaround was found by trial & error.

Affects all Nehalem-class Intel PMUs.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281073148.2125.63.camel@ymzhang.sh.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@kernel.org> # .35.x
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-18 11:17:39 +02:00
Linus Torvalds
392abeea52 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  vt,console,kdb: preserve console_blanked while in kdb
  vt: fix regression warnings from KMS merge
  arm,kgdb: fix GDB_MAX_REGS no longer used
  kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c
  kdb: fix compile error without CONFIG_KALLSYMS
2010-08-17 18:36:19 -07:00
David Howells
d7627467b7 Make do_execve() take a const filename pointer
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to.  This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel().  A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-17 18:07:43 -07:00
Jesse Barnes
23b90cfd7b x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
Otherwise we'll duplicate definitions with the pci.h stubs.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-17 09:29:36 -07:00
Xiao Guangrong
6b5d7a9f6f KVM: PIT: free irq source id in handling error path
Free irq source id if create pit workqueue fail

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-17 12:04:23 +03:00
Namhyung Kim
8c8aefce93 kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c
breakinfo->pev is a pointer to percpu pointer but was missing __percpu markup.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-08-16 15:58:30 -05:00
Linus Torvalds
2245ba2a3a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  gcc-4.6: ACPI: fix unused but set variables in ACPI
  ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS
  ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS
  ACPI processor: remove deprecated ACPI procfs I/F
  ACPI power_resource: remove unused procfs I/F
  ACPI: remove deprecated ACPI procfs I/F
  ACPI: introduce drivers/acpi/sysfs.c
  ACPI: introduce module parameter acpi.aml_debug_output
  ACPI: introduce drivers/acpi/debugfs.c
  ACPI, APEI, ERST debug support
  ACPI, APEI, Manage GHES as platform devices
  ACPI, APEI, Rename CPER and GHES severity constants
  ACPI, APEI, Fix a typo of error path of apei_resources_request
  ACPI / ACPICA: Fix reference counting problems with GPE handlers
  ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device
  ACPI / Sleep: Drop acpi_suspend_finish()
  ACPI / Sleep: Consolidate suspend and hibernation routines
  ACPI / Wakeup: Simplify enabling of wakeup devices
  ACPI / Sleep: Rework enabling wakeup devices
  ACPI / Sleep: Free NVS copy if suspending of devices fails

Fixed up totally buggered "ACPI: fix unused but set variables in ACPI"
patch that doesn't even compile in the merge.

Thanks to Sedat Dilek <sedat.dilek@googlemail.com> for noticing the
breakage before I even pulled.  And a big "Grrr.." at Len for not even
bothering to compile the tree before asking me to pull.
2010-08-15 17:37:07 -07:00
Xiaotian Feng
3185bf8c23 KVM: destroy workqueue on kvm_create_pit() failures
kernel needs to destroy workqueue if kvm_create_pit() fails, otherwise
after pit is freed, the workqueue is leaked.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-15 14:17:35 +03:00
Xiaotian Feng
f45755b834 KVM: fix poison overwritten caused by using wrong xstate size
fpu.state is allocated from task_xstate_cachep, the size of task_xstate_cachep
is xstate_size. xstate_size is set from cpuid instruction, which is often
smaller than sizeof(struct xsave_struct). kvm is using sizeof(struct xsave_struct)
to fill in/out fpu.state.xsave, as what we allocated for fpu.state is
xstate_size, kernel will write out of memory and caused poison/redzone/padding
overwritten warnings.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sheng Yang <sheng@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-15 14:10:15 +03:00
Len Brown
95ee46aa86 Merge branch 'linus' into release
Conflicts:
	drivers/acpi/debug.c

Signed-off-by: Len Brown <len.brown@intel.com>
2010-08-15 01:06:31 -04:00
Sam Ravnborg
8b1bb90701 defconfig reduction
Use the defconfig files generated by "make savedefconfig" for
remaining defconfig files.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2010-08-14 22:26:53 +02:00
Sam Ravnborg
bf56fba670 archs: replace unifdef-y with header-y
unifdef-y and header-y have same semantic, so drop unifdef-y

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2010-08-14 22:26:51 +02:00
Linus Torvalds
c206d44ffd Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Make kdump avoid stack dumps - fix !CONFIG_KEXEC breakage
  x86, UV: Initialize BAU hub map
  x86, UV: Make kdump avoid stack dumps
2010-08-13 18:00:25 -07:00
Linus Torvalds
78b148696d Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] acpi-cpufreq: add missing __percpu markup
2010-08-13 17:59:09 -07:00