Commit Graph

3733 Commits

Author SHA1 Message Date
Suresh Siddha
ad5ca55f6b x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
Do a global flush tlb after splitting the large page and before we do the
actual change page attribute in the PTE.

With out this, we violate the TLB application note, which says
    "The TLBs may contain both ordinary and large-page translations for
     a 4-KByte range of linear addresses. This may occur if software
     modifies the paging structures so that the page size used for the
     address range changes. If the two translations differ with respect
     to page frame or attributes (e.g., permissions), processor behavior
     is undefined and may be implementation-specific."

And also serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity
mappings) using cpa_lock. So that we don't allow any other cpu, with stale
large tlb entries change the page attribute in parallel to some other cpu
splitting a large page entry along with changing the attribute.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:17 +02:00
Suresh Siddha
8311eb84bf x86, cpa: remove cpa pool code
Interrupt context no longer splits large page in cpa(). So we can do away
with cpa memory pool code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:16 +02:00
Suresh Siddha
55121b4369 x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
No alias checking needed for setting present/not-present mapping. Otherwise,
we may need to break large pages for 64-bit kernel text mappings (this adds to
complexity if we want to do this from atomic context especially, for ex:
with CONFIG_DEBUG_PAGEALLOC). Let's keep it simple!

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:15 +02:00
Suresh Siddha
0b8fdcbcd2 x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC.
This will remove the need to split the large page for the
allocated kernel page in the interrupt context.

This will simplify cpa code(as we don't do the split any more from the
interrupt context). cpa code simplication in the subsequent patches.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:14 +02:00
Suresh Siddha
a2699e477b x86, cpa: make the kernel physical mapping initialization a two pass sequence
In the first pass, kernel physical mapping will be setup using large or
small pages but uses the same PTE attributes as that of the early
PTE attributes setup by early boot code in head_[32|64].S

After flushing TLB's, we go through the second pass, which setups the
direct mapped PTE's with the appropriate attributes (like NX, GLOBAL etc)
which are runtime detectable.

This two pass mechanism conforms to the TLB app note which says:

"Software should not write to a paging-structure entry in a way that would
 change, for any linear address, both the page size and either the page frame
 or attributes."

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:13 +02:00
Suresh Siddha
b2bc273146 x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:11 +02:00
Ingo Molnar
f81b691a3d Merge commit 'v2.6.27-rc6' into x86/pat 2008-09-14 17:26:53 +02:00
Linus Torvalds
93811d94f7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix memmap=exactmap boot argument
  x86: disable static NOPLs on 32 bits
  xen: fix 2.6.27-rc5 xen balloon driver warnings
2008-09-09 12:23:41 -07:00
Prarit Bhargava
d6be118a97 x86: fix memmap=exactmap boot argument
When using kdump modifying the e820 map is yielding strange results.

For example starting with

 BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 0000000000093400 (usable)
 BIOS-e820: 0000000000093400 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fee0000 (usable)
 BIOS-e820: 000000003fee0000 - 000000003fef3000 (ACPI data)
 BIOS-e820: 000000003fef3000 - 000000003ff80000 (ACPI NVS)
 BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)

and booting with args

memmap=exactmap memmap=640K@0K memmap=5228K@16384K memmap=125188K@22252K memmap=76K#1047424K memmap=564K#1047500K

resulted in:

 user-defined physical RAM map:
 user: 0000000000000000 - 0000000000093400 (usable)
 user: 0000000000093400 - 00000000000a0000 (reserved)
 user: 0000000000100000 - 000000003fee0000 (usable)
 user: 000000003fee0000 - 000000003fef3000 (ACPI data)
 user: 000000003fef3000 - 000000003ff80000 (ACPI NVS)
 user: 000000003ff80000 - 0000000040000000 (reserved)
 user: 00000000e0000000 - 00000000f0000000 (reserved)
 user: 00000000fec00000 - 00000000fec10000 (reserved)
 user: 00000000fee00000 - 00000000fee01000 (reserved)
 user: 00000000ff000000 - 0000000100000000 (reserved)

But should have resulted in:

 user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 000000000151b000 (usable)
 user: 00000000015bb000 - 0000000008ffc000 (usable)
 user: 000000003fee0000 - 000000003ff80000 (ACPI data)

This is happening because of an improper usage of strcmp() in the
e820 parsing code.  The strcmp() always returns !0 and never resets the
value for e820.nr_map and returns an incorrect user-defined map.

This patch fixes the problem.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-09 11:54:53 -07:00
Linus Torvalds
14469a8dd2 x86: disable static NOPLs on 32 bits
On 32-bit, at least the generic nops are fairly reasonable, but the
default nops for 64-bit really look pretty sad, and the P6 nops really do
look better.

So I would suggest perhaps moving the static P6 nop selection into the
CONFIG_X86_64 thing.

The alternative is to just get rid of that static nop selection, and just
have two cases: 32-bit and 64-bit, and just pick obviously safe cases for
them.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-09-08 11:35:43 -07:00
Linus Torvalds
64f996f670 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: cpu_init(): fix memory leak when using CPU hotplug
  x86: pda_init(): fix memory leak when using CPU hotplug
  x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags
  x86: move mtrr cpu cap setting early in early_init_xxxx
  x86: delay early cpu initialization until cpuid is done
  x86: use X86_FEATURE_NOPL in alternatives
  x86: add NOPL as a synthetic CPU feature bit
  x86: boot: stub out unimplemented CPU feature words
2008-09-06 19:36:23 -07:00
Linus Torvalds
f532522565 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource, acpi_pm.c: check for monotonicity
  clocksource, acpi_pm.c: use proper read function also in errata mode
  ntp: fix calculation of the next jiffie to trigger RTC sync
  x86: HPET: read back compare register before reading counter
  x86: HPET fix moronic 32/64bit thinko
  clockevents: broadcast fixup possible waiters
  HPET: make minimum reprogramming delta useful
  clockevents: prevent endless loop lockup
  clockevents: prevent multiple init/shutdown
  clockevents: enforce reprogram in oneshot setup
  clockevents: prevent endless loop in periodic broadcast handler
  clockevents: prevent clockevent event_handler ending up handler_noop
2008-09-06 19:33:26 -07:00
Andreas Herrmann
23952a96ae x86: cpu_init(): fix memory leak when using CPU hotplug
Exception stacks are allocated each time a CPU is set online.
But the allocated space is never freed. Thus with one CPU hotplug
offline/online cycle there is a memory leak of 24K (6 pages) for
a CPU.

Fix is to allocate exception stacks only once -- when the CPU is
set online for the first time.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 20:48:16 +02:00
Andreas Herrmann
d04ec773d7 x86: pda_init(): fix memory leak when using CPU hotplug
pda->irqstackptr is allocated whenever a CPU is set online.
But it is never freed. This results in a memory leak of 16K
for each CPU offline/online cycle.

Fix is to allocate pda->irqstackptr only once.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 20:48:02 +02:00
Eduardo Habkost
e4a6be4d28 x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags
Using native_pte_val triggers the BUG_ON() in the paravirt_ops
version of pte_flags().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 20:13:58 +02:00
Yinghai Lu
dd786dd12c x86: move mtrr cpu cap setting early in early_init_xxxx
Krzysztof Helt found MTRR is not detected on k6-2

root cause:
	we moved mtrr_bp_init() early for mtrr trimming,
and in early_detect we only read the CPU capability from cpuid,
so some cpu doesn't have that bit in cpuid.

So we need to add early_init_xxxx to preset those bit before mtrr_bp_init
for those earlier cpus.

this patch is for v2.6.27

Reported-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 17:50:55 +02:00
Krzysztof Helt
12cf105cd6 x86: delay early cpu initialization until cpuid is done
Move early cpu initialization after cpu early get cap so the
early cpu initialization can fix up cpu caps.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 17:50:38 +02:00
Thomas Gleixner
72d43d9bc9 x86: HPET: read back compare register before reading counter
After fixing the u32 thinko I sill had occasional hickups on ATI chipsets
with small deltas. There seems to be a delay between writing the compare
register and the transffer to the internal register which triggers the
interrupt. Reading back the value makes sure, that it hit the internal
match register befor we compare against the counter value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-09-06 07:21:17 +02:00
Thomas Gleixner
f7676254f1 x86: HPET fix moronic 32/64bit thinko
We use the HPET only in 32bit mode because:
1) some HPETs are 32bit only
2) on i386 there is no way to read/write the HPET atomic 64bit wide

The HPET code unification done by the "moron of the year" did
not take into account that unsigned long is different on 32 and
64 bit.

This thinko results in a possible endless loop in the clockevents
code, when the return comparison fails due to the 64bit/332bit
unawareness. 

unsigned long cnt = (u32) hpet_read() + delta can wrap over 32bit.
but the final compare will fail and return -ETIME causing endless
loops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-09-06 07:21:17 +02:00
H. Peter Anvin
f31d731e44 x86: use X86_FEATURE_NOPL in alternatives
Use X86_FEATURE_NOPL to determine if it is safe to use P6 NOPs in
alternatives.  Also, replace table and loop with simple if statement.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-09-05 16:14:01 -07:00
H. Peter Anvin
b6734c35af x86: add NOPL as a synthetic CPU feature bit
The long noops ("NOPL") are supposed to be detected by family >= 6.
Unfortunately, several non-Intel x86 implementations, both hardware
and software, don't obey this dictum.  Instead, probe for NOPL
directly by executing a NOPL instruction and see if we get #UD.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-09-05 16:13:52 -07:00
H. Peter Anvin
b74b06c5f6 x86: boot: stub out unimplemented CPU feature words
The CPU feature detection code in the boot code is somewhat minimal,
and doesn't include all possible CPUID words.  In particular, it
doesn't contain the code for CPU feature words 2 (Transmeta),
3 (Linux-specific), 5 (VIA), or 7 (scattered).  Zero them out, so we
can still set those bits as known at compile time; in particular, this
allows creating a Linux-specific NOPL flag and have it required (and
therefore resolvable at compile time) in 64-bit mode.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-09-05 16:13:44 -07:00
Linus Torvalds
1c402c8cd1 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: add io delay quirk for Presario F700
2008-09-05 14:36:21 -07:00
Jeremy Fitzhardinge
110e0358e7 x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious
The CPA test code uses _PAGE_UNUSED1, so make sure its obvious.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 17:09:57 +02:00
Thomas Gleixner
7cfb043533 HPET: make minimum reprogramming delta useful
The minimum reprogramming delta was hardcoded in HPET ticks,
which is stupid as it does not work with faster running HPETs.
The C1E idle patches made this prominent on AMD/RS690 chipsets,
where the HPET runs with 25MHz. Set it to 5us which seems to be
a reasonable value and fixes the problems on the bug reporters
machines. We have a further sanity check now in the clock events,
which increases the delta when it is not sufficient.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Tested-by: Dmitry Nezhevenko <dion@inhex.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 11:11:54 +02:00
Alok N Kataria
de014d6176 x86: Change warning message in TSC calibration.
When calibration against PIT fails, the warning that we print is misleading.
In a virtualized environment the VM may get descheduled while calibration
or, the check in PIT calibration may fail due to other virtualization
overheads.

The warning message explicitly assumes that calibration failed due to SMI's
which may not be the case. Change that to something proper.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-03 20:10:37 -07:00
Chuck Ebbert
e6a5652fd1 x86: add io delay quirk for Presario F700
Manually adding "io_delay=0xed" fixes system lockups in ioapic
mode on this machine.

System Information
	Manufacturer: Hewlett-Packard
	Product Name: Presario F700 (KA695EA#ABF)

Base Board Information
	Manufacturer: Quanta
	Product Name: 30D3

Reference:
https://bugzilla.redhat.com/show_bug.cgi?id=459546

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-09-03 16:42:51 -07:00
Linus Torvalds
ec0c15afb4 Split up PIT part of TSC calibration from native_calibrate_tsc
The TSC calibration function is still very complicated, but this makes
it at least a little bit less so by moving the PIT part out into a
helper function of its own.

Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-03 07:30:13 -07:00
Thomas Gleixner
fbb16e2438 [x86] Fix TSC calibration issues
Larry Finger reported at http://lkml.org/lkml/2008/9/1/90:
An ancient laptop of mine started throwing errors from b43legacy when
I started using 2.6.27 on it. This has been bisected to commit bfc0f59
"x86: merge tsc calibration".

The unification of the TSC code adopted mostly the 64bit code, which
prefers PMTIMER/HPET over the PIT calibration.

Larrys system has an AMD K6 CPU. Such systems are known to have
PMTIMER incarnations which run at double speed. This results in a
miscalibration of the TSC by factor 0.5. So the resulting calibrated
CPU/TSC speed is half of the real CPU speed, which means that the TSC
based delay loop will run half the time it should run. That might
explain why the b43legacy driver went berserk.

On the other hand we know about systems, where the PIT based
calibration results in random crap due to heavy SMI/SMM
disturbance. On those systems the PMTIMER/HPET based calibration logic
with SMI detection shows better results.

According to Alok also virtualized systems suffer from the PIT
calibration method.

The solution is to use a more wreckage aware aproach than the current
either/or decision.

1) reimplement the retry loop which was dropped from the 32bit code
during the merge. It repeats the calibration and selects the lowest
frequency value as this is probably the closest estimate to the real
frequency

2) Monitor the delta of the TSC values in the delay loop which waits
for the PIT counter to reach zero. If the maximum value is
significantly different from the minimum, then we have a pretty safe
indicator that the loop was disturbed by an SMI.

3) keep the pmtimer/hpet reference as a backup solution for systems
where the SMI disturbance is a permanent point of failure for PIT
based calibration

4) do the loop iteration for both methods, record the lowest value and
decide after all iterations finished.

5) Set a clear preference to PIT based calibration when the result
makes sense.

The implementation does the reference calibration based on
HPET/PMTIMER around the delay, which is necessary for the PIT anyway,
but keeps separate TSC values to ensure the "independency" of the
resulting calibration values.

Tested on various 32bit/64bit machines including Geode 266Mhz, AMD K6
(affected machine with a double speed pmtimer which I grabbed out of
the dump), Pentium class machines and AMD/Intel 64 bit boxen.

Bisected-by:  Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 20:35:56 -07:00
Linus Torvalds
011fec7486 Un-break printk strings in x86 PCI probing code
Breaking lines due to some imaginary problem with a long line length is
often stupid and wrong, but never more so when it splits a string that
is printed out into multiple lines.  This really ended up making it much
harder to find where some error strings were printed out, because a
simple 'grep' didn't work.

I'm sure there is tons more of this particular idiocy hiding in other
places, but this particular case hit me once more last week. So fix it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 10:38:28 -07:00
Linus Torvalds
b460947211 Revert "x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3"
This reverts commit a2bd7274b4.

It wasn't really right to begin with (there's a better fix for the
problem with e820 reservations clashing with PCI BAR's pending), but it
also actually causes more regressions, so it should be reverted even
before the better fix is finalized.

Rafael reports that this commit broke AHCI detection, and thus causes
the kernel to not boot on his quad core test box.

Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Witbrodt <dawitbro@sbcglobal.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-29 14:46:05 -07:00
Linus Torvalds
e52c8857e0 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: update defconfigs
  x86: msr: fix bogus return values from rdmsr_safe/wrmsr_safe
  x86: cpuid: correct return value on partial operations
  x86: msr: correct return value on partial operations
  x86: cpuid: propagate error from smp_call_function_single()
  x86: msr: propagate errors from smp_call_function_single()
  smp: have smp_call_function_single() detect invalid CPUs
2008-08-28 12:30:59 -07:00
H. Peter Anvin
c1b362e3b4 x86: update defconfigs
Enable some option commonly used by testers in defconfig, including
some very common device drivers and network boot support.  defconfig
is still not meant to be a kitchen-sink configuration.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-27 08:14:17 +02:00
H. Peter Anvin
9ea2b82ed6 x86: cpuid: correct return value on partial operations
Return the correct return value when the CPUID driver partially
completes a request (we should return the number of bytes actually
read or written, instead of the error code.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25 17:46:12 -07:00
H. Peter Anvin
85f1cb6015 x86: msr: correct return value on partial operations
Return the correct return value when the MSR driver partially
completes a request (we should return the number of bytes actually
read or written, instead of the error code.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25 17:46:12 -07:00
H. Peter Anvin
4b46ca701b x86: cpuid: propagate error from smp_call_function_single()
Propagate error (-ENXIO) from smp_call_function_single() in the CPUID
driver.  This can happen when a CPU is unplugged while the CPUID
driver is open.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25 17:45:48 -07:00
H. Peter Anvin
c6f31932d0 x86: msr: propagate errors from smp_call_function_single()
Propagate error (-ENXIO) from smp_call_function_single().  These
errors can happen when a CPU is unplugged while the MSR driver is
open.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25 17:45:48 -07:00
Linus Torvalds
d25e26b61d [x86] Clean up MAXSMP Kconfig, and limit NR_CPUS to 512
This fixes a regression that was indirectly caused by commit
1184dc2ffe ("x86: modify Kconfig to allow
up to 4096 cpus").

Allowing 4k CPU's is not practical at this time, because we still have a
number of places that have several 'cpumask_t's on the stack, and a
4k-bit cpumask is 512 bytes of stack-space for each such variable.  This
literally caused functions like 'smp_call_function_mask' to have a 2.5kB
stack frame, and several functions to have 2kB stackframes.

With an 8kB stack total, smashing the stack was simply much too likely.
At least bugzilla entry

	http://bugzilla.kernel.org/show_bug.cgi?id=11342

was due to this.

The earlier commit to not inline load_module() into sys_init_module()
fixed the particular symptoms of this that Alan Brunelle saw in that
bugzilla entry, but the huge stack waste by cpumask_t's was the more
direct cause.

Some day we'll have allocation helpers that allocate large CPU masks
dynamically, but in the meantime we simply cannot allow cpumasks this
large.

Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-25 14:15:38 -07:00
Linus Torvalds
ec73adba51 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: add X86_FEATURE_XMM4_2 definitions
  x86: fix cpufreq + sched_clock() regression
  x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
  x86: do not enable TSC notifier if we don't need it
  x86 MCE: Fix CPU hotplug problem with multiple multicore AMD CPUs
  x86: fix: make PCI ECS for AMD CPUs hotplug capable
  x86: fix: do not run code in amd_bus.c on non-AMD CPUs
2008-08-25 11:26:33 -07:00
Avi Kivity
cd5998ebfb KVM: MMU: Fix torn shadow pte
The shadow code assigns a pte directly in one place, which is nonatomic on
i386 can can cause random memory references.  Fix by using an atomic setter.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-08-25 17:24:27 +03:00
Peter Zijlstra
52a8968ce9 x86: fix cpufreq + sched_clock() regression
I noticed that my sched_clock() was slow on a number of machine, so I
started looking at cpufreq.

The below seems to fix the problem for me.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-25 14:39:19 +02:00
Ingo Molnar
f58899bb02 Merge branch 'linus' into x86/urgent 2008-08-25 14:39:12 +02:00
Yinghai Lu
a2bd7274b4 x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
David Witbrodt tracked down (and bisected) a hpet bootup hang on his
system to the following problem: a BIOS bug made the hpet device
visible as a generic PCI device. If e820 reserved entries happen to
be registered first in the resource tree [which v2.6.26 started doing],
then the PCI code will reallocate that device's BAR to some other
address - breaking timer IRQs and hanging the system.

( Normally hpet devices are hidden by the BIOS from the OS's PCI
  discovery via chipset magic. Sometimes the hpet is not a PCI device
  at all. )

Solve this fundamental fragility by making non-PCI platform drivers
insert resources into the resource tree even if it overlaps the e820
reserved entry, to keep the resource manager from updating the BAR.

Also do these checks for the ioapic and mmconfig addresses, and emit
a warning if this happens.

Bisected-by: David Witbrodt <dawitbro@sbcglobal.net>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: David Witbrodt <dawitbro@sbcglobal.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-25 10:02:03 +02:00
Linus Torvalds
060700b571 x86: do not enable TSC notifier if we don't need it
Impact: crash on non-TSC-equipped CPUs

Don't enable the TSC notifier if we *either*:

1. don't have a CPU, or
2. have a CPU with constant TSC.

In either of those cases, the notifier is either damaging (1) or useless(2).

From: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-24 17:16:28 -07:00
Adrian Bunk
7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Rafael J. Wysocki
8735728ef8 x86 MCE: Fix CPU hotplug problem with multiple multicore AMD CPUs
During CPU hot-remove the sysfs directory created by
threshold_create_bank(), defined in
arch/x86/kernel/cpu/mcheck/mce_amd_64.c, has to be removed before
its parent directory, created by mce_create_device(), defined in
arch/x86/kernel/cpu/mcheck/mce_64.c .  Moreover, when the CPU in
question is hotplugged again, obviously the latter has to be created
before the former.  At present, the right ordering is not enforced,
because all of these operations are carried out by CPU hotplug
notifiers which are not appropriately ordered with respect to each
other.  This leads to serious problems on systems with two or more
multicore AMD CPUs, among other things during suspend and hibernation.

Fix the problem by placing threshold bank CPU hotplug callbacks in
mce_cpu_callback(), so that they are invoked at the right places,
if defined.  Additionally, use kobject_del() to remove the sysfs
directory associated with the kobject created by
kobject_create_and_add() in threshold_create_bank(), to prevent the
kernel from crashing during CPU hotplug operations on systems with
two or more multicore AMD CPUs.

This patch fixes bug #11337.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andi Kleen <andi@firstfloor.org>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-23 17:49:19 +02:00
Robert Richter
91ede005d7 x86: fix: make PCI ECS for AMD CPUs hotplug capable
Until now, PCI ECS setup was performed at boot time only and for cpus
that are enabled then. This patch fixes this and adds cpu hotplug.

Tests sequence (check if ECS bit is set when bringing cpu online again):

 # ( perl -e 'sysseek(STDIN, 0xC001001F, 0)'; hexdump -n 8 -e '2/4 "%08x " "\n"' )   < /dev/cpu/1/msr
 00000008 00404010
 # ( perl -e 'sysseek(STDOUT, 0xC001001F, 0); print pack "l*", 8, 0x00400010' ) > /dev/cpu/1/msr
 # ( perl -e 'sysseek(STDIN, 0xC001001F, 0)'; hexdump -n 8 -e '2/4 "%08x " "\n"' )   < /dev/cpu/1/msr
 00000008 00400010
 # echo 0 > /sys/devices/system/cpu/cpu1/online
 # echo 1 > /sys/devices/system/cpu/cpu1/online
 # ( perl -e 'sysseek(STDIN, 0xC001001F, 0)'; hexdump -n 8 -e '2/4 "%08x " "\n"' )   < /dev/cpu/1/msr
 00000008 00404010

Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-23 17:39:31 +02:00
Robert Richter
9b4e27b528 x86: fix: do not run code in amd_bus.c on non-AMD CPUs
Jan Beulich wrote:

> Even worse - this would even try to access the MSR on non-AMD CPUs
> (currently probably prevented just by the fact that only AMD ones use
> family values of 0x10 or higher).

This patch adds cpu vendor check to the postcore_initcalls.

Reported-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-23 17:39:30 +02:00
Venki Pallipadi
01de05af94 x86: have set_memory_array_{uc,wb} coalesce memtypes, fix
Fix the start addr for free_memtype calls in the error path.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-23 17:33:12 +02:00
Linus Torvalds
358c323c17 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: work around MTRR mask setting, v2
  x86: fix section mismatch warning - uv_cpu_init
  x86: fix VMI for early params
  x86: fix two modpost warnings in mm/init_64.c
  x86: fix 1:1 mapping init on 64-bit (memory hotplug case)
  x86: work around MTRR mask setting
  x86: PAT Update validate_pat_support for intel CPUs
  devmem, x86: PAT Change /dev/mem mmap with O_SYNC to use UC_MINUS
  x86: PAT proper tracking of set_memory_uc and friends
  x86: fix BUG: unable to handle kernel paging request (numaq_tsc_disable)
  x86: export pv_lock_ops non-GPL
  x86, mmiotrace: silence section mismatch warning - leave_uniprocessor
  x86: use WARN() in arch/x86/kernel
  x86: use WARN() in arch/x86/mm/ioremap.c
  werror: fix pci calgary
  x86: fix oprofile + hibernation badness
  x86, SGI UV: hardcode the TLB flush interrupt system vector
  x86: fix Xorg startup/shutdown slowdown with PAT
  x86: fix "kernel won't boot on a Cyrix MediaGXm (Geode)"
  x86 iommu: remove unneeded parenthesis
2008-08-22 08:23:53 -07:00