for http://bugzilla.kernel.org/show_bug.cgi?id=10613
BIOS bug, APIC version is 0 for CPU#0! fixing up to 0x10. (tell your hw vendor)
v2: fix 64 bit compilation
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this way 32-bit is more similar to 64-bit, and smarter e820 and numa.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It looks good to move bugs_64.c to cpu/bugs_64.c.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
on 64-bit we only get valid max_pfn_mapped after init_memory_mapping().
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.
We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.
XEN PV and lguest may need to assign max_pfn_mapped too.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make mptable to be consistent with acpi routing, so we could:
1. kexec kernel with acpi=off
2. work around BIOSes where acpi routing is working, but mptable is
not right, so can use kernel/kexec to start other OSes that don't have
good acpi support.
command line: update_mptable
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.
also for sparse it will call alloc_bootmem ... before we set up bootmem.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume. Prevent system suspend in case we
would be unable to resume.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update the UV address macros to better describe the
fields of UV physical addresses. Improve comments
in the header files. Add additional MMR definitions.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Wed, 2008-05-28 at 04:47 +0200, Andi Kleen wrote:
> > So... why not just remove the setting of __GFP_NORETRY? Why is it
> > wrong to oom-kill things in this case?
>
> When the 16MB zone overflows (which can be common in some workloads)
> calling the OOM killer is pretty useless because it has barely any
> real user data [only exception would be the "only 16MB" case Alan
> mentioned]. Killing random processes in this case is bad.
>
> I think for 16MB __GFP_NORETRY is ok because there should be
> nothing freeable in there so looping is useless. Only exception would be the
> "only 16MB total" case again but I'm not sure 2.6 supports that at all
> on x86.
>
> On the other hand d_a_c() does more allocations than just 16MB, especially
> on 64bit and the other zones need different strategies.
Okay, so how about this then ?
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce IRQx_VECTOR on 32-bit, so that #ifdef noise is kept
down. There should be no object code change.
[ mingo@elte.hu: merged to x86/irq not x86/i8259 due to x86/irq having
restructured the vector code into asm-x86/irq_vectors.h, which this
patch touches. ]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements PCI extended configuration space access for
AMD's Barcelona CPUs. It extends the method using CF8/CFC IO
addresses. An x86 capability bit has been introduced that is set for
CPUs supporting PCI extended config space accesses.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
on two node system (16g RAM) with numa config I got this crash:
get_memcfg_from_srat: assigning address to rsdp
RSD PTR v0 [ACPIAM]
ACPI: Too big length in RSDT: 92
failed to get NUMA memory information from SRAT table
NUMA - single node, flat memory mode
Node: 0, start_pfn: 0, end_pfn: 153
Setting physnode_map array to node 0 for pfns:
0
...
Pid: 0, comm: swapper Not tainted 2.6.26-rc4 #4
[<80b41289>] hlt_loop+0x0/0x3
[<8011efa0>] ? alloc_remap+0x50/0x70
[<8079e32e>] alloc_node_mem_map+0x5e/0xa0
[<8012e77b>] ? printk+0x1b/0x20
[<80b590f6>] free_area_init_node+0xc6/0x470
[<80b588fc>] ? __alloc_bootmem_node+0x2c/0x50
[<80b58ad8>] ? find_min_pfn_for_node+0x38/0x70
[<8012e77b>] ? printk+0x1b/0x20
[<80b597c4>] free_area_init_nodes+0x254/0x2d0
[<80b544d7>] zone_sizes_init+0x97/0xa0
[<80b48a03>] setup_arch+0x383/0x530
[<8012e77b>] ? printk+0x1b/0x20
[<80b41aa4>] start_kernel+0x64/0x350
[<80b412d8>] i386_start_kernel+0x8/0x10
=======================
this patch increases the acpi table limit to 32.
Also match early_ioremap() with early_iounmap().
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
reserve early numa kva, so it will not clash with new RAMDISK
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
introduce init_pg_table_start, so xen PV could specify the value.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a separate centaur_64.c file in the cpu/ dir for
the useful parts to live in.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Create a separate intel_64.c file in the cpu/ dir for
the useful parts to live in.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Create a separate amd_64.c file in the cpu/ dir for
the useful parts to live in.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are decalred in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The commit
commit 4b82b27770
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Sat May 24 19:36:35 2008 +0400
set nmi_watchdog to NMI_IO_APIC as by default. This causes hangs on some
machines with buggy watchdogs. Fix it - i.e. restore old behaviour.
Thanks to Sitsofe Wheeler and Adrian Bunk for catching the problem
and Maciej W. Rozycki for explanation what is going on there.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some small cleanups for aperture_64.c; they should not really change
any code.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
remove extra -1 in reseve_early calling
panic if can not find space for new RAMDISK
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add pte_flags() to extract the flags from a pte. This is a special
case of pte_val() which is only guaranteed to return the pte's flags
correctly; the page number may be corrupted or missing.
The intent is to allow paravirt implementations to return pte flags
without having to do any translation of the page number (most notably,
Xen).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since cpu_online_map is touched (by for_each_online_cpu)
at moment when cpu_callin_map is already filled up we can
get rid of its checking at all
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
apic_write_around will be expanded to apic_write in 64bit mode
anyway. Only a few CPUs (well, old CPUs to be precise) requires
such an action. In general it should not hurt and could be cleaned
up for apic_write (just in case)
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
traps_32.c already holds these functions so do the same for traps_64.c
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make 64bit die_nmi() to produce the same message as 32bit mode has
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
By slightly changing 32bit mode die_nmi() we may unify the
interface and make it common for both (32/64bit) modes
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
UP builds with LOCAL_APIC=y and IO_APIC=n fail with a missing
reference to mp_bus_not_pci. Distangle the mpparse code some more and
move the ioapic specific bus check into a separate function.
This code needs sume urgent un#ifdef surgery all over the place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This pushes the lock a fair way down and the final kill looks like it
should be an easy project for someone who wants to have a shot at it.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
and make e820_mark_nosave_regions to take limit_pfn to use max_low_pfn
for 32bit and end_pfn for 64bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sitsofe Wheeler reported boot problems on linux-next.
It looks like the same issue as found by Soeren Sandman in 7575217f656a93,
"x86: initialize all fields of mp_irqs[mp_irq_entries]".
But his fix is also not complete, as dstapic is used before it assigned.
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit "x86: make config_irqsrc not MPspec specific" introduced some uses
of uninitialized fields in mp_config_acpi_legacy_irqs(). I need the
following patch to get sched-devel/master to boot.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the new output is:
MPTABLE: OEM ID: SUN
MPTABLE: Product ID: 4600 M2
MPTABLE: APIC at: 0x
instead of it all in one line with <6> and double Product ID...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
use find_e820_area to find addess for new RAMDISK, instead of using ram blindly
also print out low ram and bootmap info
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add to the kernels boot memory map 'memmap' entries found in
the EFI memory descriptors passed in from the BIOS.
On EFI systems, up to E820MAX == 128 memory map entries can
be passed via the legacy E820 interface (limited by the size
of the 'zeropage'). These entries can be duplicated in the
EFI descriptors also passed from the BIOS, and possibly more
entries passed by the EFI interface, which does not have the
E820MAX limit on number of memory map entries.
This code doesn't worry about the likely duplicate, overlapping
or (unlikely) conflicting entries between the EFI map and the
E820 map. It just dumps all the EFI entries into the memmap[]
array (which already has the E820 entries) and lets the existing
routine sanitize_e820_map() sort the mess out.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Elaborate on the comment for sanitize_e820_map(), epxlaining more what
it does, what it inputs, and what it returns. Rearrange the placement of
this comment to fit kernel conventions, before the routine's code rather
than buried inside it.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The map size counter passed into, and back out of, sanitize_e820_map(),
was an eight bit type (char or u8), as derived from its origins in
legacy BIOS E820 structures. This patch changes that type to an 'int',
to allow this sanitize routine to also be used on larger maps (larger
than the 256 count that fits in a char). The legacy BIOS E820 interface
of course does not change; that remains at 8 bits for this count, holding
up to E820MAX == 128 entries. But the kernel internals can handle more
when those additional memory map entries are passed from the BIOS via
EFI interfaces.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend internal boot time memory tables to allow for up to
three entries per node, which may be larger than the 128 E820MAX
entries handled by the legacy BIOS E820 interface. The EFI
interface, if present, is capable of passing memory map
entries for these larger node counts.
This patch requires an earlier patch that rewrote code depending
on these array sizes from using E820MAX explicitly to size loops,
to instead using ARRAY_SIZE() of the applicable array.
Another patch following this one will provide the code to pick
up additional memory entries passed via the EFI interface from
the BIOS and insert them in the following, now enlarged, arrays.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is motivated by a subsequent patch which will allow for more
memory map entries on EFI supported systems than can be passed via the x86
legacy BIOS E820 interface. The legacy interface is limited to E820MAX ==
128 memory entries, and that "E820MAX" manifest constant was used as the
size for several arrays and loops over those arrays.
The primary change in this patch is to change code loop sizes over those
arrays from using the constant E820MAX, to using the ARRAY_SIZE() macro
evaluated for the array being looped. That way, a subsequent patch can
change the size of some of these arrays, without breaking this code.
This patch also adds a parameter to the sanitize_e820_map() routine,
which had an implicit size for the array passed it of E820MAX entries.
This new parameter explicitly passes the size of said array. Once again,
this will allow a subsequent patch to change that array size for some
calls to sanitize_e820_map() without breaking the code.
As part of enhancing the sanitize_e820_map() interface this way, I further
combined the unnecessarily distinct x86_32 and x86_64 declarations for
this routine into a single, commonly used, declaration.
This patch in itself should make no difference to the resulting kernel
binary.
[ mingo@elte.hu: merged to -tip ]
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Standardize a few pointer declarations to not have the
extra space after the '*' character.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
disable the noisy print out.
also use the one the less spare mtrr reg.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
there is a typo in the mask value, need to remove that extra 0,
to avoid 4bit clearing.
Signed-off-by: Yinghal Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
otherwise fixed MTRR for family 10h may not be changed.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Loop through mtrr chunk_size and gran_size from 1M to 2G to find out
the optimal value so user does not need to add mtrr_chunk_size and
mtrr_gran_size to the kernel command line.
If optimal value is not found, print out all list to help select less
optimal value.
Add mtrr_spare_reg_nr= so user could set 2 instead of 1, if the card
need more entries.
v2: find the one with more spare entries
v3: fix hole_basek offset
v4: tight the compare between range and range_new
loop stop with 4g
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Mika Fischer <mika.fischer@zoopnet.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
v9: address format change requests by Ingo
more case handling in range_to_var_with_hole
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
v2: process hole then end_pfn
fix update_memory_range with whole cover comparing
Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
converting MTRR layout from continous to discrete, some time could run out of
MTRRs. So add gran_sizek to prevent that by dumpping small RAM piece less than
gran_sizek.
previous trimming only can handle highest_pfn from mtrr to end_pfn from e820.
when have more than 4g RAM installed, there will be holes below 4g. so need to
check ram below 4g is coverred well.
need to be applied after
[PATCH] x86: mtrr cleanup for converting continuous to discrete layout v7
Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
some BIOS like to use continus MTRR layout, and X driver can not add
WB entries for graphical cards when 4g or more RAM installed.
the patch will change MTRR to discrete.
mtrr_chunk_size= could be used to have smaller continuous block to hold holes.
default is 256m, could be set according to size of graphics card memory.
mtrr_gran_size= could be used to send smallest mtrr block to avoid run out of MTRRs
v2: fix -1 for UC checking
v3: default to disable, and need use enable_mtrr_cleanup to enable this feature
skip the var state change warning.
remove next_basek in range_to_mtrr()
v4: correct warning mask.
v5: CONFIG_MTRR_SANITIZER
v6: fix 1g, 2g, 512 aligment with extra hole
v7: gran_sizek to prevent running out of MTRRs.
v8: fix hole_basek caculation caused when removing next_basek
gran_sizek using when basek is 0.
need to apply
[PATCH] x86: fix trimming e820 with MTRR holes.
right after this one.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The x86_64 code has centralized the memory setup code in
e820_64.c. This patch copies that approach to i386:
- early_param("mem", ...) parsing is moved from
setup_32.c to e820_32.c.
- setup_memory_map() and finish_e820_parsing() are
factored out from setup_arch(), and declarations
are added to e820_32.h.
- print_memory_map() is made static and removed from
e820_32.h.
- user_defined_memmap is marked as __initdata.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/cpu/mtrr/generic.c:216:12: warning: symbol 'lo' shadows an earlier one
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the not longer used handlers for reserved vectors.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Just moved trailing statements to the next line, removed space before
open/close parenthesis, wrapped long lines.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We should better use already defined flags from processor-flags.h instead
of defining own ones
[>>> object code check >>>]
original
md5sum: 9cfa6dbf045a046bb5dfb85f8bcfe8c4 arch/x86/kernel/head_64.o
text data bss dec hex filename
37361 4432 8192 49985 c341 arch/x86/kernel/head_64.o
patched
md5sum: 9cfa6dbf045a046bb5dfb85f8bcfe8c4 arch/x86/kernel/head_64.o
text data bss dec hex filename
37361 4432 8192 49985 c341 arch/x86/kernel/head_64.o
[<<< object code check <<<]
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch contains the following cleanups:
- make the following needlessly global code static:
- dma_alloc_pages()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are decalred in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sparse mutters:
arch/x86/kernel/vsmp_64.c:126:5: warning: symbol 'is_vsmp_box' was not declared. Should it be static?
arch/x86/kernel/vsmp_64.c:145:13: warning: symbol 'vsmp_init' was not declared. Should it be static?
Include the appropriate headers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/tsc_64.c:245:13: warning: constant 0x100000000 is so big it is long
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LAPIC interrupts, which don't go through the generic interrupt handling
code, aren't accounted for in /proc/stat. Hence this patch adds a
mechanism architectures can use to accordingly adjust the statistics.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: prevent PGE flush from interruption/preemption
x86: use explicit copy in vdso_gettimeofday()
namespacecheck: automated fixes
x86/xen: fix arbitrary_virt_to_machine()
x86: don't read maxlvt before checking if APIC is mapped
x86: disable TSC for sched_clock() when calibration failed
x86: distangle user disabled TSC from unstable
x86: fix setup of cyc2ns in tsc_64.c
The leftovers of the i8259 unification have nothing to do with i8259
at all. They contain interrupt init code and the i8259_xx name is just
misleading now.
Rename them to irqinit_32/64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove #ifdefs where the only difference is formatting of comments.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove #ifdefs around includes; including too much should be always
safe.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make conversion of i8259 very mechanical -- i8259 was generated by
diff -D, with too different parts left in i8259_32 and
i8259_64.c. Only "by hand" changes were removal of #ifdef from middle
of the comment (prevented compilation) and removal of one static to
allow splitting into files.
Of course, it will need some cleanups now, and those will follow.
Signed-of-by: Pavel Machek <pavel@suse.cz>
The leftovers of the i8259 unification have nothing to do with i8259
at all. They contain interrupt init code and the i8259_xx name is just
misleading now.
Rename them to initirq_32/64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove #ifdefs where the only difference is formatting of comments.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove #ifdefs around includes; including too much should be always
safe.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the Makefile turd and uses the nice CFLAGS_REMOVE macro
in the x86/kernel directory.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David S. Miller noticed the following bug: the -pg instrumentation
function callback is named differently on each platform. On x86 it
is mcount, on sparc it is _mcount. So the export does not make sense
in kernel/trace/ftrace.c - move it to x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
text_poke is sleepable.
The original fix by Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The fault label to jump to on fault of updating the code was misplaced
preventing the fault from being recorded.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
disable the tracer while kexec pulls the rug from under the old
kernel.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch replaces the indirect call to the mcount function
pointer with a direct call that will be patched by the
dynamic ftrace routines.
On boot up, the mcount function calls the ftace_stub function.
When the dynamic ftrace code is initialized, the ftrace_stub
is replaced with a call to the ftrace_record_ip, which records
the instruction pointers of the locations that call it.
Later, the ftraced daemon will call kstop_machine and patch all
the locations to nops.
When a ftrace is enabled, the original calls to mcount will now
be set top call ftrace_caller, which will do a direct call
to the registered ftrace function. This direct call is also patched
when the function that should be called is updated.
All patching is performed by a kstop_machine routine to prevent any
type of race conditions that is associated with modifying code
on the fly.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch moves the memory management of the ftrace
records out of the arch code and into the generic code
making the arch code simpler.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch patches the call to mcount with nops instead
of a jmp over the mcount call.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
of that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add preempt off timings. A lot of kernel core code is taken from the RT patch
latency trace that was written by Ingo Molnar.
This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers
Now instead of just tracing irqs off, preemption off can be selected
to be recorded.
When this is selected, it shares the same files as irqs off timings.
One can either trace preemption off, irqs off, or one or the other off.
By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
of preempt off only is performed. "irqsoff" will only record the time
irqs are disabled, but "preemptirqsoff" will take the total time irqs
or preemption are disabled. Runtime switching of these options is now
supported by simpling echoing in the appropriate trace name into
/debugfs/tracing/current_tracer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.
The ftrace routine will then call a registered function if a function
happens to be registered.
[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]
Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date: Mon May 12 21:21:13 2008 +0200
A check for unmapped apic was added before reading maxlvt but the early
read of maxlvt wasn't removed.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
When the TSC calibration fails then TSC is still used in
sched_clock(). Disable it completely in that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
tsc_enabled is set to 0 from the command line switch "notsc" and from
the mark_tsc_unstable code. Seperate those functionalities and replace
tsc_enable with tsc_disable. This makes also the native_sched_clock()
decision when to use TSC understandable.
Preparatory patch to solve the sched_clock() issue on 32 bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the TSC is calibrated against the PIT due to the nonavailability
of PMTIMER/HPET or due to SMI interference then the setup of the per
CPU cyc2ns variables is skipped. This is unlikely to happen but it
would definitely render sched_clock() unusable.
This was introduced with commit 53d517cdba
x86: scale cyc_2_nsec according to CPU frequency
Update the per CPU cyc2ns variables in all exit pathes of tsc_calibrate.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Unconditionally enable PAT support on Centaur and Transmeta CPUs.
All known models that advertise PAT have no known errata.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] return to old errno choice in mkdir() et.al.
[Patch] fs/binfmt_elf.c: fix wrong return values
[PATCH] get rid of leak in compat_execve()
[Patch] fs/binfmt_elf.c: fix a wrong free
[PATCH] avoid multiplication overflows and signedness issues for max_fds
[PATCH] dup_fd() part 4 - race fix
[PATCH] dup_fd() - part 3
[PATCH] dup_fd() part 2
[PATCH] dup_fd() fixes, part 1
[PATCH] take init_files to fs/file.c
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
The longrun cpufreq module reports a false minimum frequency 3MHz on
300-600MHz Crusoe processor. This may be due to a calculation bug
in the module.
Original patch from Kaz Sasayama <kazssym@hypercore.co.jp>
submitted as http://bugs.debian.org/468149 patch ported to x86
Cc: Kaz Sasayama <kazssym@hypercore.co.jp>
Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: Dave Jones <davej@redhat.com>
The most common error with powernow-k8 is an ACPI _PSS error
caused either by failure to load the ACPI processor module
or a bad parse of the _PSS object. Make the error message
returned to the user in these situations more straightforward
and easier to understand.
-Mark Langsdorf
Operating System Research Center
AMD
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The previous revert of 0c07ee38c9 left
out the mwait disable condition for AMD family 10H/11H CPUs.
Andreas Herrman said:
It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.
If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.
Thus HLT saves more power than MWAIT here.
It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit
f039b75471 (x86: Don't use MWAIT on AMD
Family 10)
Re-add the AMD families 10H/11H check and disable the mwait usage for
those.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Vegard Nossum reports:
| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9
| Date: Wed Jan 30 13:33:16 2008 +0100
|
| x86: use the correct cpuid method to detect MWAIT support for C states
remove the functional effects of this patch and make mwait unconditional.
A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.
Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The user_regset_view table for the 32-bit regsets on the 64-bit build had
the wrong sizes for the FP regsets. This bug had no user-visible effect
(just on kernel modules using the user_regset interfaces and the like).
But the fix is trivial and risk-free.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this symbol export problem:
Building modules, stage 2.
MODPOST 193 modules
ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
This is due to a known weakness of symbol exports: if a symbol's
only in-core user is an EXPORT_SYMBOL from a lib-y section, the
symbol is not linked in.
The solution is to move the export to x8664_ksyms_64.c - but the real
solution would be to fix kbuild.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/setup_64.c:954: warning: passing argument 2 of 'set_bit' from incompatible pointer type
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.
Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3 x86: do not zap_low_mappings
in __smp_prepare_cpus. Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.
Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32. This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do. No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.
(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)
That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock. The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.
So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.
So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The latest rev of Intel doc AP-485 details a new cache
descriptor that we don't yet support.
A 6MB 24-way assoc L2 cache.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
use per_cpu for per CPU data.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup gart handling on amd64 a bit: move common code into
enable_gart_translation , and use symbolic register names where
appropriate.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we can use free_bootmem() directly.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. use symbolic register names where appropriate.
2. num to bus or slot changing
3. handle for new opteron for bus other than 0
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
because we try to reserve dma32 early, so we have chance to get aperture
from 64M.
with some sequence aperture allocated from RAM, could become E820_RESERVED.
and then if doing a kexec with a big kernel that uncompressed size is above
64M we could have a range conflict with still using gart.
So allocate gart aperture from 512M instead.
Also change the fallback_aper_order to 5, because we don't have chance to get
2G or 4G aperture.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
some systems are using 32M for gart and agp when memory is less than 4G.
Kernel will reject and try to allcate another 64M that is not needed,
and we will waste 64M of perfectly good RAM.
this patch adds a workaround by checking aper_base/order between NB and
agp bridge. If they are the same, and memory size is less than 4G, it
will allow it.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
while looking at Rafael J. Wysocki's system boot log,
I found a funny printout:
Node 0: aperture @ de000000 size 32 MB
Aperture too small (32 MB)
AGP bridge at 00:04:00
Aperture from AGP @ de000000 size 4096 MB (APSIZE 0)
Aperture too small (0 MB)
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
...
agpgart: Detected AGP bridge 20
agpgart: Aperture pointing to RAM
agpgart: Aperture from AGP @ de000000 size 4096 MB
agpgart: Aperture too small (0 MB)
agpgart: No usable aperture found.
agpgart: Consider rebooting with iommu=memaper=2 to get a good aperture.
it means BIOS allocated the correct gart on the NB and AGP bridge, but
because a bug in the silicon (the agp bridge reports the wrong order,
it wants 4G instead) the kernel will reject that allocation.
Also, because the size is only 32MB, and we try to get another 64M for gart,
late fix_northbridge can not revert that change because it still reads
the wrong size from agp bridge.
So try to double check the order value from the agp bridge, before calling
aperture_valid().
[ mingo@elte.hu: 32-bit fix. ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move symbolic constants into gart.h, and use them instead of hardcoded
constant.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the 4KSTACKS related code to one place. This allows to un#ifdef
do_IRQ() and share the executed on stack for the stack overflow printk
and the softirq call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add KERN_WARNING to the printk as this could not be done in the
original patch, which allegedly only moves code around.
Un#ifdef do_IRQ.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previously the reporting printk would run on the process stack, which risks
overflow an already low stack. Instead execute it on the interrupt stack.
This makes it more likely for the printk to make it actually out.
It adds one not taken test/branch more to the interrupt path when
stack overflow checking is enabled. We could avoid that by duplicating
more code, but that seemed not worth it.
Based on an observation by Eric Sandeen.
v2: Fix warnings in some configs
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fixes the build error introduced by my FIRST_SYSTEM_VECTOR patch
Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The SGI UV system needs several more system vectors than a vanilla
x86_64 system. Rather than burden the other archs with extra system
vectors that they don't use, change FIRST_SYSTEM_VECTOR to a variable,
so that it can be dynamic.
Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The interrupt vector defines are copied 4 times around with minimal
differences. Move them all into asm-x86/irq_vectors.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eliminate the 6 bank restriction in 64 bit mce reporting code. This
restriction is artificial (due to static creation of sysfs files) and 32
bit code does not have any such restriction.
This change helps in reporting the details of machine checks on a
machine check exception with errors in bank 6 and above on CPUs that
support those banks. Without the patch, machine check errors in those
banks are not reported.
We still have 128 (MCE_EXTENDED_BANK) bank restriction instead of max
256 supported in hardware. That is not changed in the patch below as it
will have some user level mcelog utility dependency, with bank 128 being
used for thermal reporting currently.
The patch below does not create sysfs control (bankNctl) for banks
higher than 6 as well. That needs some pre-cleanup in /sysfs mce layout,
removal of per cpu /sysfs entries for bankctl as they are really global
system level control today. That change will follow. This basic change
is critical to report the detailed errors on banks higher than 6.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have a lot of HPET quirks available which might force enable HPET
even when the BIOS does not enable it. Some of those quirks depend on
the command line option "hpet=force".
Andrew pointed out that hoping that the user will find out about this
boot option is not really helpful.
Emit a kernel info which informs the user about the "hpet=force" boot
option when we enter a quirk which depends on this option and the user
did not provide it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add quirk to allow forced usage of HPET on ATI SB400.
I stumbled over machines where HPET is enabled but not reported
by BIOS. This patch configures the HPET base address and makes
it known to the OS.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While reading through the HPET code I realized that the
computation of .mult variables could be done with less
lines of code, resulting in a 1.6% text size saving
for hpet.o
So I propose the following patch, which applies against
today's Linus -git tree.
>From 0c6507e400e9ca5f7f14331e18f8c12baf75a9d3 Mon Sep 17 00:00:00 2001
From: Carlos R. Mafra <crmafra@ift.unesp.br>
Date: Mon, 5 May 2008 19:38:53 -0300
The computation of clocksource_hpet.mult
tmp = (u64)hpet_period << HPET_SHIFT;
do_div(tmp, FSEC_PER_NSEC);
clocksource_hpet.mult = (u32)tmp;
can be streamlined if we note that it is equal to
clocksource_hpet.mult = div_sc(hpet_period, FSEC_PER_NSEC, HPET_SHIFT);
Furthermore, the computation of hpet_clockevent.mult
uint64_t hpet_freq;
hpet_freq = 1000000000000000ULL;
do_div(hpet_freq, hpet_period);
hpet_clockevent.mult = div_sc((unsigned long) hpet_freq,
NSEC_PER_SEC, hpet_clockevent.shift);
can also be streamlined with the observation that hpet_period and hpet_freq are
inverse to each other (in proper units).
So instead of computing hpet_freq and using (schematically)
div_sc(hpet_freq, 10^9, shift) we use the trick of calling with the
arguments in reverse order, div_sc(10^6, hpet_period, shift).
The different power of ten is due to frequency being in Hertz (1/sec)
and the period being in units of femtosecond. Explicitly,
mult = (hpet_freq * 2^shift)/10^9 (before)
mult = (10^6 * 2^shift)/hpet_period (after)
because hpet_freq = 10^15/hpet_period.
The comments in the code are also updated to reflect the changes.
As a result,
text data bss dec hex filename
2957 425 92 3474 d92 arch/x86/kernel/hpet.o
3006 425 92 3523 dc3 arch/x86/kernel/hpet.o.old
a 1.6% reduction in text size.
Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Polish the ds.h interface and add support for PEBS.
Ds.c is meant to be the resource allocator for per-thread and per-cpu
BTS and PEBS recording.
It is used by ptrace/utrace to provide execution tracing of debugged tasks.
It will be used by profilers (e.g. perfmon2).
It may be used by kernel debuggers to provide a kernel execution trace.
Changes in detail:
- guard DS and ptrace by CONFIG macros
- separate DS and BTS more clearly
- simplify field accesses
- add functions to manage PEBS buffers
- add simple protection/allocation mechanism
- added support for Atom
Opens:
- buffer overflow handling
Currently, only circular buffers are supported. This is all we need
for debugging. Profilers would want an overflow notification.
This is planned to be added when perfmon2 is made to use the ds.h
interface.
- utrace intermediate layer
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To allow linker to catch sections overlapping we have to declare
them in appropriate order.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The phys_cpu_present_map is an expected symbol in the SMP harness.
Unfortunately, x86 recently moved this and a few others to
kernel/setup.c where it doesn't quite work because voyager has to
define its own. Use CONFIG_X86_LOCAL_APIC to isolate these
definitions and fix up another area in setup.c where CONFIG_X86_SMP
should be used instead of CONFIG_SMP.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: toralf.foerster@gmx.de
Cc: Mike Travis <travis@sgi.com>
Cc: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rene Herman reported:
> commit 8779f2fc3b
>
> "x86: don't try to allocate from DMA zone at first"
>
> breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
> ISA soundcards are silent following that commit -- no error
> messages, everything appears fine, just silence.
That patch is buggy. We had an implicit assumption that
dev = NULL for ISA devices that require 24bit DMA.
The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
allocation, which is represented by "dev = NULL" and requires 24bit
DMA implicitly.
Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: rdc: leds build/config fix
x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
x86: restrict keyboard io ports reservation to make ipmi driver work
x86: fix fpu restore from sig return
x86: remove spew print out about bus to node mapping
x86: revert printk format warning change which is for linux-next
x86: cleanup PAT cpu validation
x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
x86: GEODE: cache results from geode_has_vsa2() and uninline
x86: revert geode config dependency
On some of our (single board computer) boards (x86) we are using an
IPMI controller that uses I/O ports 0x62 and 0x66 for a KCS (keyboard
controller style) IPMI system interface.
Trying to load the openipmi driver fails, because the ports
(0x62/0x66) are reserved for keyboard. keyboard reserves the full
range 0x60-0x6F while it doesn't need to.
Reserve only ports 0x60 and 0x64 for the legacy PS/2 i8042 keyboad
controller instead of 0x60-0x6F to allow the openipmi driver to work.
[ tglx: added 64bit fixup ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the task never used fpu, initialize the fpu before restoring the FP
state from the signal handler context. This will allocate the fpu
state, if the task never needed it before.
Reported-and-bisected-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Frederik Deweerdt <deweerdt@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.
Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.
Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.
The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix pcspkr dependancies: make the pcspkr platform
drivers to depend on a platform device, and
not the other way around.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
CC: Vojtech Pavlik <vojtech@suse.cz>
CC: Michael Opdenacker <michael-lists@free-electrons.com>
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix x86 setup printk format warming:
next-20080430/arch/x86/kernel/setup.c:172: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ssize_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
http://bugzilla.kernel.org/show_bug.cgi?id=10547
Newer Dell OptiPlex 745s hang before rebooting after 'sudo reboot'.
A patch for some versions of the OptiPlex was proposed here --
http://lkml.org/lkml/2007/6/5/59 -- and is included in 2.6.23 and
later kernels, according to
http://lxr.linux.no/linux+v2.6.23/arch/i386/kernel/reboot.c . However,
the DMI_BOARD_NAME ("0WF810") is too restrictive. Newer OptiPlex
machines have a DMI_BOARD_NAME of "0RF703". I therefore suggest
adding another clause to reboot.c, similar to the one in the original
patch, but matching a DMI_BOARD_NAME of "0RF703".
On further inspection, it seems that there are other DMI_BOARD_NAMEs
for this same machine. They seem to change from time to time, which
means that the current code is fragile. Moreover, using bios reboot
should not break non-SFF OptiPlex 745s, and so a reasonable fix is to
simply drop the match on DMI_BOARD_NAME.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the needlessly global additional_cpus static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In kernel/acpi/realmode/Makefile use the 'always'
variable to say that wakeup.bin should always
be made.
In acpi/Makefile we then do not need to specify the
requested target and we avoid the message from make:
`arch/x86/kernel/acpi/realmode/wakeup.bin' is up to date.
Add wakeup.lds to list af targets to avoid rebuilding
wakeup.bin - from Roland McGrath.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the pv_apic_ops are only present if CONFIG_X86_LOCAL_APIC is compiled
in, kvmclock failed to build without this option. This patch fixes this.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't warn in read_apic_id() when preemptible but only one CPU online.
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.
Replace .asciz with multiple .ascii directives and terminate with .asciz.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
needs not be explicitly initialized.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.
also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Al Viro pointed out that there's a missing readl() of timer->hpet_config,
found by Sparse.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
acpi: fix section mismatch warning in pnpacpi
intel_menlo: fix build warning
ACPI: Cleanup: Remove unneeded, multiple local dummy variables
ACPI: video - fix permissions on some proc entries
ACPI: video - properly handle errors when registering proc elements
ACPI: video - do not store invalid entries in attached_array list
ACPI: re-name acpi_pm_ops to acpi_suspend_ops
ACER_WMI/ASUS_LAPTOP: fix build bug
thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
ACPI: check a return value correctly in acpi_power_get_context()
#if 0 acpi/bay.c:eject_removable_drive()
eeepc-laptop: add hwmon fan control
eeepc-laptop: add backlight
eeepc-laptop: add base driver
ACPI: thinkpad-acpi: bump up version to 0.20
ACPI: thinkpad-acpi: fix selects in Kconfig
ACPI: thinkpad-acpi: use a private workqueue
ACPI: thinkpad-acpi: fluff really minor fix
ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
...
Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function. This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
x86: add pci=check_enable_amd_mmconf and dmi check
x86: work around io allocation overlap of HT links
acpi: get boot_cpu_id as early for k8_scan_nodes
x86_64: don't need set default res if only have one root bus
x86: double check the multi root bus with fam10h mmconf
x86: multi pci root bus with different io resource range, on 64-bit
x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
x86: get mp_bus_to_node early
x86 pci: remove checking type for mmconfig probe
x86: remove unneeded check in mmconf reject
driver core: try parent numa_node at first before using default
x86: seperate mmconf for fam10h out from setup_64.c
x86: if acpi=off, force setting the mmconf for fam10h
x86_64: check MSR to get MMCONFIG for AMD Family 10h
x86_64: check and enable MMCONFIG for AMD Family 10h
x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
x86: mmconf enable mcfg early
x86: clear pci_mmcfg_virt when mmcfg get rejected
x86: validate against acpi motherboard resources
Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] state info wrong after resume
[CPUFREQ] allow use of the powersave governor as the default one
[CPUFREQ] document the currently undocumented parts of the sysfs interface
[CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
Drop the macro definitions in asm-offsets_*.c and use kbuild.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove proc_root export. Creation and removal works well if parent PDE is
supplied as NULL -- it worked always that way.
So, one useless export removed and consistency added, some drivers created
PDEs with &proc_root as parent but removed them as NULL and so on.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for OLPC XO hardware. Open Firmware on XOs don't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also
adds functionality for running EC commands, and a CONFIG_OLPC.
A number of OLPC drivers depend upon CONFIG_OLPC.
olpc_ec_timeout is a hack to work around Embedded Controller bugs.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper extern for late_time_init in include/linux/init.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for __do_softirq() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function detect_vsmp_box is a void function in the PCI case.
Change the !PCI stub to void too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As written, this can never be true.
Spotted by the Sparse checker.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The 64-bit vDSO image is in a special ".vdso" section for no reason
I can determine. Furthermore, the location of the vdso_end symbol
includes some wrongly-calculated padding space in the image, which
is then (correctly) rounded to page size, resulting in an extra page
of zeros in the image mapped in to user processes.
This changes it to put the vdso.so image into normal initdata as we
have always done for the 32-bit vDSO images. The extra padding is
gone, so the user VMA is one page instead of two. The image that
was already copied around at boot time is now in initdata, so we
recover that wasted space after boot.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.
To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
We checked the hardware freq with OS cached freq value in get_cur_freqon_cpu().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
This bug was introduced in the 2.6.24 i386/x86_64 tree merge, where
MSI-X vector allocation will eventually fail. The cause is the new
bit array tracking used vectors is not getting cleared properly on
IRQ destruction on the 32-bit APIC code.
This can be seen easily using the ixgbe 10 GbE driver on multi-core
systems by simply loading and unloading the driver a few times.
Depending on the number of available vectors on the host system, the
MSI-X allocation will eventually fail, and the driver will only be
able to use legacy interrupts.
I am generating the same patch for both stable trees for 2.6.24 and
2.6.25.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleans up a few MSR-using drivers in the following manner:
- Ensures MSRs are all defined in asm/geode.h, rather than in misc
places
- Makes the naming consistent; cs553[56] ones begin with MSR_,
GX-specific ones start with MSR_GX_, and LX-specific ones start
with MSR_LX_. Also, make the names match the data sheet.
- Use MSR names rather than numbers in source code
- Document the fact that the LX's MSR_PADSEL has the wrong value
in the data sheet. That's, uh, good to note.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
KVM: kill file->f_count abuse in kvm
KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
KVM: SVM: remove selective CR0 comment
KVM: SVM: remove now obsolete FIXME comment
KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
KVM: export kvm_lapic_set_tpr() to modules
KVM: SVM: sync TPR value to V_TPR field in the VMCB
KVM: ppc: PowerPC 440 KVM implementation
KVM: Add MAINTAINERS entry for PowerPC KVM
KVM: ppc: Add DCR access information to struct kvm_run
ppc: Export tlb_44x_hwater for KVM
KVM: Rename debugfs_dir to kvm_debugfs_dir
KVM: x86 emulator: fix lea to really get the effective address
KVM: x86 emulator: fix smsw and lmsw with a memory operand
KVM: x86 emulator: initialize src.val and dst.val for register operands
KVM: SVM: force a new asid when initializing the vmcb
KVM: fix kvm_vcpu_kick vs __vcpu_run race
KVM: add ioctls to save/store mpstate
KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
...
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.
Without it, we can have a random memory location being written
when the guest comes back
It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- guest/host split
- fix 32-bit truncation issues
- adjust to mmu_op
- adjust to ->release_*() renamed
- add ->release_pud()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.
We do use it, however, for fine-grained adjustment.
Other than that, time comes from the host.
[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
OK, so 25-mm1 gave a lockdep error which made me look into this.
The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561
The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.
So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:
idle routines are entered with IRQs disabled
idle routines will exit with IRQs enabled
Nearly all already did this in one form or another.
Merge the 32 and 64 bit bits so they no longer have different bugs.
As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so will disable that feature by default, and only enable that via
pci=check_enable_amd_mmconf or for system match with dmi table.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
some BIOS only let AMD fam 10h handle bus0, and nvidia mcp55/ck804
to handle other buses. at that case MCFG will cover all over them.
but with acpi=off, we can not use MCFG. this patch will double check
the busnbits, and if it is less handling 256 bues, and acpi=off
will forcely reset the mmconf in msr, so we still use mmconf in above case.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So we can use MMCONF when MMCONF is not set by BIOS
using TOP_MEM2 msr to get memory top, and try to scan fam10h mmio routing to
make sure the range is not conflicted with some prefetch MMIO that is above 4G.
(current only LinuxBIOS assign 64 bit mmio above 4G for some co-processor)
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
typical case: four sockets system, every node has 4g ram, and we are using:
memmap=10g$4g
to mask out memory on node1 and node2
when numa is enabled, early_node_mem is used to get node_data and node_bootmap.
if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.
so check it and print out some info about it.
need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.
depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Export linked list of struct setup_data via debugfs.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes section mismatch warnings in unlock_ExtINT_logic().
WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x12cc9): Section mismatch in reference from the function unlock_ExtINT_logic()
unlock_ExtINT_logic() is only used by __init check_timer(). Annotate unlock_ExtINT_logic() witch __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix folowing warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10799): Section mismatch in reference from the function uniq_ioapic_id()
uniq_ioapic_id() is only used by __init mp_register_ioapic(). Annotate uniq_ioapic_id() with __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes section mismatch warnings of __cpuinit
setup_trampoline() on 32-bit host.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h.
This patch removes the one in asm/rio.h.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the third argment of div_sc().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the second argument of clocksource_hz2mult()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
memset and NULL check after alloc_bootmem() are unnecessary.
Because it returns zeroed memory and it never return NULL.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use bitmap library for pin_programmed rather than reinvent
bitmaps.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using MP_intsrc_info() in mpparse.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use BUILD_BUG_ON() instead of compile-time error technique with
extern non-exsistent function.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace. The generic
compat_ptrace_request code handles these.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the set_fs(USER_DS) call for signal handler setup out of the
three places copying the same code into the one place that calls them
all. There is no change in what it does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the code diddling the TF and DF bits for signal handler setup
out of the several places copying the same code into the one place that
calls them all. There is no change in what it does.
I also separated the recently-added DF bit clearing from the TF diddling.
The compiler turns them back into one instruction anyway. The tossing
in of DF to the same line of code with no new comments was a bit more
arcane than seems wise.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is claimed that NexGen CPUs were never shipped:
http://lkml.org/lkml/2008/4/20/179
Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.
As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.
The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In arch/x86/kernel/setup_64.c, the standard_io_resources array
is needlessly defined as global. This patch makes this variable
static.
This patch was successfully build-tested using the defconfig
for x86_64. Runtime test was performed by booting a 64-bit x86
box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are no users for the function amd_init_cpu() defined in
arch/x86/kernel/cpu/amd.c. This patch removes this routine.
This patch was build-tested using defconfigs for i386 and x86_64,
and a few randconfig instances. Runtime tests were performed by
booting 32- and 64-bit x86 boxen up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At least on my Barcelona, I see MCE log entries after cold boot caused
by BIOS not properly clearing the respective registers. Therefore, this
patch extends the workaround to families 0x10 and 0x11 (the latter just
for completeness, I have nothing to verify this against).
At the same time, provide a way to make these entries visible via the
'mce=bootlog' command line option even on these machines.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
.. since it uses ILL_BADSTK (which is meaningless in the context of
SIGSEGV).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There apparently was an unnoticed conflict between an earlier patch to
this file and mine (d1e084746b), which
I noticed only now. I suppose a change like the one below (untested) is
needed; I didn't get any response on a confirmation request for this from
the submitter of the first patch.
The issue is the writing of the 'checkbit' member at the end of
setup_intel_arch_watchdog(), which my patch made go to intel_arch_wd_ops
rather than wd_ops.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two prior changes resulted in the "ecx" clobber being lost.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
xen: add balloon driver
xen: allow compilation with non-flat memory
xen: fold xen_sysexit into xen_iret
xen: allow set_pte_at on init_mm to be lockless
xen: disable preemption during tlb flush
xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
xen: Add compatibility aliases for frontend drivers
xen: Module autoprobing support for frontend drivers
xen blkfront: Delay wait for block devices until after the disk is added
xen/blkfront: use bdget_disk
xen: Make xen-blkfront write its protocol ABI to xenstore
xen: import arch generic part of xencomm
xen: make grant table arch portable
xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
xen: make include/xen/page.h portable moving those definitions under asm dir
xen: add resend_irq_on_evtchn() definition into events.c
Xen: make events.c portable for ia64/xen support
xen: move events.c to drivers/xen for IA64/Xen support
xen: move features.c from arch/x86/xen/features.c to drivers/xen
xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
...
Add some autogenerated files to various .gitignore files
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up the codepath, remove alignment restrictions and do sanity
checking of the end result, to make sure we patched the right site.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel_text_address returns true even for modules which is not wanted
in text_poke. Use core_kernel_text instead.
This is a regression introduced in e587cadd8f
which caused occasionaly crashes after suspend/resume.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xen_sysexit and xen_iret were doing essentially the same thing. Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret. This removes
a lot of code and complexity - specifically, another critical region.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use jmp rather than call for the iret fixup, so its consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
64-bit Xen supports sysenter for 32-bit guests, so support its
use. (sysenter is faster than int $0x80 in 32-on-64.)
sysexit is still not supported, so we fake it up using iret.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.
There are a couple of follow-on changes from this:
- KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS. The
definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
since it can have two different user address-space configurations.
I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
for all of 32/32, 32/64 and 64/64 process configurations.
- USER_PTRS_PER_PGD was also defined and was being used for similar
purposes. Converting its users to KERNEL_PGD_BOUNDARY left it
completely unused, and so I removed it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.
[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
This isn't needed, we can just walk the devices in bus order with no
problems at all, as we really want to remove pci_get_device_reverse from
the kernel tree.
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After 2.6.24 there was a plan to make the PM core acquire all device
semaphores during a suspend/hibernation to protect itself from
concurrent operations involving device objects. That proved to be
too heavy-handed and we found a better way to achieve the goal, but
before it happened, we had introduced the functions
device_pm_schedule_removal() and destroy_suspended_device() to allow
drivers to "safely" destroy a suspended device and we had adapted some
drivers to use them. Now that these functions are no longer necessary,
it seems reasonable to remove them and modify their users to use the
normal device unregistration instead.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add /sysfs/firmware/ibft/[initiator|targetX|ethernetX] directories along with
text properties which export the the iSCSI Boot Firmware Table (iBFT)
structure.
What is iSCSI Boot Firmware Table? It is a mechanism for the iSCSI tools to
extract from the machine NICs the iSCSI connection information so that they
can automagically mount the iSCSI share/target. Currently the iSCSI
information is hard-coded in the initrd. The /sysfs entries are read-only
one-name-and-value fields.
The usual set of data exposed is:
# for a in `find /sys/firmware/ibft/ -type f -print`; do echo -n "$a: "; cat $a; done
/sys/firmware/ibft/target0/target-name: iqn.2007.com.intel-sbx44:storage-10gb
/sys/firmware/ibft/target0/nic-assoc: 0
/sys/firmware/ibft/target0/chap-type: 0
/sys/firmware/ibft/target0/lun: 00000000
/sys/firmware/ibft/target0/port: 3260
/sys/firmware/ibft/target0/ip-addr: 192.168.79.116
/sys/firmware/ibft/target0/flags: 3
/sys/firmware/ibft/target0/index: 0
/sys/firmware/ibft/ethernet0/mac: 00:11:25:9d:8b:01
/sys/firmware/ibft/ethernet0/vlan: 0
/sys/firmware/ibft/ethernet0/gateway: 192.168.79.254
/sys/firmware/ibft/ethernet0/origin: 0
/sys/firmware/ibft/ethernet0/subnet-mask: 255.255.252.0
/sys/firmware/ibft/ethernet0/ip-addr: 192.168.77.41
/sys/firmware/ibft/ethernet0/flags: 7
/sys/firmware/ibft/ethernet0/index: 0
/sys/firmware/ibft/initiator/initiator-name: iqn.2007-07.com:konrad.initiator
/sys/firmware/ibft/initiator/flags: 3
/sys/firmware/ibft/initiator/index: 0
For full details of the IBFT structure please take a look at:
ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf
[akpm@linux-foundation.org: fix build]
Signed-off-by: Konrad Rzeszutek <konradr@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Here is a simple patch to use an allocated array of cpumasks to
represent cpumask_of_cpu() instead of constructing one on the stack.
It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
currently only set for x86_64 SMP. Otherwise the the existing
cpumask_of_cpu() is used but has been changed to produce an lvalue
so a pointer to it can be used.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Modify sched_affinity functions to pass cpumask_t variables by reference
instead of by value.
* Use new set_cpus_allowed_ptr function.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Paul Jackson <pj@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Use new set_cpus_allowed_ptr() function added by previous patch,
which instead of passing the "newly allowed cpus" cpumask_t arg
by value, pass it by pointer:
-int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
+int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
* Cleanup uses of CPU_MASK_ALL.
* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Use pointers to cpumask_t arguments whenever possible.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
and MAXNODES counts.
* In some cases, the cpumask variable was initialized but then overwritten
with another value. This is the case for changes like this:
- cpumask_t oldmask = CPU_MASK_ALL;
+ cpumask_t oldmask;
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
_cpuid4_info *cpuid4_info[NR_CPUS];
_index_kobject *index_kobject[NR_CPUS];
kobject * cache_kobject[NR_CPUS];
* Remove the local NR_CPUS array with a kmalloc'd region in
show_shared_cpu_map().
Also some minor complaints from checkpatch.pl fixed.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes smpboot.c so that it can start slave cpus running
in UV non-unique apicid mode. The SIPI must be sent using a UV-specific
mechanism.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The code in pci-dma_{32,64}.c are now sufficiently
close to each other. We merge them in pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
if the device hasn't provided a mask, abort allocation.
Note that we're using a fallback device now, so it does not cover
the case of a NULL device: just drivers passing NULL masks around.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Just return our allocation if we don't have an mmu. For i386, where this patch
is being applied, we never have. So our goal is just to have the code to look like
x86_64's.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The claim is that i386 does it. Just it does not.
So remove it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the same gfp masks for x86_64 and i386.
It involves using HIGHMEM or DMA32 where necessary, for the sake
of code compatibility, (no real effect), and using the NORETRY
mask for i386.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch puts in the code to retry allocation in case it fails. By its
own, it does not make much sense but making the code look like x86_64.
But later patches in this series will make we try to allocate from
zones other than DMA first, which will possibly fail.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If we fail, we'll loop into the allocation again,
and then allocate in the DMA zone.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can use a fallback dev for cases of a NULL device being passed (mostly ISA)
This comes from x86_64 implementation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can do it here to, in the same way x86_64 does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
virt_to_bus() is deprecated according to the docs, and moreover,
won't return the right thing in i386 if we're dealing with high memory mappings.
So we make our allocation function return a page, and then use page_address() (for
virtual addr) and page_to_phys() (for physical addr) instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We call unmap_single, if available.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It goes to pci-dma.c, and is removed from the arch-specific files.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 implements the declare coherent memory API, and x86_64 does not
it is reflected in pieces of dma_alloc_coherent and dma_free_coherent.
Those pieces are isolated in separate functions, that are declared
as empty macros in x86_64. This way we can make the code the same.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They are placed in an ifdef, since they are i386 specific
the structure definition goes to dma-mapping.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we merge the iommu initialization parameters in pci-dma.c
Nice thing, that both architectures at least recognize the same
parameters.
usedac i386 parameter is marked for deprecation
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The code for both arches are very similar, so this patch merge them.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch moves the bootmem functions, that are largely
x86_64-specific into pci-dma.c. The code goes inside an ifdef.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
initcalls that triggers the various possibiities for
dma subsys are moved to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
merge pci-base_32.c and pci-nommu_64.c into pci-nommu.c
Their code were made the same, so now they can be merged.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move dma_ops structure definition to pci-dma.c, where it
belongs.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is done to get the code closer to x86_64.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In the very same way i386 do, we use WARN_ON functions
in map_simple and map_sg.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To make the code usable in i386, where we have high memory mappings,
we drop te virt_to_bus(sg_virt()) construction in favour of sg_phys.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds flush_write_buffers() in some functions of pci-nommu_64.c
They are added anywhere i386 would also have it. This is not a problem
for x86_64, since flush_rite_buffers() an nop for it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch implements mapping_error for pci-nommu_64.c.
It takes care to keep the same compatible behaviour it already
had. Although this file is not (yet) used for i386, we introduce
the i386 version here. Again, care is taken, even at the expense of
an ifdef, to keep the same behaviour inconditionally.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This functions are now called conditionally on their
existence in the struct. So just delete them, instead
of keeping an empty implementation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch introduces pci-dma.c, a common file for pci dma
between i386 and x86_64. As a start, dma_set_mask() is the same
between architectures, and is placed there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ERROR: "dma_supported" [drivers/ssb/ssb.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic7xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic79xx.ko] undefined!
ERROR: "dma_supported" [drivers/net/pcnet32.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/saa7134/saa7134.ko] undefined!
ERROR: "dma_set_mask" [drivers/media/video/meye.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8802.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8800.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx88-alsa.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx23885/cx23885.ko] undefined!
They just need to be exported like on x86_64.
dma_supported() and dma_set_mask() were previously inlined,
but are now moved to pci-dma_32.c.
Since they're used by various drivers, they need to be
exported.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We provide a map_error function in pci-base_32.c to make
sure i386 keeps with the same behaviour it used to.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's initially 0, since we don't expect any DMA there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is the way x86_64 does, so this make them equal. They have
to be extern now in the header, and the extern definition is moved to
the common dma-mapping.h header.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the old i386 implementation is moved to pci-base_32.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 base does not need it, so it gets an empty function.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That's already the name of the game for x86_64. For i386,
we add a pci-base_32.c, that will hold the default operations.
The function call itself goes through dma-mapping.h , the common
header
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33
Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230
the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.
solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.
the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP
will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Only allocate the FPU area when the application actually uses FPU, i.e., in the
first lazy FPU trap. This could save memory for non-fpu using apps.
for example: on my system after boot, there are around 300 processes, with
only 17 using FPU.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two optimizations:
1) only allocate when the application actually uses FPU, so in the first
lazy FPU trap. This could save memory for non-fpu using apps. Next patch
does this lazy allocation.
2) allocate the right size for the actual cpu rather than 512 bytes always.
Patches enabling xsave/xrstor support (coming shortly) will take advantage
of this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this function doesnt just 'find' the max_pfn - it also has
other side-effects such as registering sparse memory maps.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code forever. This can cause
time jumps in the range of hours.
This was reported in:
http://lkml.org/lkml/2007/8/23/96
and
http://lkml.org/lkml/2008/3/31/23
I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremly small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.
CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.
The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.
The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.
To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.
I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements the PR_GET_TSC and PR_SET_TSC prctl()
commands on the x86 platform (both 32 and 64 bit.) These
commands control the ability to read the timestamp counter
from userspace (the RDTSC instruction.)
While the RDTSC instuction is a useful profiling tool,
it is also the source of some non-determinism in ring-3.
For deterministic replay applications it is useful to be
able to trap and emulate (and record the outcome of) this
instruction.
This patch uses code earlier used to disable the timestamp
counter for the SECCOMP framework. A side-effect of this
patch is that the SECCOMP environment will now also disable
the timestamp counter on x86_64 due to the addition of the
TIF_NOTSC define on this platform.
The code which enables/disables the RDTSC instruction during
context switches is in the __switch_to_xtra function, which
already handles other unusual conditions, so normal
performance should not have to suffer from this change.
Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This annotates NMI functions with notrace. Some tracers may be able
to live with this, but some cannot. The safest is to turn it off,
it's not particularly interesting anyway.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- noexec32 is on by default for years already
- add noexec32 to kernel-parameters and fix noexec typo in there
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix section mismatch warnings which occurs on my x86_64 box while compiling
linux-next-20080410:
Warning messages:
WARNING: arch/x86/kernel/built-in.o(.text+0x7bc2): Section mismatch in reference from the function bad_addr() to the
variable .init.data:early_res
The function bad_addr() references
the variable __initdata early_res.
This is often because bad_addr lacks a __initdata
annotation or the annotation of early_res is wrong.
WARNING: arch/x86/kernel/built-in.o(.text+0x7c3b): Section mismatch in reference from the function bad_addr_size() to
the variable .init.data:early_res
The function bad_addr_size() references
the variable __initdata early_res.
This is often because bad_addr_size lacks a __initdata
annotation or the annotation of early_res is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I've made a small investigation about vm86.h inclusion rules and it
looks like everything is more or less ok.
Files that rely on asm/vm86.h symbols are:
- kprobes.c
- process_32.c
- signal_32.c
- traps_32.c
- vm86_32.c
File process_32.c includes vm86.h explicitly. We can remove that
include and it won't break anything.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove old comments that include the old arch/i386 directory.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ramdisk is reserved via reserve_early in x86_64_start_kernel,
later early_res_to_bootmem() will convert to reservation in bootmem.
so don't need to reserve that again.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make x86 EFI code works when EFI_PAGE_SHIFT != PAGE_SHIFT. The
memrage_efi_to_native() provided in this patch can be used on other
EFI platform such as IA64 too.
This patch has been tested on Intel x86_64 platform with EFI 64/32
firmware.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
TF_MASK is no longer defined, use X86_EFLAGS_TF.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kgdb core fixes:
- Check to see that mm->mmap_cache is not null before calling
flush_cache_range(), else on arch=ARM it will cause a fatal
fault.
- Breakpoints should only be restored if they are in the BP_ACTIVE
state.
- Fix a typo in comments to "kgdb_register_io_module"
x86 kgdb fixes:
- Fix the x86 arch handler such that on a kill or detach that the
appropriate cleanup on the single stepping flags gets run.
- Add in the DIE_NMIWATCHDOG call for x86_64
- Touch the nmi watchdog before returning the system to normal
operation after performing any kind of kgdb operation, else
the possibility exists to trigger the watchdog.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add HW breakpoints into the arch specific portion of x86 kgdb. In the
current x86 kernel.org kernels HW breakpoints are changed out in lazy
fashion because there is no infrastructure around changing them when
changing to a kernel task or entering the kernel mode via a system
call. This lazy approach means that if a user process uses HW
breakpoints the kgdb will loose out. This is an acceptable trade off
because the developer debugging the kernel is assumed to know what is
going on system wide and would be aware of this trade off.
There is a minor bug fix to the kgdb core so as to correctly call the
hw breakpoint functions with a valid value from the enum.
There is also a minor change to the x86_64 startup code when using
early HW breakpoints. When the debugger is connected, the cpu startup
code must not zero out the HW breakpoint registers or you cannot hit
the breakpoints you are interested in, in the first place.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes the hang regression with kgdb when the NMI interrupt
comes in while the master core is returning from an exception.
Adjust the NMI logic such that KGDB will not stop NMI exceptions from
occurring by in general returning NOTIFY_DONE. It is not possible to
distinguish the debug NMI sync vs the normal NMI apic interrupt so
kgdb needs to catch the unknown NMI if it the debugger was previously
active on one of the cpus.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
based on patch from:
Subject: kgdb: core-lite
From: Jason Wessel <jason.wessel@windriver.com>
[ and countless other authors - see the patch for details. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Move wakeup code to .c, so that video mode setting code can be shared
between boot and wakeup. Remove nasty assembly code in 64-bit case by
re-using trampoline code. Stack setup was fixed to clear high 16bits
of %esp, maybe that fixes some machines.
.c code sharing and morse code was done H. Peter Anvin, Sam Ravnborg
reviewed kbuild related stuff, and it seems okay to him. Rafael did
some cleanups.
[rjw:
* Made the patch stop breaking compilation on x86-32
* Added arch/x86/kernel/acpi/sleep.h
* Got rid of compiler warnings in arch/x86/kernel/acpi/sleep.c
* Fixed 32-bit compilation on x86-64 systems
* Added include/asm-x86/trampoline.h and fixed the non-SMP
compilation on 64-bit x86
* Removed arch/x86/kernel/acpi/sleep_32.c which was not used
* Fixed some breakage caused by the integration of smpboot.c done
under us in the meantime]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch fixes section mismatch warnings (on x86_64 host) in setup_trampoline(),
which was referencing __initdata variables trampoline_data and trampoline_end.
Warning messages:
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2b6a): Section mismatch in reference from the function setup_trampoline()
to the variable .init.data:trampoline_data
The function __cpuinit setup_trampoline() references
a variable __initdata trampoline_data.
If trampoline_data is only used by setup_trampoline then
annotate trampoline_data with a matching annotation.
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2b71): Section mismatch in reference from the function setup_trampoline()
to the variable .init.data:trampoline_end
The function __cpuinit setup_trampoline() references
a variable __initdata trampoline_end.
If trampoline_end is only used by setup_trampoline then
annotate trampoline_end with a matching annotation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes mismatch warnings in smp_checks() (in arch/x86/kernel/smpboot.c):
WARNING: arch/x86/kernel/built-in.o(.text+0x11922): Section mismatch in reference from the function smp_checks()
to the variable .cpuinit.data:smp_b_stepping
The function smp_checks() references
the variable __cpuinitdata smp_b_stepping.
This is often because smp_checks lacks a __cpuinitdata
annotation or the annotation of smp_b_stepping is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Make sure that we clear the "shutdown status flag" in the CMOS
> > register after each CPU is brought up. This fixes a problem where the
> > "shutdown status flag" may remain set when a CPU is brought up after
> > booting.
>
> btw., what problem does this result in, exactly?
The shutdown status flag set to "0xA", corresponds to "JMP double word
request without INT init".
This JMP at reboot time is at an unintended location. And results in
Triple faults in our case.
Though this error at reboot can be safely ignored in a VM environment,
am not sure what the effect would be on a physical system. May be it
will result in a triple fault and an eventual hardware reset thus
masking this BUG in the kernel.
This fix just makes sure that we reset that status flag after
initialization is done.
Fix paranoia about using BIOS quickboot mechanism.
Make sure that we clear the "shutdown status flag" in the CMOS register
after each CPU is brought up. This fixes a problem where the "shutdown
status flag" may remain set when a CPU is brought up after booting.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Arai <arai@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use cpumask_of_cpu() rather than the pair of cpus_clear() and cpu_set().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No need to clear the memory allocated by alloc_bootmem().
It is already filled with zero.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using ioapic_read_entry() and ioapic_write_entry()
in io_apic_{32,64}.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If one can find an ack pending pin, there is no need to check
the rest of them.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We should call for kfree if only we really need it.
Though it's safe to call kfree with NULL pointer passed
in this code we've already tested the pointer and can
eliminate the call
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu pointed out a bug in the previous patches,
fix double-shift of apicid.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup references to the early cpu maps for the non-SMP configuration
and remove some functions called for SMP configurations only.
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
UV supports really big systems. So big, in fact, that the APICID register
does not contain enough bits to contain an APICID that is unique across all
cpus.
The UV BIOS supports 3 APICID modes:
- legacy mode. This mode uses the old APIC mode where
APICID is in bits [31:24] of the APICID register.
- x2apic mode. This mode is whitebox-compatible. APICIDs
are unique across all cpus. Standard x2apic APIC operations
(Intel-defined) can be used for IPIs. The node identifier
fits within the Intel-defined portion of the APICID register.
- x2apic-uv mode. In this mode, the APICIDs on each node have
unique IDs, but IDs on different node are not unique. For example,
if each mode has 32 cpus, the APICIDs on each node might be
0 - 31. Every node has the same set of IDs.
The UV hub is used to route IPIs/interrupts to the correct node.
Traditional APIC operations WILL NOT WORK.
In x2apic-uv mode, the ACPI tables all contain a full unique ID (note:
exact bit layout still changing but the following is close):
nnnnnnnnnnlc0cch
n = unique node number
l = socket number on board
c = core
h = hyperthread
Only the "lc0cch" bits are written to the APICID register. The remaining bits are
supplied by having the get_apic_id() function "OR" the extra bits into the value
read from the APICID register. (Hmmm.. why not keep the ENTIRE APICID register
in per-cpu data....)
The x2apic-uv mode is recognized by the MADT table containing:
oem_id = "SGI"
oem_table_id = "UV-X"
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add kernel support for new ACPI "sapic" tables that contain 16-bit APICIDs.
This patch simply adds parsing of an optional SAPIC table if present.
Otherwise, the traditional local APIC table is used.
Note: the SAPIC table is not a new ACPI table - it exists on other architectures
but is not currently recognized by x86_64.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Increase the number of bits in an apicid from 8 to 32.
By default, MP_processor_info() gets the APICID from the
mpc_config_processor structure. However, this structure limits
the size of APICID to 8 bits. This patch allows the caller of
MP_processor_info() to optionally pass a larger APICID that will
be used instead of the one in the mpc_config_processor struct.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add functions that can be used to determine if an x86_64
system is a SGI "UV" system. UV systems come in 3 types and
are identified by the OEM ID in the MADT.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a function to read the local APIC_ID.
This change is in preparation for additional changes to
the APICID functions that will come in a later patch.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch renames VM_MASK to X86_VM_MASK (which
in turn defined as alias to X86_EFLAGS_VM) to better
distinguish from virtual memory flags. We can't just
use X86_EFLAGS_VM instead because it is also used
for conditional compilation
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The memory resource is also used for main memory, and we need it to
allocate physical addresses for memory hotplug. Knobbling io space is
enough to get the job done anyway.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Report when microcode was successfully updated. It used to be there but
now with DEBUG unset it becomes very silent. Also some cosmetic fixes.
Signed-off-by: Ben Castricum <lk08@bencastricum.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Upcoming 64 bit processors from Centaur can use sysenter.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Jesse Ahrens <jahrens@centtech.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
By including processor-flags.h we are allowed to use predefined
macroses instead of keeping own ones
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On AMD SMM protected memory is part of the address map, but handled
internally like an MTRR. That leads to large pages getting split
internally which has some performance implications. Check for the
AMD TSEG MSR and split the large page mapping on that area
explicitely if it is part of the direct mapping.
There is also SMM ASEG, but it is in the first 1MB and already covered by
the earlier split first page patch.
Idea for this came from an earlier patch by Andreas Herrmann
On a RevF dual Socket Opteron system kernbench shows a clear
improvement from this:
(together with the earlier patches in this series, especially the
split first 2MB patch)
[lower is better]
no split stddev split stddev delta
Elapsed Time 87.146 (0.727516) 84.296 (1.09098) -3.2%
User Time 274.537 (4.05226) 273.692 (3.34344) -0.3%
System Time 34.907 (0.42492) 34.508 (0.26832) -1.1%
Percent CPU 322.5 (38.3007) 326.5 (44.5128) +1.2%
=> About 3.2% improvement in elapsed time for kernbench.
With GB pages on AMD Fam1h the impact of splitting is much higher of course,
since it would split two full GB pages (together with the first
1MB split patch) instead of two 2MB pages. I could not benchmark
a clear difference in kernbench on gbpages, so I kept it disabled
for that case
That was only limited benchmarking of course, so if someone
was interested in running more tests for the gbpages case
that could be revisited (contributions welcome)
I didn't bother implementing this for 32bit because it is very
unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB
and the 2MB low split is already handled for both.
[ mingo@elte.hu: do it on gbpages kernels too, there's no clear reason
why it shouldnt help there. ]
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel recommends to not use large pages for the first 1MB
of the physical memory because there are fixed size MTRRs there
which cause splitups in the TLBs.
On AMD doing so is also a good idea.
The implementation is a little different between 32bit and 64bit.
On 32bit I just taught the initial page table set up about this
because it was very simple to do. This also has the advantage
that the risk of a prefetch ever seeing the page even
if it only exists for a short time is minimized.
On 64bit that is not quite possible, so use set_memory_4k() a little
later (in check_bugs) instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When end_pfn is not aligned to 2MB (or 1GB) then the kernel might
map more memory than end_pfn. Account this in max_pfn_mapped.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently they are in .text.head because the rest of head_64.S.
.text.head is not removed as init data, but the early exception handlers
should be because they are not needed after early boot of the BP.
So move them over.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The early exception handlers are currently set up using a macro
recursion. There is only one user left. Replace the macro with a
standard loop in place.
Noop patch, just a cleanup.
[ tglx@linutronix.de: simplified ]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All of early setup runs with interrupts disabled, so there is no
need to set up early exception handlers for vectors >= 32
This saves some minor text size.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Ingo Molnar (mingo@elte.hu) wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>
> > The shadow vmap for DEBUG_RODATA kernel text modification uses
> > virt_to_page to get the pages from the pointer address.
> >
> > However, I think vmalloc_to_page would be required in case the page is
> > used for modules.
> >
> > Since only the core kernel text is marked read-only, use
> > kernel_text_address() to make sure we only shadow map the core kernel
> > text, not modules.
>
> actually, i think we should mark module text readonly too.
>
Yes, but in the meantime, the x86 tree would need this patch to make
kprobes work correctly on modules.
I suspect that without this fix, with the enhanced hotplug and kprobes
patch, kprobes will use text_poke to insert breakpoints in modules
(vmalloced pages used), which will map the wrong pages and corrupt
random kernel locations instead of updating the correct page.
Work that would write protect the module pages should clearly be done,
but it can come in a later time. We have to make sure we interact
correctly with the page allocation debugging, as an example.
Here is the patch against x86.git 2.6.25-rc5 :
The shadow vmap for DEBUG_RODATA kernel text modification uses virt_to_page to
get the pages from the pointer address.
However, I think vmalloc_to_page would be required in case the page is used for
modules.
Since only the core kernel text is marked read-only, use kernel_text_address()
to make sure we only shadow map the core kernel text, not modules.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
vSMP detection: access pci config space early in boot to detect if the
system is a vSMPowered box, and cache the result in a flag, so that
is_vsmp_box() retrieves the value of the flag always.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The sysenter path tries to enable interrupts immediately. Unfortunately
this doesn't work in a paravirt environment, because not enough kernel
state has been set up at that point (namely, pointing %fs to the kernel
percpu data segment). To fix this, defer ENABLE_INTERRUPTS until after
the kernel state has been set up.
Unfortunately this means that we're running with interrupts disabled
for a while without calling the IRQ tracing code, but that can't be
called without setting up %fs either.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch does clean up relocate_kernel_(32|64).S a bit by getting rid
of local PAGE_ALIGNED macro. We should use well-known PAGE_SIZE instead
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make known_pat_cpu to think amd k8 and fam10h is ok too.
also make tom2 below to be WRBACK
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sets up pat_init() infrastructure.
PAT MSR has following setting.
PAT
|PCD
||PWT
|||
000 WB _PAGE_CACHE_WB
001 WC _PAGE_CACHE_WC
010 UC- _PAGE_CACHE_UC_MINUS
011 UC _PAGE_CACHE_UC
We are effectively changing WT from boot time setting to WC.
UC_MINUS is used to provide backward compatibility to existing /dev/mem
users(X).
reserve_memtype and free_memtype are new interfaces for maintaining alias-free
mapping. It is currently implemented in a simple way with a linked list and
not optimized. reserve and free tracks the effective memory type, as a result
of PAT and MTRR setting rather than what is actually requested in PAT.
pat_init piggy backs on mtrr_init as the rules for setting both pat and mtrr
are same.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Initializing to zero is generally bad idea, I hope it is right for
__init data, too.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
do simple memtest after init_memory_mapping
use find_e820_area_size to find all ram range that is not reserved.
and do some simple bits test to find some bad ram.
if find some bad ram, use reserve_early to exclude that range.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After an experimental cleanup of <linux/percpu.h>, these files were
exposed as invoking kmalloc() without including <linux/slab.h>.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I was trying to get the address of instruction to be executed
next after the kprobed instruction. But regs->eip in post_handler()
contains value which is useless to the user. It's pre-corrected value.
This value is difficult to use without access to resume_execution(), which
is not exported anyway.
I moved the invocation of post_handler() to *after* resume_execution().
Now regs->eip contains meaningful value in post_handler().
I do not think this change breaks any backward-compatibility.
To make meaning of the old value, post_handler() would need access to
resume_execution() which is not exported. I have difficulty to believe
that previous, uncorrected, regs->eip can be meaningfully used in
post_handler().
Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use force_sig in handle_vm86_trap like other machine traps do.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we're stopped at syscall entry tracing, ptrace can change the %rax
value from -ENOSYS to something else. If no system call is actually made
because the syscall number (now in orig_rax) is bad, then we now always
reset %rax to -ENOSYS again.
This changes it to leave the return value alone after entry tracing.
That way, the %rax value set by ptrace is there to be seen in user mode
(or in syscall exit tracing). This is consistent with what the 32-bit
kernel does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes the write-only timer_uses_ioapic_pin_0
(gsi can't be <= 15 in the line of it's fake usage in mpparse_32.c).
Spotted by the GNU C compiler.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Indicate TSCs are unreliable as time sources if the platform is
a multi chassi ScaleMP vSMPowered machine.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Re-arrange set_vsmp_pv_ops so that pv_ops are set only if
the platform has capability to support paravirtualized irq ops
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Fix the the build breakage when PARAVIRT is defined
but PCI is not
This fixes problem reported at:
http://marc.info/?l=linux-kernel&m=120525966600698&w=2
- Make is_vsmp_box() available even when PARAVIRT is not defined.
This is needed to determine if tsc's are reliable as a time source
even when PARAVIRT is not defined.
- split vsmp_init to use is_vsmp_box() and set_vsmp_pv_ops()
set_vsmp_pv_ops will do nothing if PCI is not enabled in the config.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
is_vsmp_box() currently does not work on vSMPowered systems, as pci cfg
space is not read correctly -- This patch fixes it.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the last leftovers from the files. Move the ones
that are still used to the files they belong, the others
that grep can't reach, simply throw away.
Merge comments ontop of file and that's it: smpboot integrated
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are i386 specific (the x86_64 definitions live
elsewhere, and should remain there), so are enclosed around
an ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this is the last remaining function in smpboot_32.c
Since it is i386 specific, move it around an ifdef to
smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the previous changes, code for native_smp_prepare_cpus()
in i386 and x86_64 now look very similar. merge them into
smpboot.c. Minor differences are inside ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86_64 has two nr_ioapics = 0 statements. In 32-bit, it can be done
too. We do it through the smpboot_clear_io_apic() inline function,
to cope with subarchitectures (visws) that does not compile mpparse in
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are mostly inocuous. APIC_INTEGRATED will expand to 1,
check_phys_apicid_present is checking for the same thing it was before,
etc. But the code is identical to i386 now, and will allow us to
integrate it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This test exists in x86_64 and also applies to i386. So we add it
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
An APIC test is moved, and code is replaced by the mach-default
already defined function (smpboot_setup_io_apic).
setup_portio_remap() is added, but it is a nop in mach-default.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add function calls to native_smp_prepare_cpus in i386
to match x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch get rid of smp_boot_cpus(), since it does not
boot any cpu anymore. Its code is split in a way to make
it closer to x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
if smp configuration is not found at all, hook into 0.
This is done to match x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They look similar enough, and are merged. Only difference
(zap_low_mapping for i386) is inside ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is practically the same between arches now, so it is
moved to smpboot.c. Minor differences (gdt initialization)
live inside an ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It now looks the same between architectures, so we
merge it in smpboot.c. Minor differences goes inside
an ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a very large patch, because it depends on a lot
of auxiliary static functions. But they all have been modified
to the point that they're sufficiently close now. So they're just
merged in smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is to match i386. The former name was cuter,
but the current is more meaningful and more general,
since cpu_id can be a logical id.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
voyager would conflict with it, but the types are ultimately
compatible. So remove the extern definition from voyager_smp.c
in favour of the common one
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move map_cpu_to_logical_apicid() and unmap_cpu_to_logical_apicid()
to smpboot.c. They take together all the bunch of static functions
they rely upon
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that we boot cpus here, callin_map has this meaning (same
as x86_64)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
wakeup_secondary_via_INIT => wakeup_secondary_cpu.
This is to match i386, where init is not always used.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After the inclusion, a lot of files needs fixing for conflicts,
some of them in the headers themselves, to accomodate for both
i386 and x86_64 versions.
[ mingo@elte.hu: build fix ]
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch provides minor adjustments for do_boot_cpus
in both architectures to allow for integration
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We do it to make it close to x86_64. The later needs it,
otherwise the nmi watchdog can get into the scene and kill us
with a hammer.
Enabling irqs here used to trigger a bug in i386. This is because
time irq handling relies upon structures that are only initialized
after smp initcalls (More precisely, it will find
per_cpu(hrtimer_bases, cpu)->cb_pending list not initialized and crash)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It splits setup_local_APIC in two, providing a function corresponding
to the ending part of it. As a side effect, smp_callin looks the same
between i386 and x86_64.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is a little bit more complicated than x86_64 due to erratas and
other stuff, but its existance will ease integration
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We introduce empty macros just to make them look like the same
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use a new worker, with help of the create_idle struct
to fork the idle thread. We now have two workers, the first
of them triggered by __smp_prepare_cpu. But the later is
going away soon.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After all the infrastructure work, we're now prepared
to boot the cpus from cpu_up, and not from prepare_cpus.
So the difference between cold boot and hotplug is effectively
over, and the functions are used to the purposes they're meant to.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It was okay when cpus were cold booted before this point.
But with the new state machine, they will not have arrived to
the trampoline yet. zapping low mappings will have the bad effect
of breaking it completely after paging enablement
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only call schedule_work if keventd is already running.
This is already the way x86_64 does
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is redundant, since it is already done by set_cpu_sibling_map()
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two more files goes away. nmi_64.h and nmi_32.h gives birth
to nmi.h
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We move it to apic_32.c, since it's irq related anyway,
and only called from that file.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We do it and also fix conflicts, which makes x86_64 automatically
closer to i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do it and also fix conflicts, which automatically makes
x86_64 look closer to i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the cpu count is changed accordingly: now, what matters is
online cpus.
Also, we add those functions for x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impressing friends is a very important thing.
Do it in a separate function to make it even more
explicit, and ease integration.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is the way x86_64 does, and complement the already
present patch that does the bios cpu to apicid mapping here
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We fill the per-cpu (or array) that maps
bios cpu id to apicid in mpparse_32.c, the way x86_64 does
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We use the same routing as x86_64, moved now to setup.c.
Just with a few ifdefs inside.
Note that this routing uses prefill_possible_map().
It has the very nice side effect of allowing hotplugging of
cpus that are marked as present but disabled by acpi bios.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this will serve as a reference as to whether or not to
use the per_cpu variables in mpparse. Done the same way
as x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This mapping already exists in x86_64, just provide it for
i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have already removed the only condition that could fail here.
so just don't test for any return value
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do tests before do_boot_cpu in native_cpu_up for i386.
Tests are a little bit broader than originally, and are the
same as x86_64. Test for smp_callin is not applicable right now
and is deferred.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Isolate all sanity checking in a smp_sanity_check()
function as x86_64 does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
goal is to have i386 and x86_64 closer, so we
add barriers to match
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch does not change the behaviour of x86_64, since APIC_INTEGRATED
is always defined as (1). But the code now matches exactly i386 version
(well, this part of the code, at least)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is done so we call setup_secondary_clock() in the same place x86_64
does. A separate patch for this is appearantly not needed. But clock
initialization is such a delicate thing, that it's safer to do this way
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This matches x86_64 behaviour, which is a superior one IMHO
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that it is the same between arches, put it into smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Call it conditionally for secondary cpus. This behaviour
matches i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
provide two specialized identify_secondary_cpu() and identify_boot_cpu()
routines for x86_64. Although not strictly needed, they are functionally
correct, and will ease integration with i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The split of smp_store_cpu_info in a quirks-only part
will ease integration with x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is used to match i386. The definition for the non-paravirt
case is moved to smp.h instead of smp_32.h
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch replaces apic_read() for apic_read_around()
and apic_write for apic_write_around() in smpboot_64.c
We do it to have a common usage between x86_64 and i386.
In the former, it will always simply expand to apic_write
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add loglevel facilities to printks in __inquire_remote_apic.
the levels are the ones to match x86_64 ones.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
change some variables' types in __inquire_remote_apic to
match x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix coding style in pci-dma_64.c and add stubs for documentation. I
hope someone fills the rest, I understand maybe off and soft...
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When acpi=off or there is no SRAT defined, apicid_to_node is got from K8
Northbridge PCI configuration space in k8_scan_nodes() in
arch/x86_64/mm/k8toplogy.c.
The problem is that it assumes bsp apic id is 0 at that point.
For four socket system with Quad core cpus installed, all cpus apic id
is offset by 4, and bsp apic id is 4.
For eight socket system with dual core cpus installed, all cpus apic id
is offset by 2, and bsp apic id is 2.
We need get boot_cpu_id --- bsp apic id, before k8_scan_nodes by called.
So create early_acpi_boot_init and early_get_smp_config for get boot_cpu_id.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Otherwise, enabling (or better, subsequent disabling) of single
stepping would cause a kernel oops on CPUs not having this MSR.
The patch could have been added a conditional to the MSR write in
user_disable_single_step(), but centralizing the updates seems safer
and (looking forward) better manageable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the x86 native_smp_send_reschedule_function(), don't send the IPI if the
cpu has gone offline already. Warn nevertheless!!
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix:
ERROR: do not initialise externals to 0 or NULL
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
store initial_apicid from early identify. it is could be different from
phys_proc_id later.
also print it out in /proc/cpuinfo.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a memcpy that should be a text_poke (in apply_alternatives).
Use kernel_wp_save/kernel_wp_restore in text_poke to support DEBUG_RODATA
correctly and so the CPU HOTPLUG special case can be removed.
Add text_poke_early, for alternatives and paravirt boot-time and module load
time patching.
Changelog:
- Fix text_set and text_poke alignment check (mixed up bitwise and and or)
- Remove text_set
- Export add_nops, so it can be used by others.
- Document text_poke_early.
- Remove clflush, since it breaks some VIA architectures and is not strictly
necessary.
- Add kerneldoc to text_poke and text_poke_early.
- Create a second vmap instead of using the WP bit to support Xen and VMI.
- Move local_irq disable within text_poke and text_poke_early to be able to
be sleepable in these functions.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
for system with apicid lifting, boot cpu apicid will be 4
got:
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0/4 -> Node 0
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 0
so try to offset apicid back before get phys_proc_id with bits shift.
then we can get correct socket ID
also remove remove cpu_data(0) reference.
because cpu_data(0) only be ready after smp_prepare_cpus with the assignment
from boot_cpu_data to current_cpu_data aka cpu_data(0).
and check_bugs()==>identify_cpu(&boot_cpu_data) is quite before than
smp_prepare_cpus. So just use boot_cpu_id instead.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
wraps the busy loop for wait_for_init_deasserted() in a function,
so smp_callin in x86_64 looks like more i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
setup_trampoline() looks very similar between architectures, and this
patch unifies them. The i386 version allocates bootmem memory, while
the x86_64 version uses a fixed address.
In this patch, we initialize the global trampoline_base to the x86_64 version,
and i386 allocation can later override it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The parameter passing parsing is done in the common smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it was already cleared two lines above, and so, this removal
is bogus
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is already used in x86_64. In i386, it only
removes from cpu_online_map
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lock does not protect cpu_online_map, so its
length can be shortened, and in some cases, removed.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
set_cpu_sibling_map() and remove_sibling_info() are
equal between architectures, and are now moved to common
file
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move definitions that are now equal in type from
smpboot_{32,64}.c to smpboot.c
cpu_callin_map is put temporarily in smp_64.h (already
exists in smp_32.h), and will soon be merged.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch merges the copyright notices, and valuable
comments that were left back on smp_{32,64}.c. With that,
files are empty, and are deleted
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch creates tlb_32.c and tlb_64.c, with
tlb-related functions that used to live in smp*.c files.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves all ipi and apic related functions
from smp_32.c to ipi.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch moves all the functions and data structures that look
like exactly the same from smp_{32,64}.c to smp.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
function definition is moved to common header.
x86_64 version is now called native_smp_send_stop
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This can be safely added to i386. After that,
functions look exactly the same for both arches
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
with the hlt_works change, it is possible to have i386
and x86_64 stop_this_cpu() looking exactly the same. They
can, after that, be merged.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the two versions (the inner version, and the outer version, that takes
the locks) of smp_call_function_mask are made into one. With the changes,
i386 and x86_64 versions look exactly the same.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This function is used in smp_send_stop(). It's like
smp_call_function_mask, but always go to all online cpus,
and does not take any locks.
It is added to x86_64, but will soon be unified in a common file
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch creates smpcommon.c with functions that are
equal between architectures. The i386-only init_gdt
is ifdef'd.
Note that smpcommon.o figures twice in the Makefile:
this is because sub-architectures like voyager that does
not use the normal smp_$(BITS) files also have to access them
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
with this removal, exports for both i386 and x86_64,
regarding the "smp_call_function" series are now the same.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patches moves prefill_possible_map() to smpboot.c
Right now it is x86_64-specific, but nothing intrinsically
prevents it to be used by i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch allows a cpu to be marked as present but disabled in i386,
just as x86_64 currently does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
definition is moved to common header. x86_64 version is now called
native_smp_cpus_done
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
definition is moved to common header. x86_64 version is now called
native_smp_prepare_cpus
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
definition is moved to common header. x86_64 version is now called
native_prepare_boot_cpu
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
function definition is moved to common header. x86_64 version
is now called native_cpu_up
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
definition is moved to common header, x86_64 function name
now is native_smp_call_function_mask
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
function definition is moved to common header, x86_64 version is now called
native_smp_send_reschedule
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the smp_ops symbol is temporarily defined in smp_64.c, but it will soon
be unified
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge pointed out that looking at the boot_params
struct to determine if the system is running in a paravirtual
environment is not reliable for the Xen case, currently. He also
points out that there already exists a function to determine if
the system is running in a paravirtual environment. So let's use
that instead. This gets rid of the preprocessor test too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge pointed out that looking at the boot_params
struct to determine if the system is running in a paravirtual
environment is not reliable for the Xen case, currently. He also
points out that there already exists a function to determine if
the system is running in a paravirtual environment. So let's use
that instead. This gets rid of the preprocessor test too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current tsc_init() clears the TSC feature bit if the TSC khz
cannot be calculated, causing us to panic in
arch/x86/kernel/cpu/bugs.c check_config(). We should simply mark it
unstable.
Frankly, someone should take an axe to this code. mark_tsc_unstable()
not only marks it unstable, but sets tsc_enabled to 0, which seems
redundant but is actually important here because means it won't be
used by sched_clock() either. Perhaps a tristate enum "UNUSABLE,
UNSTABLE, OK" would be clearer, and separate mark_tsc_unstable() and
mark_tsc_broken() functions?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is an add-on to the 64-bit ebda patch. It makes
the functions reserve_ebda_region (renamed from reserve_ebda)
and copy_e820_map equal to the 32-bit versions of the previous
patch.
Changes:
Use u64 and u32 for local variables in copy_e820_map.
The amount of conventional memory and the start of the EBDA are
detected by reading the BIOS data area directly. Paravirtual
environments do not provide this area, so we bail out early
in that case. They will just have to set up a correct memory
map to start with.
Add a safety net for zeroed out BIOS data area.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Explicitly reserve_early the whole address range from the end of
conventional memory as reported by the bios data area up to the
1Mb mark. Regard the info retrieved from the BIOS data area with
a bit of paranoia, though, because some biosses forget to register
the EBDA correctly.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds explicit detection of the EBDA and reservation
of the rom and adapter address space 0xa0000-0x100000 to the
i386 kernels. Before this patch, the EBDA size was hardcoded
as 4Kb. Also, the reservation of the adapter range was done by
modifying the e820 map which is now not necessary any longer,
and that code is removed from copy_e820_map.
The amount of conventional memory and the start of the EBDA are
detected by reading the BIOS data area directly. Paravirtual
environments do not provide this area, so we bail out early
in that case. They will just have to set up a correct memory
map to start with.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Depends on:
[PATCH 2/3] x86: coding style fixes to arch/x86/kernel/early_printk.c
Remove two:
ERROR: do not initialise statics to 0 or NULL
paolo@paolo-desktop:/tmp/c$ size *
text data bss dec hex filename
1172 280 12 1464 5b8 early_printk.o.after
1172 280 12 1464 5b8 early_printk.o.before
This patch is changing the binary output:
paolo@paolo-desktop:/tmp/c$ md5sum *
dad9a9a881e0eeda62cc5645bd3d7cad early_printk.o.after
da32f5cd8f248970e4809e1005393e95 early_printk.o.before
because the two variables moved to another section. No
change in functionality.
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
early_init_intel() on 64-bit is introduced by
commit 2b16a23538
Author: Andi Kleen <ak@suse.de>
Date: Wed Jan 30 13:32:40 2008 +0100
x86: move X86_FEATURE_CONSTANT_TSC into early cpu feature detection
sets CONSTANT_TSC for intel cpus - but it is already set in init_intel().
don't need to set that two times in early_init_intel() and init_intel().
this patch removes the init_intel() one.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
quad core 8 socket system will have apic id lifting.the apic id range could
be [4, 0x23]. and apic_is_clustered_box will think that need to three clusters
and that is larger than 2. So it is treated as a clustered_box.
and will get:
Marking TSC unstable due to TSCs unsynchronized
even if the CPUs have X86_FEATURE_CONSTANT_TSC set.
this quick fix will check if the cpu is from AMD.
but vsmp still needs that checking...
this patch is fix to make sure that vsmp not to be passed.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when comparing the e820 direct from BIOS, and the one by kexec:
BIOS-provided physical RAM map:
- BIOS-e820: 0000000000000000 - 0000000000097400 (usable)
+ BIOS-e820: 0000000000000100 - 0000000000097400 (usable)
BIOS-e820: 0000000000097400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffa0000 (usable)
- BIOS-e820: 00000000dffae000 - 00000000dffb0000 type 9
+ BIOS-e820: 00000000dffae000 - 00000000dffb0000 (reserved)
BIOS-e820: 00000000dffb0000 - 00000000dffbe000 (ACPI data)
BIOS-e820: 00000000dffbe000 - 00000000dfff0000 (ACPI NVS)
BIOS-e820: 00000000dfff0000 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
- BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
=======> that is the local apic address... somewhere we lost it
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
found one entry about reserved is missing for the kernel by kexec.
it turns out init_apic_mappings is called before e820_reserve_resources
in setup_arch. but e820_reserve_resources is using request_resource.
it will not handle the conflicts.
there are three ways to fix it:
1. change request_resource in e820_reserve_resources to to insert_resource
2. move init_apic_mappings after e820_reserve_resources
3. use late_initcall to insert lapic resource.
this patch is using method 3, that is less intrusive.
in later version could consider to use method 1.
before patch
fed20000-ffffffff : PCI Bus #00
fee00000-fee00fff : Local APIC
fefff000-feffffff : pnp 00:09
ff700000-ffffffff : reserved
with patch will get map in first kernel
fed20000-ffffffff : PCI Bus #00
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fefff000-feffffff : pnp 00:09
ff700000-ffffffff : reserved
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e820_resource_resources could use insert_resource instead of request_resource
also move code_resource, data_resource, bss_resource, and crashk_res
out of e820_reserve_resources.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Copy x86_64 and add a head32.c so we can start moving early
architecture initialization out of assembly.
[ Sam Ravnborg <sam@ravnborg.org>: updated it to x86 ]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
quad core 8 socket system will have apic id lifting.the apic id range could
be [4, 0x23]. and apic_is_clustered_box will think that need to three clusters
and that is large than 2. So it is treated as clustered_box.
and will get
Marking TSC unstable due to TSCs unsynchronized
even the CPUs have X86_FEATURE_CONSTANT_TSC set.
this patch will check if the cpu is from AMD.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now cpu/proc.c and cpu/proc_64.c are same.
So cpu/proc_64.c can be removed.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change /proc/cpuinfo on 32-bit, it will look like on 64-bit.
'power management' line is added and power management information
will be printed at the line.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 /proc/cpuinfo code can be unified.
This is the first step of unification.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch make the file errors free.
Only 4 "WARNING: line over 80 characters" left.
arch/x86/kernel/cpu/mcheck/p5.o:
text data bss dec hex filename
452 0 4 456 1c8 p5.o.before
452 0 4 456 1c8 p5.o.after
md5:
50c945ef150aa95bf0481cc3e1dc3315 p5.o.before.asm
50c945ef150aa95bf0481cc3e1dc3315 p5.o.after.asm
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Kills more than 150 errors/warnings
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix code to access CMOS rtc registers so that it does not use inb_p and
outb_p routines, which are deprecated. Extensive research on all known
CMOS RTC chipset timing shows that there is no need for a delay in
accessing the registers of these chips even on old machines. These chipa
are never on an expansion bus, but have always been "motherboard"
resources, either in the processor chipset or explicitly on the
motherboard, and they are not part of the ISA/LPC or PCI buses, so
delays should not be based on bus timing. The reason to fix it:
1) port 80 writes often hang some laptops that use ENE EC chipsets,
esp. those designed and manufactured by Quanta for HP;
2) RTC accesses are timing sensitive, and extra microseconds may matter;
3) the new "io_delay" function is calibrated by expansion bus timing needs,
thus is not appropriate for access to CMOS rtc registers.
Signed-off-by: David P. Reed <dpreed@reed.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Replace the hardcoded list of initialization functions for each CPU
vendor by a list in an ELF section, which is read at initialization in
arch/x86/kernel/cpu/cpu.c to fill the cpu_devs[] array. The ELF
section, named .x86cpuvendor.init, is reclaimed after boot, and
contains entries of type "struct cpu_vendor_dev" which associates a
vendor number with a pointer to a "struct cpu_dev" structure.
This first modification allows to remove all the VENDOR_init_cpu()
functions.
This patch also removes the hardcoded calls to early_init_amd() and
early_init_intel(). Instead, we add a "c_early_init" member to the
cpu_dev structure, which is then called if not NULL by the generic CPU
initialization code. Unfortunately, in early_cpu_detect(), this_cpu is
not yet set, so we have to use the cpu_devs[] array directly.
This patch is part of the Linux Tiny project, and is needed for
further patch that will allow to disable compilation of unused CPU
support code.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It becomes to early for ioremap, so we use early_ioremap
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change Makefile so vsmp_64.o object is dependent
on PARAVIRT, rather than X86_VSMP
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ tglx@linutronix.de: cleanup the other structs as well ]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The extended century readout does not solve the year 2038 problem on
32bit!
v2: Fix compilation on !ACPI, pointed out by tglx
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We assume that the RTC clock is BCD, so print a warning if it claims
to be binary.
[ tglx@linutronix.de: changed to WARN_ON - we want to know that!
If no one reports it we can remove the complete if (RTC_ALWAYS_BCD)
magic, which has RTC_ALWAYS_BCD defined to 1 since Linux 1.0 ... ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We know it is already after 2000. Use the year 2000 offset for both 32
and 64 bit, which removes ifdefs and the 1970 magic.
[ tglx@linutronix.de: remove 1970 magic, replace bogus commit message ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change size to unsigned long, becase caller and user all used unsigned long.
Also make bad_addr take an alignment parameter.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If the vDSO was not mapped, don't use it as the "restorer" for a signal
handler. Whether we have a pointer in mm->context.vdso depends on what
happened at exec time, so we shouldn't check any global flags now.
Background:
Currently, every 32-bit exec gets the vDSO mapped even if it's disabled
(the process just doesn't get told about it). Because it's in fact
always there, the bug that this patch fixes cannot happen now. With
the second patch, it won't be mapped at all when it's disabled, which is
one of the things that people might really want when they disable it (so
nothing they didn't ask for goes into their address space).
The 32-bit signal handler setup when SA_RESTORER is not used refers to
current->mm->context.vdso without regard to whether the vDSO has been
disabled when the process was exec'd. This patch fixes this not to use
it when it's null, which becomes possible after the second patch. (This
never happens in normal use, because glibc's sigaction call uses
SA_RESTORER unless glibc detected the vDSO.)
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
people sometimes do crazy stuff like building really large static
arrays into their kernels or building allyesconfig kernels. Give
more space to the kernel and push modules up a bit: kernel has
512 MB and modules have 1.5 GB.
Should be enough for a few years ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/apm_32.c:1215: warning: label 'out' defined but not used
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
AFAICT pm_send_all is a nop when noone uses pm_register...
Hmm.. can we just force CONFIG_PM_LEGACY=n, and see what happens?
Or maybe this is better idea? It may break build somewhere, but it
should be easy to fix... (it builds here, i386 and x86-64).
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
The prevent_tail_call() macro works around the problem of the compiler
clobbering argument words on the stack, which for asmlinkage functions
is the caller's (user's) struct pt_regs. The tail/sibling-call
optimization is not the only way that the compiler can decide to use
stack argument words as scratch space, which we have to prevent.
Other optimizations can do it too.
Until we have new compiler support to make "asmlinkage" binding on the
compiler's own use of the stack argument frame, we have work around all
the manifestations of this issue that crop up.
More cases seem to be prevented by also keeping the incoming argument
variables live at the end of the function. This makes their original
stack slots attractive places to leave those variables, so the compiler
tends not clobber them for something else. It's still no guarantee, but
it handles some observed cases that prevent_tail_call() did not.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch also resolves hangs on boot:
http://lkml.org/lkml/2008/2/23/263http://bugzilla.kernel.org/show_bug.cgi?id=10093
The bug was causing once-in-few-reboots 10-15 sec wait during boot on
certain laptops.
Earlier commit 40d6a14662 added
smp_call_function in cpu_idle_wait() to kick cpus that are in tickless
idle. Looking at cpu_idle_wait code at that time, code seemed to be
over-engineered for a case which is rarely used (while changing idle
handler).
Below is a simplified version of cpu_idle_wait, which just makes a dummy
smp_call_function to all cpus, to make them come out of old idle handler
and start using the new idle handler. It eliminates code in the idle
loop to handle cpu_idle_wait.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc expects all toplevel assembly to return to the original section type.
The code in alteranative.c does not do this. This caused some strange bugs
in sched-devel where code would end up in the .rodata section and when
the kernel sets the NX bit on all .rodata, the kernel would crash when
executing this code.
This patch adds a .previous marker to return the code back to the
original section.
Credit goes to Andrew Pinski for telling me it wasn't a gcc bug but a
bug in the toplevel asm code in the kernel. ;-)
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it
instead of smp_processor_id() in the call to set_cyc2ns_scale().
This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update
the intended cpu's cyc2ns.
Related mail/thread: http://lkml.org/lkml/2007/12/7/130
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
revert:
| commit 47001d6033
| Author: Thomas Gleixner <tglx@linutronix.de>
| Date: Tue Apr 1 19:45:18 2008 +0200
|
| x86: tsc prevent time going backwards
it has been identified to cause suspend regression - and the
commit fixes a longstanding bug that existed before 2.6.25 was
opened - so it can wait some more until the effects are better
understood.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We handle a broken tsc these days, so no need to panic. We clear the
TSC bit when tsc_init decides it's unreliable (eg. under lguest w/ bad
host TSC), leading to bogus panic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commits:
commit 37a47db8d7
Author: Balaji Rao <balajirrao@gmail.com>
Date: Wed Jan 30 13:30:03 2008 +0100
x86: assign IRQs to HPET timers, fix
and
commit e3f37a54f6
Author: Balaji Rao <balajirrao@gmail.com>
Date: Wed Jan 30 13:30:03 2008 +0100
x86: assign IRQs to HPET timers
have been identified to cause a regression on some platforms due to
the assignement of legacy IRQs which makes the legacy devices
connected to those IRQs disfunctional.
Revert them.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code for ever. This can cause
time jumps in the range of hours.
This was reported in:
http://lkml.org/lkml/2007/8/23/96
and
http://lkml.org/lkml/2008/3/31/23
I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremly small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.
CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.
The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.
The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.
To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.
I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix obsolete printks in aperture-64. We used not to handle missing
agpgart, but we handle it okay now.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
right now if there's no CPU support for nmi_watchdog=2 we'll just
refuse it silently.
print a useful warning.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
implement nmi_watchdog=2 on this class of CPUs:
cpu family : 15
model : 6
model name : Intel(R) Pentium(R) D CPU 3.00GHz
the watchdog's ->setup() method is safe anyway, so if the CPU
cannot support it we'll bail out safely.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This avoids using wrmsr on MSR_IA32_DEBUGCTLMSR when it's not needed.
No wrmsr ever needs to be done if noone has ever used block stepping.
Without this change, using ptrace on 2.6.25 on an x86 KVM guest
will tickle KVM's missing support for the MSR and crash the guest
kernel. Though host KVM is the buggy one, this makes for a regression
in the guest behavior from 2.6.24->2.6.25 that we can easily avoid.
I also corrected some bad whitespace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the problem that makedumpfile sometimes fails on x86_64 machine.
This patch adds the symbol "phys_base" to a vmcoreinfo data. The
vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.
On x86_64 kernel which compiled with CONFIG_PHYSICAL_START=0x0 and
CONFIG_RELOCATABLE=y, makedumpfile fails like the following:
# makedumpfile -d31 /proc/vmcore dumpfile
The kernel version is not supported.
The created dumpfile may be incomplete.
_exclude_free_page: Can't get next online node.
makedumpfile Failed.
#
The cause is the lack of the symbol "phys_base" in a vmcoreinfo data.
If the symbol "phys_base" does not exist, makedumpfile considers an
x86_64 kernel as non relocatable. As the result, makedumpfile
misunderstands the physical address where the kernel is loaded, and it
cannot translate a kernel virtual address to physical address correctly.
To fix this problem, this patch adds the symbol "phys_base" to a
vmcoreinfo data.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/kernel/ptrace.c:548: warning: 'ptrace_bts_get_size' defined but not used
arch/x86/kernel/ptrace.c:558: warning: 'ptrace_bts_read_record' defined but not used
arch/x86/kernel/ptrace.c:607: warning: 'ptrace_bts_clear' defined but not used
arch/x86/kernel/ptrace.c:617: warning: 'ptrace_bts_drain' defined but not used
arch/x86/kernel/ptrace.c:720: warning: 'ptrace_bts_config' defined but not used
arch/x86/kernel/ptrace.c:788: warning: 'ptrace_bts_status' defined but not used
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we could call find_max_pfn() directly instead of setup_memory() to get
max_pfn needed for mtrr trimming.
otherwise setup_memory() is called two times... that is duplicated...
[ mingo@elte.hu: both Thomas and me simulated a double call to
setup_bootmem_allocator() and can confirm that it is a real bug
which can hang in certain configs. It's not been reported yet but
that is probably due to the relatively scarce nature of
MTRR-trimming systems. ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Wed, 26 Mar 2008 11:56:22 -0600
Jordan Crouse <jordan.crouse@amd.com> wrote:
> On 26/03/08 14:31 +0100, Stefan Pfetzing wrote:
> > Hello Jordan,
> >
> > I just tried to build your geodwdt driver for the geode watchdog. Therefore
> > I pulled your repository from http://git.infradead.org/geode.git (or more,
> > the git url).
> >
> > I tried to build the geodewdt driver as a module - which didn't work, and
> > it failed with the same problem as earlier mentioned on lkmk [1]. I also
> > checked the fix [2], but that seems to be already in your (or linus) tree -
> > and so I'm unsure what the problem is.
> >
> > [1] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884074
> > [2] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884174
> >
> > Building directly into the kernel seems to work.
> >
> > Maybe you have some idea?
>
> Hmm - that is strange. Exporting the symbols should work. I recommend
> starting over with a clean tree.
>
> CCing Andres - any thoughts?
>
> Jordan
>
Er, yeah. The patch below should fix it. This should probably go into
2.6.25.
Oops, EXPORT_SYMBOL_GPL wasn't being declared due to this header
being missing.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I have found that using SMI to change the cpu's frequency on my DELL
Latitude L400 clobbers the ECX register in speedstep_set_state, causing
unneccessary retries because the "state" variable has changed silently (GCC
assumes it is still present in ECX).
play safe and avoid gcc caching any register across IO port accesses
that trigger SMIs.
Signed-off by: <Stephan.Diestelhorst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert function comment blocks to kernel-doc notation.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Revert
commit f62f1fc9ef
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Fri Mar 7 15:02:50 2008 -0800
x86: reserve dma32 early for gart
The patch has a dependency on bootmem modifications which are not .25
material that late in the -rc cycle. The problem which is addressed by
the patch is limited to machines with 256G and more memory booted with
NUMA disabled. This is not a .25 regression and the audience which is
affected by this problem is very limited, so it's safer to do the
revert than pulling in intrusive bootmem changes right now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix the bug reported here:
http://bugzilla.kernel.org/show_bug.cgi?id=10232
use update_memory_range() instead of add_memory_range() directly
to avoid closing the gap.
( the new code only affects and runs on systems where the MTRR
workaround triggers. )
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we have seen a little problem in rebooting Dell Optiplex 745 with the
0KW626 board. Here is a small patch enabling reboot with this board,
which forces the default reboot path it into the BIOS reboot mode.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The fault_msg text is not explictly nul terminated now in startup
assembly. Do so by converting .ascii to .asciz.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
aperture_64.c takes a piece of memory and makes it into iommu
window... but such window may not be saved by swsusp -- that leads to
oops during hibernation.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this patch allows hpet=force on nVidia nForce 430 southbridge.
This patch was tested by me on my old Asus A8N-VM CSM (where bios does not
support hpet and does not advertise it via acpi entry). My nForce430 version:
lspci -nn | grep LPC
00:0a.0 ISA bridge [0601]: nVidia Corporation MCP51 LPC Bridge [10de:0260]
(rev a2)
Kernel 2.6.24.3 after patching and using hpet=force reports this:
dmesg | grep -i hpet
Kernel command line: root=/dev/sda8 ro vga=773 video=vesafb:mtrr:4,ywrap
vt.default_utf8=0 hpet=force
Force enabled HPET at base address 0xfed00000
hpet clockevent registered
Time: hpet clocksource has been installed.
grep -i hpet /proc/timer_list
Clock Event Device: hpet
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
grep Clock /proc/timer_list (before patching)
Clock Event Device: pit
Clock Event Device: lapic
grep Clock /proc/timer_list (after patching)
Clock Event Device: hpet
Clock Event Device: lapic
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33
Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230
the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.
solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.
the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP
will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We recently got some of the "Desktop Form Factor" Optiplex 745's in. I
noticed that there's an entry for the SFF one's, but the BIOS model number
of the DFF differs from that of the SFF. We have been reliably
experiencing the same (as far as I can tell) reboot bug as the SFF boxes.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
when numa disabled I got this compile warning:
arch/x86/kernel/setup64.c: In function setup_per_cpu_areas:
arch/x86/kernel/setup64.c:147: warning: the address of
contig_page_data will always evaluate as true
it seems we missed checking if the node is online before we try to refer
NODE_DATA. Fix it.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
memory-less node support:
this patch uses updated dev_to_node, because dev_to_node already makes sure
it returns an online node.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The code to restart syscalls after signals depends on checking for a
negative orig_ax, and for particular negative -ERESTART* values in ax.
These fields are 64 bits and for a 32-bit task they get zero-extended.
The syscall restart behavior is lost, a regression from a native 32-bit
kernel and from 64-bit tasks' behavior.
This patch fixes the problem by doing sign-extension where it matters.
For orig_ax, the only time the value should be -1 but winds up as
0x0ffffffff is via a 32-bit ptrace call. So the patch changes ptrace to
sign-extend the 32-bit orig_eax value when it's stored; it doesn't
change the checks on orig_ax, though it uses the new current_syscall()
inline to better document the subtle importance of the used of
signedness there.
The ax value is stored a lot of ways and it seems hard to get them all
sign-extended at their origins. So for that, we use the
current_syscall_ret() to sign-extend it only for 32-bit tasks at the
time of the -ERESTART* comparisons.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes 64-bit ptrace calls setting the 64-bit orig_ax field for a
32-bit task sign-extend the low 32 bits up to 64. This matches what a
64-bit debugger expects when tracing a 32-bit task.
This follows on my "x86_64 ia32 syscall restart fix". This didn't
matter until that was fixed.
The debugger ignores or zeros the high half of every register slot it
sets (including the orig_rax pseudo-register) uniformly. It expects
that the setting of the low 32 bits always has the same meaning as a
32-bit debugger setting those same 32 bits with native 32-bit
facilities.
This never arose before because the syscall restart check never
matched any -ERESTART* values due to lack of sign extension. Before
that fix, even 32-bit ptrace setting orig_eax to -1 failed to trigger
the restart check anyway. So this was never noticed as a regression
of 64-bit debuggers vs 32-bit debuggers on the same 64-bit kernel.
Signed-off-by: Roland McGrath <roland@redhat.com>
[ Changed to just do the sign-extension unconditionally on x86-64,
since orig_ax is always just a small integer and doesn't need
the full 64-bit range ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich noticed that the reboot fixups went missing during
reboot.c unification.
(commit 4d022e35fd)
Geode and a few other rare boards with special reboot quirks are
affected.
Reported-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
convert_fxsr_to_user() in 2.6.24's i387_32.c did this, and
convert_to_fxsr() also does the inverse, so I assume it's an oversight
that it is no longer being done.
[ mingo@elte.hu:
we encode it this way because there's no space for the 'FPU Last
Instruction Opcode' (->fop) field in the legacy user_i387_ia32_struct
that PTRACE_GETFPREGS/PTRACE_SETFPREGS uses.
it's probably pure legacy - i'd be surprised if any user-space relied on
the FPU Last Opcode in any way. But indeed we used to do it previously
so the most conservative thing is to preserve that piece of information.
]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Linux kernel currently does not clear the direction flag before
calling a signal handler, whereas the x86/x86-64 ABI requires that.
Linux had this behavior/bug forever, but this becomes a real problem
with gcc version 4.3, which assumes that the direction flag is
correctly cleared at the entry of a function.
This patches changes the setup_frame() functions to clear the
direction before entering the signal handler.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
We don't need to printk a message every time we transition.
Leave the code there, but ifdef'd out, as it's useful when
adding support for new processors.
Reported-by: Petr Titěra <P.Titera@century.cz>
Signed-off-by: Dave Jones <davej@redhat.com>
This bug got introduced by the recent i387 merge:
commit 4421011120
Author: Roland McGrath <roland@redhat.com>
Date: Wed Jan 30 13:31:50 2008 +0100
x86: x86 i387 user_regset
Current usage of unlazy_fpu() in ptrace specific routines is wrong.
unlazy_fpu() will not init fpu if the task never used math. So the
ptrace calls can expose the parent tasks FPU data in some cases.
Replace it with the init_fpu() which will init the math state, if the
task never used math before.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
revert the BTS ptrace extension for now.
based on general objections from Roland McGrath:
http://lkml.org/lkml/2008/2/21/323
we'll let the BTS functionality cook some more and re-enable
it in v2.6.26. We'll leave the dead code around to help the
development of this code.
(X86_BTS is not defined at the moment)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
delay the removal of this symbol export by one more kernel release,
giving external modules such as VirtualBox a chance to stop using it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a recent fix:
commit ce28b9864b
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Feb 20 23:57:30 2008 +0100
x86: fix vsyscall wreckage
removed the broken /kernel/vsyscall64 handler completely.
This triggers the following debug check:
sysctl table check failed: /kernel/vsyscall64 No proc_handler
Restore the sane part of the proc handler.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a kernel bug (vmware boot problem) reported by Tomasz Grobelny,
which occurs with certain .config variants and gccs.
The x86 TLS cleanup in commit efd1ca52d0
made the sys_set_thread_area and sys_get_thread_area functions ripe for
tail call optimization. If the compiler chooses to use it for them, it
can clobber the user trap frame because these are asmlinkage functions.
Reported-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Diffing dmesg between git7 and git8 doesn't sched any light since
> git8 also removed the printouts of the x86 caps as they were being
> initialised and updated. I'm currently adding those printouts back
> in the hope of seeing where and when the caps get broken.
That turned out to be very illuminating:
--- dmesg-2.6.24-git7 2008-02-24 18:01:25.295851000 +0100
+++ dmesg-2.6.24-git8 2008-02-24 18:01:25.530358000 +0100
...
CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Notice how the TSC cap bit goes from Off to On.
(The first two lines are printout loops from -git7 forward-ported
to -git8, the third line is the same printout loop added just after
the xor-with-cleared_cpu_caps[] loop.)
Here's how the breakage occurs:
1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc,
so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears
the bit in boot_cpu_data and sets it in cleared_cpu_caps
3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps
in with cleared_cpu_caps
HOWEVER, at this point c->x86_capability correctly has TSC
Off, cleared_cpu_caps has TSC On, so the XOR incorrectly
sets TSC to On in c->x86_capability, with disastrous results.
The real bug is that clearing bits with XOR only works if the
bits are known to be 1 prior to the XOR, and that's not true here.
A simple fix is to convert the XOR to AND-NOT instead. The following
patch does that, and allows my 486 to boot 2.6.25-rc kernels again.
[ mingo@elte.hu: fixed a similar bug in setup_64.c as well. ]
The breakage was introduced via commit 7d851c8d3d.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, c_idle is declared in the stack, and thus, have no static address.
Peter Zijlstra points out this simple solution, in which c_idle.work
is initializated separatedly. Note that the INIT_WORK macro has a static
declaration of a key inside.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Acked-by: Peter Zijlstra <pzijlstr@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, there is no way for print_stack_trace() to determine whether
a given stack trace entry was deemed reliable or not, simply because
save_stack_trace() does not record this information. (Perhaps needless
to say, this makes the saved stack traces A LOT harder to read, and
probably with no other benefits, since debugging features that use
save_stack_trace() most likely also require frame pointers, etc.)
This patch reverts to the old behaviour of only recording the reliable trace
entries for saved stack traces.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There doesn't seem to be any reason for swapper_pg_pmd being global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Inside a KVM virtual machine the MTRRs are usually blank. This confuses Linux
and causes a warning message at boot. This patch removes that warning message
when running Linux as a KVM guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pointed out by pageexec@freemail.hu:
> what happens here is that gcc treats the argument area as owned by the
> callee, not the caller and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't at all
> changed and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with cold didn't
> help it either ;).
To fix this we pass in pt_regs by pointer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on a report from Arne Georg Gleditsch about user-space apps
misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
of the code revealed that the "NOP patching" done there is
fundamentally unsafe for a number of reasons:
1) the patching code runs without synchronizing other CPUs
2) it inserts NOPs even if there is no clock source which provides vread
3) when the clock source changes to one without vread we run in
exactly the same problem as in #2
4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out
as a result it is possible to break user-space via this patching.
The only safe thing for now is to remove the patching.
This code was broken since v2.6.21.
Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel
text but data, bss and init sections as well.
That name led me on the wrong path with the KERNEL_TEXT_SIZE regression,
because i knew how big of _text_ my images have and i knew about the 40 MB
"text" limit so i wrongly thought to be on the safe side of the 40 MB limit
with my 29 MB of text, while the total image size was slightly above 40 MB.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.
after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:
#define KERNEL_TEXT_SIZE (40*1024*1024)
which limit my vmlinux just happened to pass:
text data bss dec hex filename
29703744 4222751 8646224 42572719 2899baf vmlinux
40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/
So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)
So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)
We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is
bad, fix it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix mtrr kernel-doc warning:
Warning(linux-2.6.24-git12//arch/x86/kernel/cpu/mtrr/main.c:677): No description found for parameter 'end_pfn'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have been promoting Transmeta TM3x00/TM5x00 chips to i686-class
based on the notion that they contain all the user-space visible
features of an i686-class chip. However, this is not actually true:
they lack the EA-taking long NOPs (0F 1F /0). Since this is a
userspace-visible incompatibility, downgrade these CPUs to the
manufacturer-defined i586 level.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simple typo fix for regression introduced by the user_regset changes.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove redundant irq_desc[NR_IRQS] element initialization in
init_ISA_irqs(). irq_desc[NR_IRQS] is already statically
initialized with the same values in kernel/irq/handle.c .
besides the clean-up value this also saves some space:
text data bss dec hex filename
1389 356 14 1759 6df i8259_32.o.before
1325 356 14 1695 69f i8259_32.o.after
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Though we use PDA for regular task stack but that
is not acceptable for init_task wich is special
one. We still have to allocate init_task's stack
in that manner.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's much better to use PAGE_SIZE then magic 4096
(though it's almost synonym in most cases on x86 but
not for *all* cases ;)
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
initial_code are initially used to hold a function pointer
from __init and later from __cpuinit. This confuses modpost
and changing initial_code to REFDATA silence the warning.
(But now we do not discard the variable anymore).
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch_register_cpu() is only defined for HOTPLUG_CPU code
so simple fix is to ignore references by annotating the
function __ref.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
srat_detect_node() is only used by __cpuinit init_intel().
So the trivial fix is to annotate srat_detect_node() with __cpuinit.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
nearby_node() were only used by __cpuinit amd_detect_cmp()
So annotating nearby_node() __cpuinit was the trivial fix.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/nmi_64.c:50: warning: 'unknown_nmi_panic_callback' declared 'static' but never defined
This patch also fixes nmi_32.c
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Yes, it should.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/efi_32.c:42:6: warning: symbol 'efi_call_phys_prelog' was not declared. Should it be static?
arch/x86/kernel/efi_32.c:84:6: warning: symbol 'efi_call_phys_epilog' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/kprobes.c:584:16: warning: symbol 'kretprobe_trampoline_holder' was not declared. Should it be static?
arch/x86/kernel/kprobes.c:676:6: warning: symbol 'trampoline_handler' was not declared. Should it be static?
Make them static and add the __used attribute, approach taken from the
arm kprobes implementation.
kretprobe_trampoline_holder uses inline assemly to define the global
symbol kretprobe_trampoline, but nothing ever calls the holder explicitly.
trampoline handler is only called from inline assembly in the same file,
mark it used and static.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the mca-pentium boot option that was a noop.
besides the source code cleanup factor, this saves some text as well:
arch/x86/kernel/cpu/bugs.o:
text data bss dec hex filename
651 77 4 732 2dc bugs.o.before
631 53 4 688 2b0 bugs.o.after
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
drivers/lguest/x86/switcher_32.S:(.text+0x3815f8):
undefined reference to `LGUEST_PAGES_regs_trapnum'
This problem was caused by asm-offsets.c only having the offsets when
lguest *guest* support was set, not lguest host (host support used to
imply guest support, so now they're separate these bugs come out).
Lguest guest support and host support are separate config options:
they used to be tied together. Sort out which parts of asm-offsets are
needed for Guest and Host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
from __START_KERNEL_map. The kernel itself only needs _text to _end
mapped in the high alias. On relocatible kernels the ASM setup code
adjusts the compile time created high mappings to the relocation. This
creates invalid pmd entries for negative offsets:
0xffffffff80000000 -> pmd entry: ffffffffff2001e3
It points outside of the physical address space and is marked present.
This starts at the virtual address __START_KERNEL_map and goes up to
the point where the first valid physical address (0x0) is mapped.
Zap the mappings before _text and after _end right away in early
boot. This removes also the invalid entries.
Furthermore it simplifies the range check for high aliases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: DMI: quirk for FSC ESPRIMO Mobile V5505
ACPI: DMI blacklist updates
pnpacpi: __initdata is not an identifier
ACPI: static acpi_chain_head
ACPI: static acpi_find_dsdt_initrd()
ACPI: static acpi_no_initrd_override_setup()
thinkpad_acpi: static
ACPI suspend: Execute _WAK with the right argument
cpuidle: Add Documentation
ACPI, cpuidle: Clarify C-state description in sysfs
ACPI: fix suspend regression due to idle update
extern should not appear in C files. Also, the definitions
do not match the prototype currently, not sure what way you
want to go with this, I've switched the prototype to return
int, but I can see going to the void return as well.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the GART table is unmapped from the kernel direct mappings
during early bootup, make sure we have no leftover cachelines in it.
Note: the clflush done by set_memory_np() was not enough, because
clflush does not work on unmapped pages.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The EFI-runtime mapping code changed a larger memory area than it
should have, due to a pages/bytes parameter mixup.
noticed by Andi Kleen.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jiri Kosina reported the following deadlock scenario with
show_unhandled_signals enabled:
[ 68.379022] gnome-settings-[2941] trap int3 ip:3d2c840f34
sp:7fff36f5d100 error:0<3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:21
[ 68.379039] in_atomic():1, irqs_disabled():0
[ 68.379044] no locks held by gnome-settings-/2941.
[ 68.379050] Pid: 2941, comm: gnome-settings- Not tainted 2.6.25-rc1 #30
[ 68.379054]
[ 68.379056] Call Trace:
[ 68.379061] <#DB> [<ffffffff81064883>] ? __debug_show_held_locks+0x13/0x30
[ 68.379109] [<ffffffff81036765>] __might_sleep+0xe5/0x110
[ 68.379123] [<ffffffff812f2240>] down_read+0x20/0x70
[ 68.379137] [<ffffffff8109cdca>] print_vma_addr+0x3a/0x110
[ 68.379152] [<ffffffff8100f435>] do_trap+0xf5/0x170
[ 68.379168] [<ffffffff8100f52b>] do_int3+0x7b/0xe0
[ 68.379180] [<ffffffff812f4a6f>] int3+0x9f/0xd0
[ 68.379203] <<EOE>>
[ 68.379229] in libglib-2.0.so.0.1505.0[3d2c800000+dc000]
and tracked it down to:
commit 03252919b7
Author: Andi Kleen <ak@suse.de>
Date: Wed Jan 30 13:33:18 2008 +0100
x86: print which shared library/executable faulted in segfault etc. messages
the problem is that we call down_read() from an atomic context.
Solve this by returning from print_vma_addr() if the preempt count is
elevated. Update preempt_conditional_sti / preempt_conditional_cli to
unconditionally lift the preempt count even on !CONFIG_PREEMPT.
Reported-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a new sysfs entry under cpuidle states. desc - can be used by driver to
communicate to userspace any specific information about the state.
This helps in identifying the exact hardware C-states behind the ACPI C-state
definition.
Idea is to export this through powertop, which will help to map the C-state
reported by powertop to actual hardware C-state.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
arch/x86/kernel/i8253.c:98:27: warning: symbol 'pit_clockevent' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch enhances EFI runtime code memory mapping as following:
- Move __supported_pte_mask & _PAGE_NX checking before invoking
runtime_code_page_mkexec(). This makes it possible for compiler to
eliminate runtime_code_page_mkexec() on machine without NX support.
- Use set_memory_x/nx in early_mapping_set_exec(). This eliminates the
duplicated implementation.
This patch has been tested on Intel x86_64 platform with EFI64/32
firmware.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen pointed out that the cache attribute logic is reverse in
efi_enter_virtual_mode(). This problem alone is harmless as we do not
(yet) do cache attribute conflict resolution. (This bug was not present
in the original EFI submission - I introduced it while fixing up rejects.)
While reviewing this code I noticed a second, worse problem: the use of
uninitialized md->virt_addr.
Fix both problems.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When reboot_32.c and reboot_64.c were unified (commit 4d022e35fd...),
the machine_ops code was broken, leading to xen pvops kernels failing
to properly halt/poweroff/reboot etc. This fixes that up.
Signed-off-by: Jody Belka <knew-linux@pimb.org>
Cc: Miguel Boton <mboton@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since we may not have a pci_dev for the device we need to access, we can't
use pci_read_config_word. But raw_pci_read is an internal implementation
detail; it's better to use the architected pci_bus_read_config_word
interface. Using PCI_DEVFN instead of a mysterious constant helps
reassure everyone that we really do intend to access device 8.
[ Thanks to Grant Grundler for pointing out to me that this is exactly
what the write immediately above this is doing -- enabling device 8 to
respond to config space cycles.
- Matthew
Grant also says:
"Can you also add a comment which points at the Intel
documentation?
The 'Intel E7320 Memory Controller Hub (MCH) Datasheet' at
http://download.intel.com/design/chipsets/datashts/30300702.pdf
Page 69 documents register F4h (DEVPRES1).
And I just doubled checked that the 0xf4 register value is
restored later in the quirk (obvious when you look at the code
but not from the patch"
so here it is.
- Linus ]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move arch/x86/kernel/suspend_64.c to arch/x86/power .
Move arch/x86/kernel/suspend_asm_64.S to arch/x86/power
as hibernate_asm_64.S .
Update purpose and copyright information in
arch/x86/power/suspend_64.c and
arch/x86/power/hibernate_asm_64.S .
Update the Makefiles in arch/x86, arch/x86/kernel and
arch/x86/power to reflect the above changes.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
kernel are in PAE format.
early_ioremap is updated to use the standard page table accessors.
Clear any mappings beyond max_low_pfn from the boot page tables in
native_pagetable_setup_start because the initial mappings can extend
beyond the range of physical memory and into the vmalloc area.
Derived from patches by Eric Biederman and H. Peter Anvin.
[ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use a common irq_return entry point for all the iret places, which
need the paravirt INTERRUPT return wrapper.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Each AMD Geode MFGPT timer interrupt output is paired with another
timer; esentially the interrupt goes if either timer fires. This
is okay, but the handlers need to be aware of this. Make sure in
the timer tick handler that our timer really did expire.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We *really* don't want to be reading MFGPTx_SETUP and writing back those
values. What we want to be doing is clearing CMP1 and CMP2 unconditionally;
otherwise, we have races where CMP1 and/or CMP2 fire after we've read
MFGPTx_SETUP. They can also fire between when we've written ~CNTEN to
the register, and when the new register values get copied to the timer's
version of the register. By clearing both fields, we're okay.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There isn't much value to always detecting the MFGPT timers on
Geode platforms; detection is only needed when something wants
to use the timers. Move the detection code so that it gets
called the first time a timer is allocated.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We need to be called from elsewhere, and this gets some #ifdefs out
of the .c file.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop F_AVAIL and the 'flags' field, replacing with an 'avail' bit. This
looks more understandable to me.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We had planned to use the 'owner' field for allowing re-allocation of
MFGPTs; however, doing it by module owner name isn't flexible enough. So,
drop this for now. If it turns out that we need timers in modules, we'll
need to come up with a scheme that matches the write-once fields of the
MFGPTx_SETUP register, and drops ponies from the sky.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The GEODE MFGPT code assumed that 32kHz was 32000 Hz while the boards
run on a 32.768 kHz digital watch crystal. In practise, it will not
change the timer's frequency as the skew was only 2.4%, but it
should provide more accurate intervals.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- uninline timer functions; the compiler knows better than we do
whether or not to inline these.
- mfgpt_start_timer() had an unused 'clock' argument, drop it.
From both Jordan and myself.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.
Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case. Not
only that, but under such conditions A.OUT core dumps are not produced either.
To make this work, this patch also does the following:
(1) Makes the existence of the contents of linux/a.out.h contingent on
CONFIG_ARCH_SUPPORTS_AOUT.
(2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
core dumping code.
(3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This
is then included only where needed. This means that this bit of arch
code will be stored in the appropriate A.OUT binfmt module rather than
the core kernel.
(4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
needed) and FRV.
This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.
[jdike@addtoit.com: uml: re-remove accidentally restored code]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typo in comments.
BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Add missing printk levels to e_powersaver
[CPUFREQ] Fix sparse warning in powernow-k8
[CPUFREQ] Support Model D parts and newer in e_powersaver
[CPUFREQ] Powernow-k8: Update to support the latest Turion processors
[CPUFREQ] fix configuration help message
[CPUFREQ] powernow-k8 print pstate instead of fid/did for family 10h
[CPUFREQ] Eliminate cpufreq_userspace scaling_setspeed deadlock
[CPUFREQ] gx-suspmod.c: use boot_cpu_data instead of current_cpu_data
[CPUFREQ] fix incorrect comment on show_available_freqs() in freq_table.c
[CPUFREQ] drivers/cpufreq: Add missing "space"
[CPUFREQ] arch/x86: Add missing "space"
[CPUFREQ] Remove pointless Kconfig dependancy
This patch fixes the configuration dependencies in the vmcoreinfo data.
i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
and x86_64's one is defined in arch/x86/mm/numa_64.c.
They depend on CONFIG_NUMA:
arch/x86/mm/Makefile_32:7
obj-$(CONFIG_NUMA) += discontig_32.o
arch/x86/mm/Makefile_64:7
obj-$(CONFIG_NUMA) += numa_64.o
ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM:
arch/ia64/mm/Makefile:9-10
obj-$(CONFIG_DISCONTIGMEM) += discontig.o
obj-$(CONFIG_SPARSEMEM) += discontig.o
ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
and it depends on CONFIG_NUMA:
arch/ia64/mm/Makefile:8
obj-$(CONFIG_NUMA) += numa.o
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the BOOTMEM_EXCLUSIVE, introduced in the previous patch, to avoid
conflicts while reserving the memory for the kdump capture kernel
(crashkernel=).
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:1238:9: warning: symbol '__ptr' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:1238:9: originally declared here
Signed-off-by: Dave Jones <davej@redhat.com>
Patch by VIA that updates e_powersaver.c to work with our model D parts
and newer.
From: Jesse Ahrens <jahrens@centtech.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The latest series of Turion X2 processors have a new XFAM
model. Add support for them to powernow-k8.h.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
In preemptible kernel will report BUG: using smp_processor_id() in
preemptible, so use boot_cpu_data instead of current_cpu_data.
discussion in :
http://lkml.org/lkml/2007/7/25/32
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
CC: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Pavel Emelyanov reported that his networking card did not work
and bisected it down to:
"
The commit
093af8d7f0
x86_32: trim memory by updating e820
broke my e1000 card: on loading driver says that
e1000: probe of 0000:04:03.0 failed with error -5
and the interface doesn't appear.
"
on a 32-bit kernel, base will overflow when try to do PAGE_SHIFT,
and highest_addr will always less 4G.
So use pfn instead of address to avoid the overflow when more than
4g RAM is installed on a 32-bit kernel.
Many thanks to Pavel Emelyanov for reporting and testing it.
Bisected-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Tested-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The .rodata section shouldn't just be read-only,
but also non-executable. This is free since we've broken
up the 2MB page already anyway.
also update test_nx to check for this.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This change broke recovery of exceptions in iret:
commit 72fe485854
Author: Glauber de Oliveira Costa <gcosta@redhat.com>
x86: replace privileged instructions with paravirt macros
The ENTRY(native_iret) macro adds alignment padding before the iretq
instruction, so "iret_label" no longer points exactly at the instruction.
It was sloppy to leave the old "iret_label" label behind when replacing
its nearby use. Removing it would have revealed the other use of the
label later in the file, and upon noticing that use, anyone exercising
the minimum of attention to detail expected of anyone touching this
subtle code would realize it needed to change as well.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:7: warning: symbol 'hi' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:6: originally declared here
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:15: warning: symbol 'lo' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:14: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This was being used to ensure the proper alignment of the FXSAVE/FXRSTOR data.
This would create a sparse error in the _correct_ cases, hiding further
warnings. Use BUILD_BUG_ON instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In my revamp of the x86 ptrace code for setting register values,
I accidentally omitted a check that was there in the old code.
Allowing %cs to be 0 causes a bad crash in recovery from iret failure.
This patch fixes that regression against 2.6.24, and adds a comment
that should help prevent this subtlety from being overlooked again.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify the x86-64 behavior for 32-bit processes that set
bogus %cs/%ss values (the only ones that can fault in iret)
match what the native i386 behavior is. (do not kill the task
via do_exit but generate a SIGSEGV signal)
[ tglx@linutronix.de: build fix ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
calibrate_delay() must be __cpuinit, not __{dev,}init.
I've verified that this is correct for all users.
While doing the latter, I also did the following cleanups:
- remove pointless additional prototypes in C files
- ensure all users #include <linux/delay.h>
This fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:
WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christian Zankel <chris@zankel.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wires up the new timerfd API to the x86 family.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pci-gart needs to unmap the IOMMU aperture to prevent cache corruptions.
Switch this over to using set_memory_np() instead of clear_kernel_mapping().
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
mtrr.h was included everywhere needed. Fixes the following sparse
warnings. Also, the return types in the extern definitions were
incorrect.
arch/x86/kernel/cpu/mtrr/amd.c:113:12: warning: symbol 'amd_init_mtrr' was not declared. Should it be static?
arch/x86/kernel/cpu/mtrr/cyrix.c:268:12: warning: symbol 'cyrix_init_mtrr' was not declared. Should it be static?
arch/x86/kernel/cpu/mtrr/centaur.c:218:12: warning: symbol 'centaur_init_mtrr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cpu.h was already included everywhere needed.
Fixes following sparse warnings:
arch/x86/kernel/cpu/amd.c:343:12: warning: symbol 'amd_init_cpu' was not declared. Should it be static?
arch/x86/kernel/cpu/cyrix.c:444:12: warning: symbol 'cyrix_init_cpu' was not declared. Should it be static?
arch/x86/kernel/cpu/cyrix.c:456:12: warning: symbol 'nsc_init_cpu' was not declared. Should it be static?
arch/x86/kernel/cpu/centaur.c:467:12: warning: symbol 'centaur_init_cpu' was not declared. Should it be static?
arch/x86/kernel/cpu/transmeta.c:112:12: warning: symbol 'transmeta_init_cpu' was not declared. Should it be static?
arch/x86/kernel/cpu/intel.c:296:12: warning: symbol 'intel_cpu_init' was not declared. Should it be static?
arch/x86/kernel/cpu/nexgen.c:56:12: warning: symbol 'nexgen_init_cpu' was not declared. Should it be static?
arch/x86/kernel/cpu/umc.c:22:12: warning: symbol 'umc_init_cpu' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/process_32.c:254:43: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fixes sparse warning:
arch/x86/kernel/cpu/intel.c:48:15: warning: symbol 'ppro_with_ram_bug' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch eliminates numbers in LDT allocation code
trying to make it clear to understand from where
these numbers come.
No code changed:
text data bss dec hex filename
1896 0 0 1896 768 ldt.o.before
1896 0 0 1896 768 ldt.o.after
md5:
6cbec8705008ddb4b704aade60bceda3 ldt.o.before.asm
6cbec8705008ddb4b704aade60bceda3 ldt.o.after.asm
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both trampolines actually *do* set up stack. (Is the "we jump into
compressed/head.S" comment still true?)
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cyrix_arr_init was #if 0 all the way back to at least v2.6.12.
This was the only place where arr3_protected was set to anything
but zero. Eliminate this variable.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the CPU feature string names to a separate file (common to 32
and 64 bits); additionally, make <asm/cpufeature.h> includable by host
code in preparation for including the CPU feature strings in the boot
code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Instead of grabbing the BKL on seek, use the inode mutex in the style
of generic_file_llseek().
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After /dev/*/cpuid was introduced, Intel changed the semantics of the
CPUID instruction to be sentitive to %ecx as well as %eax. This patch
allows querying of %ecx-sensitive levels by placing the %ecx value in
the upper 32 bits of the file position (lower 32 bits always were used
for the %eax value.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the _ASM_EXTABLE macro from <asm/asm.h>, instead of open-coding
__ex_table entires in arch/x86/kernel/test_nx.c.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The module scx200 were renamed to scx200_32 by the
merge of the 32 and 64 bit x86 arch trees.
Keep the _32 prefix on the .c file as it is 32 bit
specific and fix the module name in the Makefile.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The apm module were renamed to apm_32 during the merge of 32 and 64 bit
x86 which is unfortunate. As apm is 32 bit specific we like to keep the
_32 in the filename but the module should be named apm.
Fix this in the Makefile.
Reported-by: "A.E.Lawrence" <lawrence_a_e@ntlworld.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "A.E.Lawrence" <lawrence_a_e@ntlworld.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeff Chua bisected down a vmware guest boot breakage (hang) to
this paravirt change:
commit 8d947344c4
Author: Glauber de Oliveira Costa <gcosta@redhat.com>
Date: Wed Jan 30 13:31:12 2008 +0100
x86: change write_idt_entry signature
fix the off-by-one indexing bug ...
Bisected-by: Jeff Chua <jeff.chua.linux@gmail.com>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The ACPI_PDC_SMP_T_SWCOORD bit is set by and OS that is capable of
native ACPI throttling software coordination for mutli-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'suspend' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (38 commits)
suspend: cleanup reference to swsusp_pg_dir[]
PM: Remove obsolete /sys/devices/.../power/state docs
Hibernation: Invoke suspend notifications after console switch
Suspend: Invoke suspend notifications after console switch
Suspend: Clean up suspend_64.c
Suspend: Add config option to disable the freezer if architecture wants that
ACPI: Print message before calling _PTS
ACPI hibernation: Call _PTS before suspending devices
Hibernation: Introduce begin() and end() callbacks
ACPI suspend: Call _PTS before suspending devices
ACPI: Separate disabling of GPEs from _PTS
ACPI: Separate invocations of _GTS and _BFS from _PTS and _WAK
Suspend: Introduce begin() and end() callbacks
suspend: fix ia64 allmodconfig build
ACPI: clear GPE earily in resume to avoid warning
Suspend: Clean up Kconfig (V2)
Hibernation: Clean up Kconfig (V2)
Hibernation: Update messages
Suspend: Use common prefix in messages
Hibernation: Remove unnecessary variable declaration
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: (64 commits)
PCI: make pci_bus a struct device
PCI: fix codingstyle issues in include/linux/pci.h
PCI: fix codingstyle issues in drivers/pci/pci.h
PCI: PCIE ASPM support
PCI: Fix fakephp deadlock
PCI: modify SB700 SATA MSI quirk
PCI: Run ACPI _OSC method on root bridges only
PCI ACPI: AER driver should only register PCIe devices with _OSC
PCI ACPI: Added a function to register _OSC with only PCIe devices.
PCI: constify function pointer tables
PCI: Convert drivers/pci/proc.c to use unlocked_ioctl
pciehp: block new requests from the device before power off
pciehp: workaround against Bad DLLP during power off
pciehp: wait for 1000ms before LED operation after power off
PCI: Remove pci_enable_device_bars() from documentation
PCI: Remove pci_enable_device_bars()
PCI: Remove users of pci_enable_device_bars()
PCI: Add pci_enable_device_{io,mem} intefaces
PCI: avoid save the same type of cap multiple times
PCI: correctly initialize a structure for pcie_save_pcix_state()
...
There's a freakishly long comment in suspend_64.c, shorten it.
Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
fix bootup crash in native_read_tsc() that was reported on an Athlon-XP
and bisected. The correct feature boundary for X86_FEATURE_MFENCE_RDTSC
is not XMM but XMM2.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid section mismatch involving arch_register_cpu.
Marking arch_register_cpu as __init and removing the export
for non-hotplug-cpu configurations makes the following warning
go away:
Section mismatch in reference from the function
arch_register_cpu() to the function .devinit.text:register_cpu()
The function arch_register_cpu() references
the function __devinit register_cpu().
This is often because arch_register_cpu lacks a __devinit
annotation or the annotation of register_cpu is wrong.
The only external user of arch_register_cpu in the tree is
in drivers/acpi/processor_core.c where it is guarded by
ACPI_HOTPLUG_CPU (which depends on HOTPLUG_CPU).
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The casts will always be needed, may as well make them the right
signedness. The ebx variables can easily be unsigned, may as well.
arch/x86/kernel/cpu/common.c:261:21: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/cpu/common.c:261:21: expected unsigned int *eax
arch/x86/kernel/cpu/common.c:261:21: got int *<noident>
arch/x86/kernel/cpu/common.c:262:9: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:262:9: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:262:9: got int *<noident>
arch/x86/kernel/cpu/common.c:263:9: warning: incorrect type in argument 4 (different signedness)
arch/x86/kernel/cpu/common.c:263:9: expected unsigned int *ecx
arch/x86/kernel/cpu/common.c:263:9: got int *<noident>
arch/x86/kernel/cpu/common.c:264:9: warning: incorrect type in argument 5 (different signedness)
arch/x86/kernel/cpu/common.c:264:9: expected unsigned int *edx
arch/x86/kernel/cpu/common.c:264:9: got int *<noident>
arch/x86/kernel/cpu/common.c:293:30: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:293:30: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:293:30: got int *<noident>
arch/x86/kernel/cpu/common.c:350:22: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/cpu/common.c:350:22: expected unsigned int *eax
arch/x86/kernel/cpu/common.c:350:22: got int *<noident>
arch/x86/kernel/cpu/common.c:351:10: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:351:10: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:351:10: got int *<noident>
arch/x86/kernel/cpu/common.c:352:10: warning: incorrect type in argument 4 (different signedness)
arch/x86/kernel/cpu/common.c:352:10: expected unsigned int *ecx
arch/x86/kernel/cpu/common.c:352:10: got int *<noident>
arch/x86/kernel/cpu/common.c:353:10: warning: incorrect type in argument 5 (different signedness)
arch/x86/kernel/cpu/common.c:353:10: expected unsigned int *edx
arch/x86/kernel/cpu/common.c:353:10: got int *<noident>
arch/x86/kernel/cpu/common.c:362:30: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:362:30: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:362:30: got int *<noident>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Not necessary to expose it, also fixes sparse warning.
arch/x86/kernel/early_printk.c:196:16: warning: symbol 'early_console' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x1eb41): Section mismatch in reference from the function calgary_handle_quirks() to the function .init.text:calgary_set_split_completion_timeout()
calgary_handle_quirks() are only called at
__init time (in calgary_init_one() via handle_quirks ops).
So annotate this function and the sister function __init.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following warning:
WARNING: o-x86_64/arch/x86/kernel/built-in.o(.text+0x13d15): Section mismatch in reference from the function acpi_map_lsapic() to the function .cpuinit.text:mp_register_lapic()
The function acpi_map_lsapic() is exported and thus not annotated.
But the sole user is acpi/processor_core.c in a __cpuinit path.
So create a small wrapper and put back the annotation thus
avoiding the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the following warnings:
WARNING: arch/x86/kernel/built-in.o(.exit.text+0xf8): Section mismatch in reference from the function msr_exit() to the variable .cpuinit.data:msr_class_cpu_notifier
WARNING: arch/x86/kernel/built-in.o(.exit.text+0x158): Section mismatch in reference from the function cpuid_exit() to the variable .cpuinit.data:cpuid_class_cpu_notifier
WARNING: arch/x86/kernel/built-in.o(.exit.text+0x171): Section mismatch in reference from the function microcode_exit() to the variable .cpuinit.data:mc_cpu_notifier
In all three cases there were a function annotated __exit
that referenced a variable annotated __cpuinitdata.
The fix was to replace the annotation of the notifier
with __refdata to tell modpost that the reference to
a _cpuinit function in the notifier are OK.
The unregister call that references the notifier
variable will simple delete the function pointer
so there is no problem ignoring the reference.
Note: This looks like another case where __cpuinit
has been used as replacement for proper use
of CONFIG_HOTPLUG_CPU to decide what code are used for
HOTPLUG_CPU.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Silence the following warning:
WARNING: o-x86_64/arch/x86/kernel/built-in.o(.text+0x17cd3): Section mismatch in reference from the function remove_cpu_from_maps() to the variable .cpuinit.data:cpu_initialized
remove_cpu:maps() had a single user: __cpu_disable() so
mark it static and annotate it with __ref to silence the
warning from modpost.
_cpu_disable() has a single user in kernel/cpu.c:
=> take_cpu_down()
which again has a single user in the following call:
=> __stop_machine_run(take_cpu_down, &tcd_param, cpu);
Here a kthread is created.
So maybe the warning is correct and the right fix is to
remove the __cpuinitdata annotation of cpu_initialized?
Note: The analysis were disturbed by the fact that we had a variable
with the same name in cpu/common.c - but this is 32 bit only]
Note: Should smpboot_64 use cpu_clear()?
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because in i386 early boot stage, boot_cpu_data may be not available,
which makes clflush_cach_range() into infinite loop, which is called
by change_page_attr(). This patch fixes this by setting
boot_cpu_data.x86_clflush_size in early_cpu_detect().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/intel_cacheinfo.c:355:7: warning: symbol 'i' shadows an earlier one
arch/x86/kernel/cpu/intel_cacheinfo.c:296:39: originally declared here
arch/x86/kernel/cpu/intel_cacheinfo.c:367:18: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:18: expected unsigned int *eax
arch/x86/kernel/cpu/intel_cacheinfo.c:367:18: got int *
arch/x86/kernel/cpu/intel_cacheinfo.c:367:28: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:28: expected unsigned int *ebx
arch/x86/kernel/cpu/intel_cacheinfo.c:367:28: got int *
arch/x86/kernel/cpu/intel_cacheinfo.c:367:38: warning: incorrect type in argument 4 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:38: expected unsigned int *ecx
arch/x86/kernel/cpu/intel_cacheinfo.c:367:38: got int *
arch/x86/kernel/cpu/intel_cacheinfo.c:367:48: warning: incorrect type in argument 5 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:48: expected unsigned int *edx
arch/x86/kernel/cpu/intel_cacheinfo.c:367:48: got int *
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
one early crash on one 8 node 256g machine:
Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable)
BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data)
BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS)
BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d40000-e42000
Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3
Call Trace:
[<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
[<ffffffff80221657>] clear_local_APIC+0x5/0xcf
[<ffffffff80221726>] disable_local_APIC+0x5/0x17
[<ffffffff8021fe16>] smp_send_stop+0x46/0x4c
[<ffffffff80235293>] panic+0x94/0x13e
[<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
[<ffffffff80b9f1c5>] reserve_early+0x30/0x6c
[<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
[<ffffffff80b9dc01>] setup_arch+0x21f/0x44e
[<ffffffff80b978be>] start_kernel+0x6f/0x2c7
[<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3
it turns out there is overlap between pgtable and bss...
in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end
need to round up table_start to PAGE_SIZE.
also make the panic more informative.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.
An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes some bugs of EFI memory handing code.
- On x86_64, it is possible that EFI memory map can not be mapped via
identity map, so efi_map_memmap is removed, just use early_ioremap.
- On i386, the EFI memory map mapping take effect cross paging_init,
so it is not necessary to use efi_map_memmap.
- EFI memory map is unmapped in efi_enter_virtual_mode to avoid
early_ioremap leak.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes reboot_type of BOOT_EFI is used on i386 too. Because
correpsonding reboot code of i386 and x86_64 is merged.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.
From: Arjan van de Ven <arjan@linux.intel.com>
This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)
As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which lead to 1 page less being marked read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts various users of change_page_attr() to the new,
more intent driven set_page_*/set_memory_* API set.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64bit uses end_pfn_map and 32bit uses max_low_pfn. There are several
files which have #ifdef'ed defines which map either to end_pfn_map or
max_low_pfn. Replace this by a universal define and clean up all the
other instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
get more testing of the c_p_a() code done by not turning off
PSE on DEBUG_PAGEALLOC.
this simplifies the early pagetable setup code, and tests
the largepage-splitup code quite heavily.
In the end, all the largepages will be split up pretty quickly,
so there's no difference to how DEBUG_PAGEALLOC worked before.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes some bugs of making EFI runtime code executable.
- Use change_page_attr in i386 too. Because the runtime code may be
mapped not through ioremap.
- If there is no _PAGE_NX in __supported_pte_mask, the change_page_attr
is not called.
- Make efi_ioremap map pages as PAGE_KERNEL_EXEC_NOCACHE, because EFI runtime
code may be mapped through efi_ioremap.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The SMP trampoline always runs in real mode, so making it executable
in the page tables doesn't make much sense because it executes
before page tables are set up. That was the only user of
set_kernel_exec(). Remove set_kernel_exec().
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames bt_ioremap to early_ioremap, which is used in
x86_64. This makes it easier to merge i386 and x86_64 usage.
[ mingo@elte.hu: fix ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch replaces boot_ioremap invokation with bt_ioremap and
removes the boot_ioremap implementation.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes it possible for bt_ioremap() to be used before
paging_init(), via providing an early implementation of set_fixmap()
that can be used before paging_init().
This way boot_ioremap() can be replaced by bt_ioremap().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add mm to paravirt_alloc_pd, partly to make it consistent with
paravirt_alloc_pt, and because later changes will make use of it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
old sequence:
size ==> >4G ==> point to RAM
changed to:
>4G ==> point to RAM ==> size
some bios even leave aper to unclear, so check size at last.
To avoid reporting:
Node 0: Aperture @ 4a42000000 size 32 MB
Aperture too small (32 MB)
with this change we will get:
Node 0: Aperture @ 4a42000000 size 32 MB
Aperture beyond 4G. Ignoring.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some consumer ICH9 boards (such as the Abit IP35 Pro) do not provide a BIOS
option for enabling the HPET. The same ICH workaround used for 6,7,8 can be
applied to 9. Here I enable the only PCI id that was visible on my system.
I have confirmed the HPETs work both from userspace and as a clocksource for
the running kernel (2.6.24 here) after applying this patch.
Force enabled HPET at base address 0xfed00000
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
hpet0: 4 64-bit timers, 14318180 Hz
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
in init/main.c boot_cpu_init() does that before.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
in init/main.c boot_cpu_init() already does that before setup_arch
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix the following warning:
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x7a3): Section mismatch: reference to .init.text:amd_detect_cmp in 'init_amd'
The function amd_detect_cmp were annotated __init and
was only used from init_amd() which are annotated __cpuinit.
Annotate amd_detect_cmp() with _cpuinit to fix it.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(__ksymtab+0x2b0): Section mismatch: reference to .cpuinit.text:arch_register_cpu in '__ksymtab_arch_register_cpu'
Annotating exported symbols are wrong.
Previously the warning were hidden by avoiding the export
in the non HOTPLUG_CPU case but the improved checks in
modpost caught it anyway.
Fix it by removing the __cpuinit annotation and rearrange the
code a bit to save one ifdef/endif pair.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warnings:
WARNING: arch/x86/kernel/built-in.o(.text+0x139e1): Section mismatch: reference to .init.data:early_qrk in 'check_dev_quirk'
WARNING: arch/x86/kernel/built-in.o(.text+0x139f5): Section mismatch: reference to .init.data:early_qrk in 'check_dev_quirk'
WARNING: arch/x86/kernel/built-in.o(.text+0x13a0c): Section mismatch: reference to .init.data:early_qrk in 'check_dev_quirk'
WARNING: arch/x86/kernel/built-in.o(.text+0x13a12): Section mismatch: reference to .init.data:early_qrk in 'check_dev_quirk'
WARNING: arch/x86/kernel/built-in.o(.text+0x13a1a): Section mismatch: reference to .init.data:early_qrk in 'check_dev_quirk'
WARNING: arch/x86/kernel/built-in.o(.text+0x13a36): Section mismatch: reference to .init.data:early_qrk in 'check_dev_quirk'
WARNING: arch/x86/kernel/built-in.o(.text+0x13a42): Section mismatch: reference to .init.data:
Warning was caused by access to the __initdata annotated variable
from the non-annotated static function check_dev_quirk().
check_dev_quirk() were only used from a function annotated
__init so add __init annotation to check_dev_quirk() to fix it.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10ea0): Section mismatch: reference to .cpuinit.data:num_processors in 'acpi_unmap_lsapic'
The exported function acpi_unmap_lsapic() references
the variable num_processors that is annotated __cpuinitdata.
Remove the annotation of num_processors as we never know
when an exported function are called.
And drop the needless initialsation to 0.
Warning was seen on 64 bit but similar pattern were seen
in 32 bit - so fix it up there too.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix the following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x3): Section mismatch: reference to .cpuinit.data:force_mwait in 'mwait_usable'
[Seen on 64 bit only but similar pattern exist on 32 bit so fix it there too]
mwait_usable() were only used by a function annotated __cpuinit
so annotate mwait_usable() with __cpuinit to fix the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/cpu/mcheck/built-in.o(.text+0x1584): Section mismatch: reference to .cpuinit.text:threshold_create_device in 'threshold_cpu_callback'
threshold_cpu_callback() is only used by threshold_cpu_notifier.
threshold_cpu_notifier is only used for cpu hot plug as it is registered
using register_hotcpu_notifier().
Mark them both __cpuinit to fix the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/cpu/mcheck/built-in.o(.text+0x752): Section mismatch: reference to .cpuinit.text:mce_create_device in 'mce_cpu_callback'
mce_cpu_callback() is only used by mce_cpu_notofier.
The notifier is only used for hotplugable cpu's as it is
registered using register_hotcpu_notifier(),
Annotate them both __cpuinit to fix the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The RDC R-321x SoC needs a reboot fixup which
uses its internal hardware watchdog set to
reset the CPU on next tick.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The existing Geode GPIO API only allows for updating one GPIO at once. There
are instances where users want to update multiple GPIOs at once. With the
current API, they are given two choices; either ignore the GPIO API:
outl(0xc000, gpio_base + GPIO_OUTPUT_VAL);
outl(0xc000, gpio_base + GPIO_OUTPUT_ENABLE);
Alternatively, call each GPIO update separately:
geode_gpio_set(14, GPIO_OUTPUT_VAL);
geode_gpio_set(15, GPIO_OUTPUT_VAL);
geode_gpio_set(14, GPIO_OUTPUT_ENABLE);
geode_gpio_set(15, GPIO_OUTPUT_ENABLE);
Neither are desirable. This patch changes the GPIO API to allow for setting
of multiple GPIOs at once; rather than being passed an integer, we pass
a bitmask and provide a translation function. The above code would now
look like this:
geode_gpio_set(geode_gpio(14)|geode_gpio(15), GPIO_OUTPUT_VAL);
geode_gpio_set(geode_gpio(14)|geode_gpio(15), GPIO_OUTPUT_ENABLE);
Since there are no upstream users of the GPIO API yet (afaik), best to
change this now. This also adds a bit of sanity checking; it is no
longer possible to use a GPIO above 28.
Note the semantics of geode_gpio_isset() have changed:
geode_gpio_isset(geode_gpio(3)|geode_gpio(4), ...)
will only return true iff both GPIOs are set.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix cpu MHz reporting for AMD family 0x11 when powernow-k8 is
disabled.
Just adhere to the CONSTANT_TSC feature bit for AMD CPUs when deciding
whether cpu_khz needs calibration. The additional check for CPU family
is not needed and prevents calibration for future CPUs.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
improve the MTTR trimming messages and also trigger a WARN_ON()
so that kerneloops.org can pick it up and categorize it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The new "mfgptfix" boot command line option may be usd to fix MFGPT
timers on AMD Geode platforms when the BIOS has incorrectly applied
a workaround. TinyBIOS version 0.98 is known to be affected, 0.99
fixes the problem by letting the user disable the workaround.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No one uses struct cpu_model_info on x86_64 now.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use KSYM_NAME_LEN instead of numeric value
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
when MTRRs are not covering the whole e820 table, we need to trim the
RAM and need to update e820.
reuse some code on 64-bit as well.
here need to add early_get_cap and use it in early_cpu_detect, and move
mtrr_bp_init early.
The code successfully trimmed the memory map on Justin's system:
from:
[ 0.000000] BIOS-e820: 0000000100000000 - 000000022c000000 (usable)
to:
[ 0.000000] modified: 0000000100000000 - 0000000228000000 (usable)
[ 0.000000] modified: 0000000228000000 - 000000022c000000 (reserved)
According to Justin it makes quite a difference:
| When I boot the box without any trimming it acts like a 286 or 386,
| takes about 10 minutes to boot (using raptor disks).
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the previous patch in the old RTC driver. It also removes the direct
rtc_interrupt() call from arch/x86/kernel/hpetc.c so that there's finally no
(code) dependency to CONFIG_RTC in arch/x86/kernel/hpet.c.
Because of this, it's possible to compile the drivers/char/rtc.ko driver as
module and still use the HPET emulation functionality. This is also expressed
in Kconfig.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
enabled, then interrupts don't work for the rtc-cmos driver which results in
RTC_AIE*, RTC_PIE* and RTC_ALM being unusable. This affects hwclock from
util-linux-ng at least on i386 since that uses RTC_PIE_ON. (For x86-64, a
polling method is used for unknown reasons.)
This patch series now
1. export the functions from arch/x86/kernel/hpet.c that the old char/rtc
driver uses to work around that problem,
2. makes it possible to compile the old rtc driver as module, while still
having CONFIG_HPET_EMULATE_RTC enabled and
3. makes use of the exported functions in (1) in the new rtc-cmos driver.
This patch:
This patch makes the RTC emulation functions in arch/x86/kernel/hpet.c usable
for kernel modules. It
- exports the functions (EXPORT_SYMBOL_GPL()),
- adds an interface to register the interrupt callback function
instead of using only a fixed callback function and
- replaces the rtc_get_rtc_time() function which depends on
CONFIG_RTC with a call to get_rtc_time() which is defined in
include/asm-generic/rtc.h.
The only dependency to CONFIG_RTC is the call to rtc_interrupt() which is
removed by the next patch. After this, there's no (code) dependency of
this functions to CONFIG_RTC=y any more.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I am preparing to convert the boot time page table to the kernels
native format. To achieve that I need to enable PAE. Enabling PSE
and the no execute bit would not hurt. So this patch modifies
the boot cpu path to execute all of the kernels enable code
if and only if we have the proper bits set in mmu_cr4_features.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently in head_32.S there are two ways we test to see if we
are the boot cpu. By looking at %ebx and by looking at the
static variable ready. When changing things around I have
found that it gets tricky to preserve %ebx. So this
patch just switches head.S over to the more reliable
test of always using ready.
Hopefully later we can kill these tests entirely.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Small fomatting fixes to 64-bit as well, trailing whitespace
and extra semicolon, also move the ifdefs for CONFIG_KALLSYMS
into the function itself.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We are driving a motherboard port so use a 2uS explicit delay at this
point.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
export __supported_pte_mask variable as GPL symbol.
lguest is a user of it.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Exporrt check_tsc_unstable function as GPL symbol. lguest is
a user of it.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
An older binutils bug caused us to not fix up alternatives.
This problem involved mutex.c but we dont do lockdep section tricks
there anymore, so this workaround is moot. Keep the printk nevertheless,
just in case ... We can remove that later on.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
add warning to check_tsc_warp() - if get_cycles() does not progress.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
100 million max # of loops is a bit too much - reduce it to 10 million.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change static bios_cpu_apicid array to a per_cpu data variable.
This includes using a static array used during initialization
similar to the way x86_cpu_to_apicid[] is handled.
There is one early use of bios_cpu_apicid in apic_is_clustered_box().
The other reference in cpu_present_to_apicid() is called after
smp_set_apicids() has setup the percpu version of bios_cpu_apicid.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
char cpu_to_node_map[NR_CPUS];
fixup:
- Split cpu_to_node function into "early" and "late" versions
so that x86_cpu_to_node_map_early_ptr is not EXPORT'ed and
the cpu_to_node inline function is more streamlined.
- This also involves setting up the percpu maps as early as possible.
- Fix X86_32 NUMA build errors that previous version of this
patch caused.
V2->V3:
- add early_cpu_to_node function to keep cpu_to_node efficient
- move and rename smp_set_apicids() to setup_percpu_maps()
- call setup_percpu_maps() as early as possible
V1->V2:
- Removed extraneous casts
- Fix !NUMA builds with '#ifdef CONFIG_NUMA"
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add a generic option to clear any cpuid bit. I added it because it was
very easy to add with the new generic cpuid disable bitmap and perhaps
it will be useful in the future.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To disable CLFLUSH usage, especially in change_page_attr().
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Modern 32bit userland doesn't even boot when the TSC is disabled
because ld.so tends to contain RDTSCs. So make notsc only effective for the
kernel, similar to 64bit.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This convers nofxsr, mem=nopentium and nosep to use the new
generic cpuid disable bitmap instead of using own variables.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are already various options to disable specific cpuid bits
on the command line. They all use their own variable. Add a generic
mask to make this easier in the future.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This finally makes paravirt-ops able to compile and boot under x86_64.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
paravirt_pagetable_setup_{start,done}() are not used (yet) under x86_64,
and native_pagetable_setup_{start,done}() don't exist on x86_64. So they
don't need to be set.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds the __parainstructions section to vmlinux.lds.S.
It's needed for the patching system.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>