Rename x86_64's init_rsp to stack_start, the name i386 already uses.
Add a zeroed stack segment for consistency. With this,
we can eliminate one ugly ifdef in smpboot.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Making a variable page-aligned by using
__attribute__((section(".data.page_aligned"))) is fragile because if
sizeof(variable) is not also a multiple of page size, it leaves
variables in the remainder of the section unaligned.
This patch introduces two new qualifiers, __page_aligned_data and
__page_aligned_bss to set the section *and* the alignment of
variables. This makes page-aligned variables more robust because the
linker will make sure they're aligned properly. Unfortunately it
requires *all* page-aligned data to use these macros...
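As a rough user-space illustration of the idea (a sketch only, not the
kernel macros themselves): the macro name demo_page_aligned and the
4096-byte page size are assumptions, and the section attribute is left
out so the fragment builds outside the kernel. The point is that
alignment is requested on each variable rather than inferred from where
the variable lands in a section.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096 /* assumed page size, for illustration only */

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/*
 * Hypothetical stand-in for __page_aligned_data: the alignment is stated
 * on the variable itself, so it no longer depends on the size of whatever
 * precedes it.
 */
#define demo_page_aligned __attribute__((aligned(DEMO_PAGE_SIZE)))

/* Deliberately not a multiple of the page size. */
static char demo_small[100] demo_page_aligned;

/* Still page-aligned, regardless of what the linker places before it. */
static char demo_table[DEMO_PAGE_SIZE] demo_page_aligned;

/* Returns 1 if the address is a multiple of the assumed page size. */
static int demo_is_page_aligned(const void *p)
{
	return ((uintptr_t)p % DEMO_PAGE_SIZE) == 0;
}
```

With only a section attribute, demo_table would start 100 bytes into
the section; with the alignment attribute both checks hold.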
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves the reserve_crashkernel() to setup.c and removes the
architecture-specific version. Both versions were more or less the same.
I tested it on both x86-64 and i386, with CONFIG_KEXEC on and off (so
that it compiles).
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Someone could write 0 bytes to /proc/sgi_uv/ptc_statistics,
causing
optstr[count - 1] = '\0';
to write to who-knows-where.
(Andi Kleen noticed this need from a patch I sent for
similar code in the ia64 world (sn2_ptc_proc_write()).)
(count less than zero is not possible here, as count is unsigned)
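A minimal user-space sketch of the kind of guard this needs, assuming a
64-byte option buffer. demo_proc_write() and DEMO_OPT_MAX are
hypothetical names, and memcpy() stands in for copy_from_user(); this is
not the code in tlb_uv.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_OPT_MAX 64 /* assumed buffer size, for illustration only */

/*
 * Hypothetical stand-in for a /proc write handler. A zero-length write is
 * rejected before optstr[count - 1] is touched, and a write larger than
 * the buffer is rejected before anything is copied.
 */
static long demo_proc_write(const char *user, size_t count, char *optstr)
{
	assert(user != NULL && optstr != NULL);

	if (count == 0 || count > DEMO_OPT_MAX)
		return -EINVAL;

	memcpy(optstr, user, count);	/* stands in for copy_from_user() */
	optstr[count - 1] = '\0';	/* safe: count >= 1 here */
	return (long)count;
}
```

Since count is unsigned, count - 1 would wrap to a huge index when
count is 0, which is why the check has to come first.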
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
v6: 6/19 close the security hole in uv_ptc_proc_write()
> Found a potential security hole while doing that:
> static ssize_t uv_ptc_proc_write(struct file *file, const char __user *user,
> size_t count, loff_t *data)
> if (copy_from_user(optstr, user, count))
> return -EFAULT;
>
> is count guaranteed to never be larger than 64?
This is fixed below.
It adds tlb_uv.o to the Makefile.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: mingo@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
physid_mask_of_physid() causes a huge stack frame (12k) to be created if
the number of APICs is large. Replace physid_mask_of_physid() with a
new function that does not create large stack frames. This is a problem
only on large x86_64 systems.
This paves the way to increase MAX_APICS.
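The general shape of such a replacement can be sketched as follows. This
is an illustration under assumptions, not the kernel code: the demo_*
names and the 32768-entry mask size are made up. The helper fills a
caller-provided mask in place instead of building and returning a large
structure by value.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_MAX_APICS 32768 /* illustrative value, not the kernel's */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MASK_LONGS (DEMO_MAX_APICS / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long mask[DEMO_MASK_LONGS];
} demo_physid_mask_t;

/* Fill a caller-provided mask in place; no mask-sized temporary is needed. */
static void demo_set_mask_of_physid(unsigned int physid,
				    demo_physid_mask_t *map)
{
	assert(map != NULL && physid < DEMO_MAX_APICS);

	memset(map, 0, sizeof(*map));
	map->mask[physid / DEMO_BITS_PER_LONG] |=
		1UL << (physid % DEMO_BITS_PER_LONG);
}

/* Returns 1 if the bit for physid is set in the mask. */
static int demo_physid_isset(unsigned int physid,
			     const demo_physid_mask_t *map)
{
	assert(map != NULL && physid < DEMO_MAX_APICS);

	return (map->mask[physid / DEMO_BITS_PER_LONG] >>
		(physid % DEMO_BITS_PER_LONG)) & 1UL;
}
```

The caller decides where the mask lives (static data, an existing
structure), so the stack cost no longer scales with the mask size.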
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Cc: mingo@elte.hu
Cc: tglx@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/tlb_uv.c: In function ‘uv_table_bases_init’:
arch/x86/kernel/tlb_uv.c:612: error: ‘bau_tabsp’ undeclared (first use in this function)
arch/x86/kernel/tlb_uv.c:612: error: (Each undeclared identifier is reported only once
arch/x86/kernel/tlb_uv.c:612: error: for each function it appears in.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
TLB shootdown for SGI UV.
v1: 6/2 original
v2: 6/3 corrections/improvements per Ingo's review
v3: 6/4 split atomic operations off to a separate patch (Jeremy's review)
v4: 6/12 include <mach_apic.h> rather than <asm/mach-bigsmp/mach_apic.h>
(fixes a !SMP build problem that Ingo found)
fix the index on uv_table_bases[blade]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
TLB shootdown for SGI UV.
Depends on patch (in tip/x86/irq):
x86-update-macros-used-by-uv-platform.patch Jack Steiner May 29
This patch provides the ability to flush TLBs in CPUs that are not on
the local node. The hardware mechanism for distributing the flush
messages is the UV's "broadcast assist unit".
The hook to intercept TLB shootdown requests is a 2-line change to
native_flush_tlb_others() (arch/x86/kernel/tlb_64.c).
This code has been tested on a hardware simulator. The real hardware
is not yet available.
The shootdown statistics are provided through /proc/sgi_uv/ptc_statistics.
The use of /sys was considered, but would have required the use of
many /sys files. The debugfs was also considered, but these statistics
should be available on an ongoing basis, not just for debugging.
Issues to be fixed later:
- The IRQ for the messaging interrupt is currently hardcoded as 200
(see UV_BAU_MESSAGE). It should be dynamically assigned in the future.
- The use of appropriate udelay()'s is untested, as they are a problem
in the simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch uses reserve_bootmem_generic() instead of reserve_bootmem()
to reserve the crashkernel memory on x86_64. That's necessary for NUMA
machines, see 00212fef81:
[PATCH] Fix kdump Crash Kernel boot memory reservation for NUMA machines
This patch will fix a boot memory reservation bug that trashes memory on
the ES7000 when loading the kdump crash kernel.
The code in arch/x86_64/kernel/setup.c to reserve boot memory for the crash
kernel uses the non-numa aware "reserve_bootmem" function instead of the
NUMA aware "reserve_bootmem_generic". I checked to make sure that no other
function was using "reserve_bootmem" and found none, except the ones that
had NUMA ifdef'ed out.
I have tested this patch only on an ES7000 with NUMA on and off (numa=off)
in a single (non-NUMA) and multi-cell (NUMA) configurations.
Signed-off-by: Amul Shah <amul.shah@unisys.com>
Looks-good-to: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The switch-back to reserve_bootmem() was accidentally introduced in
5c3391f9f7 when adding the BOOTMEM_EXCLUSIVE
parameter.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.
It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.
The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix setup.c printk format warning:
linux-next-20080605/arch/x86/kernel/setup.c: In function 'setup_per_cpu_areas':
linux-next-20080605/arch/x86/kernel/setup.c:173: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ssize_t'
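For reference, a small user-space sketch of the matching conversion for
an ssize_t argument, assuming the fix is to use the 'z' length modifier.
demo_format_size() and the message text are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an ssize_t using %zd. The 'z' length modifier matches the
 * size_t/ssize_t family, so the compiler no longer sees the mismatch
 * that %lu triggers on configurations where ssize_t is not long.
 */
static int demo_format_size(char *buf, size_t len, ssize_t size)
{
	assert(buf != NULL);

	return snprintf(buf, len, "allocating %zd bytes", size);
}
```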
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Without this patch, my link fails with:
arch/x86/kernel/built-in.o(.cpuinit.text+0x3c6e): In function `get_local_pda':
: undefined reference to `_cpu_pda'
arch/x86/kernel/built-in.o(.cpuinit.text+0x3cd1): In function `get_local_pda':
: undefined reference to `after_bootmem'
arch/x86/kernel/built-in.o(.cpuinit.text+0x3cec): In function `get_local_pda':
: undefined reference to `_cpu_pda'
make[2]: *** [.tmp_vmlinux1] Error 1
Caused by commit 766da892634694f795b18b9538407816896fc470
x86: remove static boot_cpu_pda array v2
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Remove the boot_cpu_pda array and pointer table from the data section.
Allocate the pointer table and array during init. do_boot_cpu()
will reallocate the pda in node local memory and if the cpu is being
brought up before the bootmem array is released (after_bootmem = 0),
then it will free the initial pda. This will happen for all cpus
present at system startup.
This removes 512k + 32k bytes from the data section.
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Consolidate node_to_cpumask operations and remove the 256k
byte node_to_cpumask_map. This is done by allocating the
node_to_cpumask_map array after the number of possible nodes
(nr_node_ids) is known.
* Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
been increased. They now show faults when calling node_to_cpumask()
and node_to_cpumask_ptr().
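The allocation pattern described above can be sketched in user space
like this. It is an approximation with hypothetical demo_* names, using
calloc() where the kernel would use its boot-time allocator: the table
is sized from the runtime node count, and the accessor reports lookups
that cannot be satisfied.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative mask type: 64 longs, not the kernel's cpumask_t. */
typedef struct {
	unsigned long bits[64];
} demo_cpumask_t;

static demo_cpumask_t *demo_node_to_cpumask_map;
static int demo_nr_node_ids;

/* Allocate one mask per possible node once the node count is known. */
static int demo_setup_node_to_cpumask_map(int nr_node_ids)
{
	assert(nr_node_ids > 0);

	demo_node_to_cpumask_map = calloc((size_t)nr_node_ids,
					  sizeof(demo_cpumask_t));
	if (!demo_node_to_cpumask_map)
		return -1;
	demo_nr_node_ids = nr_node_ids;
	return 0;
}

/*
 * Debug-style accessor: returns NULL when the map is not allocated yet
 * or the node is out of range, instead of reading past the table.
 */
static const demo_cpumask_t *demo_node_to_cpumask_ptr(int node)
{
	if (!demo_node_to_cpumask_map || node < 0 || node >= demo_nr_node_ids)
		return NULL;
	return &demo_node_to_cpumask_map[node];
}
```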
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Restore the nodenumber field in the x86_64 pda. This field is slightly
different from the x86_cpu_to_node_map, mainly because it's a static
indication of which node the cpu is on, while the cpu to node map is a
dynamic mapping that may get reset if the cpu goes offline. This also
simplifies the numa_node_id() macro.
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Introduce a new PER_CPU macro called "EARLY_PER_CPU". This is
used by some per_cpu variables that are initialized and accessed
before there are per_cpu areas allocated.
["Early" in respect to per_cpu variables is "earlier than the per_cpu
areas have been setup".]
This patchset adds these new macros:
DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
EXPORT_EARLY_PER_CPU_SYMBOL(_name)
DECLARE_EARLY_PER_CPU(_type, _name)
early_per_cpu_ptr(_name)
early_per_cpu_map(_name, _idx)
early_per_cpu(_name, _cpu)
The DEFINE macro defines the per_cpu variable as well as the early
map and pointer. It also initializes the per_cpu variable and map
elements to "_initvalue". The early_* macros provide access to
the initial map (usually setup during system init) and the early
pointer. This pointer is initialized to point to the early map
but is then NULL'ed when the actual per_cpu areas are setup. After
that the per_cpu variable is the correct access to the variable.
The early_per_cpu() macro is not very efficient but does show how to
access the variable if you have a function that can be called both
"early" and "late". It tests whether the early pointer is NULL; if it
is not, the early map is still valid and is used. Otherwise, the
per_cpu variable is used instead:
#define early_per_cpu(_name, _cpu) \
(early_per_cpu_ptr(_name) ? \
early_per_cpu_ptr(_name)[_cpu] : \
per_cpu(_name, _cpu))
A better method is to actually check the pointer manually. In the
case below, numa_set_node can be called both "early" and "late":
void __cpuinit numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
if (cpu_to_node_map)
cpu_to_node_map[cpu] = node;
else
per_cpu(x86_cpu_to_node_map, cpu) = node;
}
* Add a flag "arch_provides_topology_pointers" that indicates pointers
to topology cpumask_t maps are available. Otherwise, use the function
returning the cpumask_t value. This is useful if cpumask_t set size
is very large to avoid copying data on to/off of the stack.
* The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
the non-debug case has been optimized a bit.
* Remove an unreferenced compiler warning in drivers/base/topology.c
* Clean up #ifdef in setup.c
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis wrote:
> Ingo Molnar wrote:
>> * Mike Travis <travis@sgi.com> wrote:
>>
>>> [Ingo - please replace "PATCH 07/11" with this one.]
>>>
>>> * Remove 544k bytes from the kernel by removing the boot_cpu_pda
>>> array from the data section and allocating it during startup.
>>>
>>> Fixed panic in setup_per_cpu_areas when HOTPLUG_CPU not set.
>>>
>>> For inclusion into sched-devel/latest tree.
>> sched-devel.git randconfig testing found another crash with your queue:
>>
>> [ 0.111060] Brought up 1 CPUs
>> [ 0.111986] Total of 1 processors activated (4022.73 BogoMIPS).
>> [ 0.112987] Testing NMI watchdog ... <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
>> [ 0.114982] IP: [<ffffffff8180d4a0>] check_nmi_watchdog+0xb0/0x210
>> [ 0.114982] PGD 0
>> [ 0.114982] Oops: 0000 [1] SMP
>> [ 0.114982] CPU 0
>> [............]
>>
>> http://redhat.com/~mingo/misc/config-Mon_Apr_28_23_25_25_CEST_2008.bad
>> http://redhat.com/~mingo/misc/log-Mon_Apr_28_23_25_25_CEST_2008.bad
>>
>> Ingo
>
> Hi Ingo,
>
> I need a bit more information on your hardware configuration. Building a
> kernel with the above config file started up fine on both the Intel and AMD
> boxes.
>
> Based on the above output it looks like it might be a UP machine?
...
Ok, I think I found it. In check_nmi_watchdog():
for (cpu = 0; cpu < NR_CPUS; cpu++)
prev_nmi_count[cpu] = cpu_pda(cpu)->__nmi_count;
As I mentioned it works fine on both of my systems so could you try it out?
Thanks!
Mike
--
* Change function check_nmi_watchdog() to use nr_cpu_ids instead of NR_CPUS.
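Why the loop bound matters can be sketched in user space as below. The
demo_* names and the table size are hypothetical. Once the pda pointers
are allocated at boot, only the first nr_cpu_ids slots are valid, so a
loop running to the compile-time maximum would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 8 /* compile-time maximum, illustrative */

struct demo_pda {
	unsigned int nmi_count;
};

/*
 * Only the first nr_cpu_ids slots are populated; the rest stay NULL,
 * mimicking a pointer table filled in at boot.
 */
static struct demo_pda *demo_pda_table[DEMO_NR_CPUS];

/* Snapshot the counters, bounded by the runtime CPU count. */
static void demo_snapshot_nmi_counts(unsigned int *prev, int nr_cpu_ids)
{
	assert(prev != NULL && nr_cpu_ids <= DEMO_NR_CPUS);

	for (int cpu = 0; cpu < nr_cpu_ids; cpu++)
		prev[cpu] = demo_pda_table[cpu]->nmi_count;
}
```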
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some HP laptops have a problem with their DSDT reporting as
HP/SB400/10000, which includes some code which overrides all temperature
trip points to 16C if the INTIN2 input of the I/O APIC is enabled. This
input is incorrectly designated the ISA IRQ 0 via an interrupt source
override even though it is wired to the output of the master 8259A and
INTIN0 is not connected at all. So far two models have been identified,
namely nx6125 and nx6325.
Use a knob provided by the I/O APIC interrupt registration code to
abandon any attempts to route IRQ 0 through the I/O APIC for these
systems.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As discovered recently some systems exhibit problems when the 8254 timer
IRQ is routed through the I/O APIC. These problems do not affect the
timer IRQ itself and therefore cannot be detected when the correctness of
operation of the interrupt is verified in check_timer(). Therefore the
I/O APIC path of the timer IRQ has to be disabled entirely.
This is a change that lets platforms ask for the timer IRQ not to be
registered in the I/O APIC interrupt tables. The local APIC and ExtINTA
paths are unaffected. This request is only taken into account for ACPI
platforms as MP table systems seem unaffected so far.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the system doesn't have an I/O APIC, we don't need to store entries
for the mptable update.
Also let mp_config_acpi_gsi not call functions in mpparse, so that
mpparse can later be decoupled from ACPI more easily.
Reported-by: Daniel Exner <dex@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Exner <dex@dragonslave.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. let 64-bit support the 88 and e801 memory-detection methods too
2. introduce default_machine_specific_memory_setup, and reuse it
for voyager
v2: fix the 64-bit build
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Separate SRAT finding and parsing from get_memcfg_from_srat(),
and let get_memcfg_from_srat() only handle the array from the previous step.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so don't punish all other cpus without that problem when init highmem
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
and make 32-bit resource registration more like 64 bit.
also move probe_roms back to setup_32.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because of the size limit of struct boot_params (the zero page), the
maximum number of E820 memory map entries that can be passed to the
kernel is 128. As pointed out by Paul Jackson, some machines produced by
SGI have so many nodes that the number of E820 memory map entries is
more than 128. To enable the Linux kernel on these systems, a new setup
data type named SETUP_E820_EXT is defined to pass additional memory map
entries to the kernel.
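A rough sketch of how a consumer might walk a chain of setup data nodes
and count the extra entries, under stated assumptions. The demo_*
structures are simplified stand-ins: here the next field is an ordinary
pointer and the payload is referenced rather than stored inline, and the
type value is chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SETUP_E820_EXT 1 /* type tag; value assumed for illustration */

/* One memory map entry: address, size, and region type. */
struct demo_e820entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
} __attribute__((packed));

/* Simplified setup data node, linked by plain pointers for this sketch. */
struct demo_setup_data {
	struct demo_setup_data *next;
	uint32_t type;
	uint32_t len;		/* payload length in bytes */
	const void *data;	/* payload */
};

/* Count the memory map entries carried by all E820_EXT nodes in the chain. */
static size_t demo_count_ext_e820(const struct demo_setup_data *sd)
{
	size_t total = 0;

	for (; sd; sd = sd->next)
		if (sd->type == DEMO_SETUP_E820_EXT)
			total += sd->len / sizeof(struct demo_e820entry);
	return total;
}
```

Nodes of other types are skipped, so the chain can carry unrelated
setup data alongside the extended map.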
This patch is based on the x86/auto-latest branch of the git-x86 tree
and has been tested on x86_64 and i386 platforms.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use early_node_map to init high pages, so we can remove the page_is_ram()
and page_is_reserved_early() checks in the big loop with add_one_highpage.
Also remove page_is_reserved_early() itself; it is not needed anymore.
v2: fix the build of other platforms
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
rename update_memory_range to e820_update_range
rename add_memory_region to e820_add_region
to make it more clear that they are about e820 map operations.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so we don't get the same value multiple times.
also make mp_config_acpi_legacy_irqs more readable by moving assignments
together.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Daniel Exner reported IO-APIC enumeration breakage in linux-next.
Alexey Starikovskiy found out that it might be related to
commit 2944e16b25 "x86: update mptable".
Use enable_update_mptable to decide whether a check is needed before
adding to the mp_irqs array.
Reported-by: Daniel Exner <webmaster@dragonslave.de>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. move that before zone_sizes_init ...
2. add free_early for one old one, otherwise it will be reserved again
when we init highmem.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. add reserve_bootmem_generic for 32bit
2. change len to unsigned long
3. make early_res_to_bootmem use it
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
we are checking mptable early for numaq, so don't need to reserve_bootmem
for it. bootmem is not there yet.
do the same thing as 64-bit.
Found on a system with more than 64g of memory, when kexec-ing from a
64-bit kernel into a 32-bit kernel with numaq support.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
since we now have 32-bit support for e820_register_active_regions(),
we can merge the parsing of the mem=/memmap= boot parameters.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> That helped a lot, the system seems to work normally now.
>
> Here's the relevant snippet from dmesg:
>
> [ 0.108006] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.108006] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [ 0.108006] ...trying to set up timer (IRQ0) through the 8259A ... <3>
> [ 0.108006] ..... (found apic 0 pin 2) ...<3> failed.
> [ 0.108006] ...trying to set up timer as Virtual Wire IRQ...<3> works.
>
> and the whole thing is at: http://www.sisk.pl/kernel/debug/20080618/dmesg-2.log
Hmm, that only proved the 8259A is indeed wired to the pin #2 of the I/O
APIC.
> I, personally, don't have any and AMD only has SB600 documentation on its
> web page (it's still marked as "AMD confidential" ;-)).
Well, the IC block is most likely the same as that's not rocket science
and once done there is no need to fiddle with that. That written, I am
afraid there is nothing useful about the IC in the document, except that
it's there and consists of an I/O APIC providing 24 inputs and the usual
pair of 8259A cores. Thanks for the reference anyway.
> There is an interrupt controller in there, but I'm not sure if there's any
> 8259A. The northbridge is on the CPU, actually.
I will praise the day someone ships an x86 machine without an 8259A core!
As expressed in another mail I suspect there may actually be a direct
route from the 8254 to INTIN0 in the southbridge -- this is what other
bootstrap logs seen on the Internet suggest. This would mean this
particular BIOS is buggy (is it the latest version?) and provides an
incorrect IRQ override in its ACPI tables, for example because the
responsible block has been blindly copied from a machine using a commoner
wiring. This could be moderately easily fixed up with a quirk based on
the PCI ID (after checking it again, we actually used to have a quirk for
ATI in this area, but the way it was done suggests the issue was not
understood well enough).
Could you please remove the hack sent yesterday and test the patch
provided below? I do hope it builds, but I have no immediate means to
check it. Please report the output. The intent is to test INTIN0
directly before testing INTIN2 through the 8259A. Thanks.
Aside of that, what I have gathered from your reports (please correct me
if I have got it wrong) is that when the through-8259A mode is used, then
after a while 8254 timer interrupts stop arriving. What's interesting,
the "Virtual Wire IRQ" seems to work for you correctly (that's quite an
odd setup where a local APIC input is used in the native mode -- please
post /proc/interrupts for confirmation), which in turn implies the master
8259A drives its INT output as we expect. Why would the I/O APIC input
have problems then? Hmm...
[ mingo@elte.hu: revert the "x86: fix IO APIC breakage on HP nx6325"
version. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Thu, 19 Jun 2008, Rafael J. Wysocki wrote:
> > With such a configuration the "x86: I/O APIC: timer through 8259A
> > second-chance" patch should not matter, because the only change it
> > introduces is an attempt to try the same I/O APIC pin again, but with the
> > IRQ0 line of the master 8259A enabled. That's not a terribly unusual
> > configuration and nothing should get confused in the system.
>
> But it _does_ get confused, really.
Something certainly gets confused, but so far I am not sure which bit
exactly it is, are you?
> > Barring the unlikely possibility of the 8259A actually being wired to
> > INTIN2 of the I/O APIC I can see two possible explanations:
> >
> > 1. The 8259A interrupt actually escapes to the CPU somehow and is handled
> > as an ExtINTA interrupt. This would make the code in check_timer()
> > decide it has found a working configuration, while actually it has been
> > fooled.
[...]
> Here you go:
>
> [ 0.108006] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.108006] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [ 0.108006] ...trying to set up timer (IRQ0) through the 8259A ... <3>
> [ 0.108006] ..... (found apic 0 pin 2) ...<3> works.
>
> The full dmesg is at: http://www.sisk.pl/kernel/debug/20080618/dmesg-1.log
Thanks. In this case I suspect the case #1 quoted above happens, that is
the 8259A manages to deliver its interrupt somehow. Note at this stage it
is meant to be in the AEOI mode, so it can happily resubmit the interrupt
indefinitely with no additional handling as long as it receives INTA
cycles.
Can you please try the patch below on top of "x86: I/O APIC: timer
through 8259A second-chance" to see whether my hypothesis is true? It
modifies the through-8259A setup path so that the APIC input gets masked,
but the 8259A has the timer interrupt still enabled. Let me know how the
timer interrupt is routed in this case.
Bisected-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip has been deactivated as a result of "nosmp". Downgrade to the
local APIC watchdog similarly to what is done for the UP case.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For the UP case the NMI watchdog downgrade is done consistently in
APIC_init_uniprocessor() now. Remove redundant code used only when
BIOS-disabled local APIC is activated.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip will not be used in the UP configuration, because "noapic" has
been specified or the chip is simply not there. Downgrade to the local
APIC watchdog to rectify.
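The downgrade can be pictured with the following sketch. It is an
illustration only: the demo_* names and enum values are hypothetical,
and the single ioapic_usable flag stands in for the real condition
(no MP configuration found, "noapic" given, or no I/O APIC present).

```c
#include <assert.h>

/* Illustrative watchdog modes; the values are not the kernel's. */
enum demo_nmi_mode {
	DEMO_NMI_NONE,
	DEMO_NMI_IO_APIC,
	DEMO_NMI_LOCAL_APIC
};

/*
 * Fall back from the I/O APIC watchdog to the local APIC watchdog when
 * the I/O APIC will not be used. Every other mode is left untouched.
 */
static enum demo_nmi_mode demo_localise_nmi_watchdog(enum demo_nmi_mode mode,
						     int ioapic_usable)
{
	assert(mode <= DEMO_NMI_LOCAL_APIC);

	if (mode == DEMO_NMI_IO_APIC && !ioapic_usable)
		return DEMO_NMI_LOCAL_APIC;
	return mode;
}
```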
The new #ifdef is ugly, I know. A proper solution is to provide suitable
definitions of smp_found_config, etc. for !CONFIG_X86_IO_APIC in a header.
Likewise the whole if () condition should be moved to a static inline
function. Such clean-ups are beyond the scope of this change and can be
done once the whole issue of the timer has been sorted out.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
nmi_watchdog=1 hangs on 64-bit:
[ 0.250000] Detected 12.564 MHz APIC timer.
[ 0.254178] APIC timer registered as dummy, due to nmi_watchdog=1!
[ 0.260366] Testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!
[ ... ]
[ 0.470003] calling genl_init+0x0/0xd0
[ hard hang ]
bisected it down to:
git-bisect start
git-bisect good 1beee8dc8c
git-bisect bad 11582ece0aaa2d0f94f345c08a4ab9997078a083
git-bisect bad 5479c623bb44089844022c03d4c0eb16d5b7a15f
git-bisect bad cfb4c7fabeb499e1c29f9d1878968e37a938e28a
git-bisect good 246dd412d3
git-bisect bad 3f8237eaff7dc1e35fa791dae095574fd974e671
git-bisect good 90e23b13ab849e2a11f00c655eb3a2011b4623be
git-bisect bad 833526a34eeefc117df3191a594c3c3a4f15a9ac
git-bisect good 791b93d3dfaf16c23e978bec0cc0a3dd9d855d63
git-bisect bad 65767c64068f2c93e56a1accfed5c78230ac12d7
git-bisect bad 2abc5c05dd82c188e3bdf6641a274f013348d14b
git-bisect bad 317e1f2597ffb4d4db940577bbe56dc6e881ef07
| 317e1f2597ffb4d4db940577bbe56dc6e881ef07 is first bad commit
| commit 317e1f2597ffb4d4db940577bbe56dc6e881ef07
| Author: Maciej W. Rozycki <macro@linux-mips.org>
| Date: Wed May 21 22:10:22 2008 +0100
| x86: I/O APIC: clean up the 8259A on a NMI watchdog failure
the problem is that in the dummy-lapic branch we rely on the i8259A
but if the NMI watchdog fails we turn off IRQ 0 - which doesn't work
too well ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Not sure, but maybe it is better to use NMI_DISABLED;
I will take a look. For now this patch does not change
any logic, so it will not hurt or break the kernel.
In most cases nmi_watchdog is assigned one of the NMI_*
macros, so I think it makes sense here too.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/io_apic_64.c: In function 'check_timer':
arch/x86/kernel/io_apic_64.c:1688: error: 'vector' undeclared (first use in this function)
arch/x86/kernel/io_apic_64.c:1688: error: (Each undeclared identifier is reported only once
arch/x86/kernel/io_apic_64.c:1688: error: for each function it appears in.)
Some systems incorrectly report the ExtINTA pin of the I/O APIC as the
genuine target of the timer interrupt. Here is a change that copies the
timer pin information found to the other pin if only one has been found.
This way both a direct and a through-8259A route are tested with the pin,
letting these problematic systems work well enough. If no timer pin
information has been found for the I/O APIC, then only local APIC
variations are tried, similarly to what is done without the change
(except without the misleading messages).
Obviously if we try the first-chance path without being told by the BIOS
to do so, we should not complain either, so do not print the message in
this case.
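The pin fix-up described above amounts to something like the sketch
below. The structure and the demo_* names are hypothetical, with -1
meaning "not reported", as in the "..TIMER:" boot message; the no_pin1
flag is an assumed way of remembering that the first-chance pin was not
supplied by the BIOS, so that no complaint is printed for it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_timer_route {
	int apic1, pin1;	/* direct route */
	int apic2, pin2;	/* through-8259A (ExtINTA) route */
	int no_pin1;		/* set when pin1 was not reported */
};

/*
 * If only one of the two pins was reported, copy it to the other so that
 * both routes get tried with the same pin. If neither was reported, both
 * stay at -1 and only the local APIC variations remain.
 */
static void demo_fixup_timer_pins(struct demo_timer_route *r)
{
	assert(r != NULL);

	r->no_pin1 = 0;
	if (r->pin1 == -1) {
		r->pin1 = r->pin2;
		r->apic1 = r->apic2;
		r->no_pin1 = 1;
	} else if (r->pin2 == -1) {
		r->pin2 = r->pin1;
		r->apic2 = r->apic1;
	}
}
```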
The 64-bit variation should be updated with a call to
replace_pin_at_irq() which can be done with the upcoming merge. Since
add_pin_to_irq() is now always called in the first-chance path, the
condition to require it in the second-chance path no longer happens.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Keep the timer interrupt line masked when reconfiguring its interrupt
redirection entry in the I/O APIC.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unmask the timer interrupt line set up in the through-8259A mode
explicitly after setup_timer_IRQ0_pin() has set up the I/O APIC interrupt
redirection entry to let the two operations be unbound from each other.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename setup_ExtINT_IRQ0_pin() to setup_timer_IRQ0_pin() to better
reflect the upcoming role of a function setting up a (semi-)arbitrary I/O
APIC pin appropriately for the 8254 timer. By "appropriate" the following
settings are meant: edge-triggered, active-high, all the other settings
per-architecture. Adjust comments to reflect code appropriately. No
functional changes.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The LINT0 line of the local APIC is masked in the LVT0 entry in
check_timer() before this function is ever called. Removed the
redundant unmasking for better control.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For a better control the masking and unmasking of the timer interrupt
line in the 8259A operating in the 'Virtual Wire' mode has been moved out
of setup_ExtINT_IRQ0_pin() now, so remove the redundant calls from the
function.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the through-8259A mode is used for the timer, the call to
set_irq_handler() will register a NULL handler name, resulting in
"IO-APIC-<NULL>" reported. Fix by calling ioapic_register_intr() as done
for all the other I/O APIC interrupts.
The 64-bit variation calls set_irq_chip_and_handler_name() here
needlessly and should get fixed with the upcoming merge.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The local APIC interrupt handler gets registered with
set_irq_chip_and_handler_name(), which results in
"local-APIC-edge-fasteoi" reported as the name of the handler. Fix by
removing the type of the handler left over from before the generic
handlers were introduced.
The 64-bit variation should get fixed with the upcoming merge.
NB It should really use the "edge" handler and not the "fasteoi" one,
but that's a separate issue.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no point in keeping the 8259A enabled if the I/O APIC NMI
watchdog has failed and the 8259A is not used to pass through regular
timer interrupts. This fixes problems with some systems where some logic
gets confused.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If configured to use the I/O APIC, the NMI watchdog is deemed to fail if
the chip has been deactivated as a result of "nosmp". Downgrade to the
local APIC watchdog similarly to what is done for the UP case.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The local APIC is no longer forced off when "nosmp" has been specified.
Correct the message printed.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Disable the 8259A acting in the "virtual wire" mode to keep the interrupt
line inactive while fiddling with local APIC interrupt vector registers
associated with its destination inputs. This keeps us on the safe side,
especially when flipping the trigger mode.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Disable the 8259A when routing of the timer interrupt through the chip to
the local APIC of the primary processor has failed.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the "disable_8254_timer" and "enable_8254_timer" kernel
parameters. Now that AEOI acknowledgements are no longer needed for
correct timer operation, the 8259A can be kept disabled unconditionally
unless interrupts, either timer or watchdog ones, are actually passed
through it.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The code that used to be in do_slow_gettimeoffset() that relied on the
IRR bit of the master 8259A PIC for IRQ0 to check the state of the output
timer 0 of the PIT is no longer there. As a result, there is no need to
use the POLL command to acknowledge the timer interrupt in the "8259A
Virtual Wire", except for the NMI watchdog when the i82489DX APIC is used
(this is because this particular APIC treats NMIs as level-triggered and
keeping the input asserted would keep motherboard NMI sources held off for
too long). Remove the unneeded bits and adjust comments accordingly.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
v2: fix early_panic on this config:
http://redhat.com/~mingo/misc/config-Thu_Jun_19_14_22_37_CEST_2008.bad
Reason: the size of struct cpu_vendor_dev is 16, so the table needs to be
16-byte aligned.
Also print out the cpu supported...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"Form follows function". Code is now where it belongs to.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Tue, 17 Jun 2008, Rafael J. Wysocki wrote:
>
> BTW, with the C1E patches reverted I don't get the
> WARNING: at /home/rafael/src/linux-next/kernel/smp.c:215 smp_call_function_single+0x3d/0xa2
> in the log. Thomas?
The BROADCAST_FORCE notification uses smp_call_function and therefore
must be run with interrupts enabled.
While at it, add a comment for the BROADCAST_EXIT notifier as well.
Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
C1E on AMD machines is like C3 but without control from the OS. Up to
now we disabled the local apic timer for those machines as it stops
when the CPU goes into C1E. This excludes those machines from high
resolution timers / dynamic ticks, which hurts especially X2 based
laptops.
The current boot time C1E detection has another, more serious flaw
as well: some BIOSes do not enable C1E until the ACPI processor module
is loaded. This causes systems to stop working after that point.
To work nicely with C1E enabled machines we use a separate idle
function, which checks on idle entry whether C1E was enabled in the
Interrupt Pending Message MSR. This allows us to do timer broadcasting
for C1E and covers the late enablement of C1E as well.
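The idle-entry check described above can be sketched in user-space C as follows. This is only an illustration, not the kernel code: the MSR value is passed in as a plain integer instead of being read from hardware, the names c1e_state, c1e_check_on_idle_entry and C1E_ACTIVE_MASK are hypothetical, and the mask value is an assumption about which bits of the Interrupt Pending Message MSR flag C1E.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit mask: the bits of the Interrupt Pending Message MSR that
 * indicate C1E is active. Treat the exact value as an assumption. */
#define C1E_ACTIVE_MASK 0x18000000u

/* Hypothetical state kept by the C1E-aware idle routine. */
struct c1e_state {
    bool detected;          /* C1E has been seen enabled at least once */
    bool broadcast_forced;  /* timer broadcasting has been switched on */
};

/*
 * Called on every idle entry with the low 32 bits of the MSR. Because the
 * check is repeated on each entry, a BIOS that enables C1E late (for
 * example when the ACPI processor module loads) is still caught.
 * Returns true when the local APIC timer must not be relied upon.
 */
bool c1e_check_on_idle_entry(struct c1e_state *st, uint32_t msr_lo)
{
    assert(st != NULL);

    if (!st->detected && (msr_lo & C1E_ACTIVE_MASK)) {
        st->detected = true;
        /* stand-in for the one-time switch to timer broadcasting */
        st->broadcast_forced = true;
    }
    return st->detected;
}
```

Once C1E has been detected the state is sticky: later idle entries keep using the broadcast timer even if the MSR bits read back as zero.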
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the trampoline code is now used for ACPI resume from suspend to RAM,
the trampoline page tables have to be fixed up during boot not only on SMP
systems, but also on UP systems that use the trampoline.
Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10923
Reported-by: Dionisus Torimens <djtm@gmx.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some Dell laptops enter resume with apparent garbage in the segment
descriptor registers (almost certainly the result of a botched
transition from protected to real mode.) The only way to clean that
up is to enter protected mode ourselves and clean out the descriptor
registers.
This fixes resume on Dell XPS M1210 and Dell D620.
Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10927
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This code removes a leftover from the iommu_enable function. The ctrl variable
is assigned but never used.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds a check if the early detect code has found AMD IOMMU hardware
descriptions and does not try to initialize hardware if the check failed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes the amd_iommu=off kernel parameter and honors the generic
iommu=off parameter for the same purpose.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes the domain TLB flushing behavior of the driver. When there
is more than one page to flush it flushes the whole domain TLB instead of every
single page. So we send only a single command to the IOMMU in every case which
is faster to execute.
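A minimal sketch of that decision, assuming 4 KiB pages. The types and names (flush_cmd, build_flush_cmd, SKETCH_PAGE_SIZE) are hypothetical and do not mirror the driver's actual command format.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096ul  /* assumed page size */

enum flush_kind { FLUSH_NONE, FLUSH_SINGLE_PAGE, FLUSH_WHOLE_DOMAIN };

/* Hypothetical description of the single command sent to the IOMMU. */
struct flush_cmd {
    enum flush_kind kind;
    unsigned long address;  /* page-aligned, only used for single pages */
};

/* Number of pages touched by the range [address, address + size). */
static unsigned long pages_spanned(unsigned long address, size_t size)
{
    unsigned long first = address / SKETCH_PAGE_SIZE;
    unsigned long last = (address + size - 1) / SKETCH_PAGE_SIZE;

    return last - first + 1;
}

/* Build exactly one flush command, whatever the size of the range. */
struct flush_cmd build_flush_cmd(unsigned long address, size_t size)
{
    struct flush_cmd cmd = { FLUSH_NONE, 0 };

    if (size == 0)
        return cmd;
    assert(address + size - 1 >= address);  /* range must not wrap */

    if (pages_spanned(address, size) > 1) {
        /* more than one page: flush the whole domain TLB at once */
        cmd.kind = FLUSH_WHOLE_DOMAIN;
    } else {
        cmd.kind = FLUSH_SINGLE_PAGE;
        cmd.address = address & ~(SKETCH_PAGE_SIZE - 1);
    }
    return cmd;
}
```

The trade-off is that a whole-domain flush may discard more cached translations than strictly needed, in exchange for issuing one command instead of one per page.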
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The set_bit_string call in the address allocator is not necessary because it
is already called in iommu_area_alloc().
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Quirks getting ignored was a bug. The patch below fixes the bug until
we have dynamic banks support.
The sysfs choice configuration should not have any issues with the earlier
patch, as we look for NR_SYSFS_BANKS in do_machine_check().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Max Asbock <masbock@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
First announce ourselves, then start working. Currently this module reports
itself when all is completed, which is not what most modules do. Plus some
cosmetic/whitespace cleanups.
Signed-off-by: Ben Castricum <lk0806@bencastricum.nl>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fedora reports that mem_init()'s zap_low_mappings(), extended to SMP in
  61165d7a03 x86: fix app crashes after SMP resume
causes 32-bit Intel Mac machines to reboot very early when booting with
EFI.
The EFI code appears to manage low mappings for itself when needed; but
like many before it, confuses PSE with PAE. So it has only been mapping
half the space it needed when PSE but not PAE. This remained unnoticed
until we moved the SMP zap_low_mappings() before
efi_enter_virtual_mode(). Presumably could have been noticed years ago
if anyone ran a UP kernel on such machines?
Reported-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Glauber Costa <gcosta@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Peter Jones <pjones@redhat.com>
ptrace has always returned only -EIO for all failures to access
registers. The user_regset calls are allowed to return a more
meaningful variety of errors. The REGSET_XFP calls use -ENODEV
for !cpu_has_fxsr hardware. Make ptrace return the traditional
-EIO instead of the error code from the user_regset call.
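The compatibility rule can be sketched as a thin wrapper that collapses every failure into -EIO. The function-pointer type and all names below are hypothetical stand-ins for the user_regset calls, not the kernel's interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a user_regset get/set call: returns 0 on
 * success or a negative errno value on failure. */
typedef int (*regset_op)(void *buf);

/* ptrace-facing wrapper: any failure is reported as the traditional -EIO. */
int ptrace_regset_compat(regset_op op, void *buf)
{
    assert(op != NULL);
    return op(buf) ? -EIO : 0;
}

/* Fake regset operations used only to exercise the wrapper. */
int fake_regset_ok(void *buf)
{
    (void)buf;
    return 0;
}

int fake_regset_nodev(void *buf)
{
    (void)buf;
    return -ENODEV;  /* e.g. what a !cpu_has_fxsr machine would report */
}
```

Callers that only ever tested for EIO, such as older debuggers, keep working, while the regset layer itself remains free to use more specific codes internally.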
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch disables suspend/resume on machines with AMD IOMMU enabled. Real
suspend/resume support for AMD IOMMU is currently being worked on. Until this
is ready it will be disabled to avoid data corruption when the IOMMU is not
properly re-enabled at resume. The patch is based on a similar patch for the
GART driver written by Pavel Machek.
The overall driver merged into tip/master is tested with parallel disk and
network loads and showed no problems in a test running for 3 days.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ptrace GET/SET FPXREGS broken
x86: fix cpu hotplug crash
x86: section/warning fixes
x86: shift bits the right way in native_read_tscp
After I updated the kernel from 2.6.24 to 2.6.25, gdb stopped working.
On 2.6.25, ptrace(PTRACE_GETFPXREGS, ...) returns ENODEV.
But the 2.6.24 kernel's ptrace() returns EIO.
This is a compatibility issue.
I attached a test program as pt.c and a patch to fix it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* i386 layout of the FXSR register area (long is 4 bytes there) */
struct user_fxsr_struct {
	unsigned short cwd;
	unsigned short swd;
	unsigned short twd;
	unsigned short fop;
	long fip;
	long fcs;
	long foo;
	long fos;
	long mxcsr;
	long reserved;
	long st_space[32]; /* 8*16 bytes for each FP-reg = 128 bytes */
	long xmm_space[32]; /* 8*16 bytes for each XMM-reg = 128 bytes */
	long padding[56];
};

static int child(void);
static int parent(pid_t pid);

int main(void)
{
	pid_t pid;

	pid = fork();
	switch (pid) {
	case -1: /* error */
		break;
	case 0: /* child */
		child();
		break;
	default:
		parent(pid);
		break;
	}
	return 0;
}

static int child(void)
{
	/* ask to be traced by the parent, then stop until it resumes us */
	ptrace(PTRACE_TRACEME, 0, NULL, NULL);
	kill(getpid(), SIGSTOP);
	sleep(10);
	return 0;
}

static int parent(pid_t pid)
{
	int ret;
	struct user_fxsr_struct fpxregs;

	/* wait until the child has stopped, so that ptrace can inspect it */
	waitpid(pid, NULL, 0);
	ret = ptrace(PTRACE_GETFPXREGS, pid, 0, &fpxregs);
	if (ret < 0) {
		printf("%d: %s.\n", errno, strerror(errno));
	}
	/* resume the traced child without delivering SIGSTOP, then reap it */
	ptrace(PTRACE_CONT, pid, 0, 0);
	waitpid(pid, NULL, 0);
	return 0;
}
/* in the kernel, at kernel/i387.c get_fpxregs() */
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Vegard Nossum reported crashes during cpu hotplug tests:
http://marc.info/?l=linux-kernel&m=121413950227884&w=4
In function _cpu_up, the panic happens when __raw_notifier_call_chain is
called for the second time. The kernel doesn't panic when it is called
for the first time. Simply blaming nr_cpu_ids is not right.
By checking the source code, I found that function do_boot_cpu is the culprit.
Consider the call chain below:
_cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
So do_boot_cpu is called in the end. In do_boot_cpu, if
boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later
on, when _cpu_up calls __raw_notifier_call_chain for the second time to
report CPU_UP_CANCELED, get_cpu_sysdev returns NULL because this cpu has
already been cleared from cpu_possible_map.
Many resources are related to cpu_possible_map, so it's better not to
change it.
The patch below, against 2.6.26-rc7, fixes it by removing the bit clearing in
cpu_possible_map.
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
small speedup.
Paravirt replacements were added to the i386 module loader by commit
139ec7c416. This adds the same code to
the x86_64 module loader.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
From the code:
"B stepping K8s sometimes report an truncated RIP for IRET exceptions
returning to compat mode. Check for these here too."
The code then proceeds to truncate the upper 32 bits of %rbp. This means
that when do_page_fault() is finally called, its prologue,
do_page_fault:
push %rbp
movq %rsp, %rbp
will put the truncated base pointer on the stack. This means that the
stack tracer will not be able to follow the base-pointer changes and
will see all subsequent stack frames as unreliable.
This patch changes the code to use a different register (%rcx) for the
checking and leaves %rbp untouched.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/process.c: In function 'cpu_idle_wait':
arch/x86/kernel/process.c:64: error: too many arguments to function 'smp_call_function'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/amd_iommu.c: In function ‘amd_iommu_init_dma_ops’:
arch/x86/kernel/amd_iommu.c:940: error: lvalue required as left operand of assignment
arch/x86/kernel/amd_iommu.c:941: error: lvalue required as left operand of assignment
due to !CONFIG_GART_IOMMU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangeably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This converts x86, x86-64, and xen to use the new helpers for
smp_call_function() and friends, and adds support for
smp_call_function_single().
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit. This allows the hypervisor-specific backends to
implement these in some more efficient way.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.
The patch also fixes an initialization bug: on SMP systems the
per-cpu area has two different locations, one early at boot and one
after CPU bringup. kvmclock must take that into account when registering
the physical address within the host.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).
It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code. They are enabled using
CONFIG_PARAVIRT_CLOCK.
Subsequent patches of this series will put the code in use.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
As suggested by Ingo, remove all references to tsc from init/calibrate.c.
TSC is x86 specific, and using tsc in variable names in a generic file should
be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning
of the lpj value. Also, tsc_rate_* is now called timer_rate_*.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/tsc_32.c: In function ‘tsc_init’:
arch/x86/kernel/tsc_32.c:421: error: ‘lpj_tsc’ undeclared (first use in this function)
arch/x86/kernel/tsc_32.c:421: error: (Each undeclared identifier is reported only once
arch/x86/kernel/tsc_32.c:421: error: for each function it appears in.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the x86 platform we can use the value of tsc_khz computed during tsc
calibration to calculate the loops_per_jiffy value. It's very important
to keep the error in lpj values to a minimum, as any error there may
result in a kernel panic in check_timer. In a virtualized environment, on
a highly overloaded host, the guest delay calibration may sometimes
result in errors beyond the ~50% that timer_irq_works can handle,
resulting in the guest panicking.
Also make some formatting changes to the lpj_setup code so that a single
printk prints the bogomips value.
We do this only for the boot processor because the APs can have
different base frequencies or the BIOS might boot an AP at a different
frequency.
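As a worked example of the arithmetic, the sketch below derives a loops_per_jiffy value from tsc_khz, assuming that the delay loop is TSC based so that one "loop" corresponds to one TSC cycle. The function name lpj_from_tsc_khz is hypothetical and HZ is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * tsc_khz is the TSC frequency in kHz, so tsc_khz * 1000 is cycles per
 * second. Dividing by HZ (timer ticks per second) gives cycles per jiffy.
 * The multiplication is done in 64 bits so that it cannot overflow.
 */
uint64_t lpj_from_tsc_khz(uint32_t tsc_khz, uint32_t hz)
{
    assert(hz != 0);
    return ((uint64_t)tsc_khz * 1000u) / hz;
}
```

For a 2.4 GHz TSC (tsc_khz = 2400000) and HZ = 250, this gives 2400000 * 1000 / 250 = 9600000 loops per jiffy, with no dependence on how heavily the host was loaded during a timed calibration loop.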
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.
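On x86 the difference between the two addresses is the length of the call instruction itself. A minimal sketch, assuming the mcount call is a 5-byte near call (one opcode byte plus a 32-bit displacement); the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed length of the "call mcount" instruction on x86. */
#define SKETCH_MCOUNT_INSN_SIZE 5u

/*
 * The return address seen inside mcount points at the instruction after
 * the call. Subtracting the call's length yields the address of the call
 * site itself, which is what gets stored for later lookups.
 */
uintptr_t mcount_call_site(uintptr_t return_address)
{
    assert(return_address >= SKETCH_MCOUNT_INSN_SIZE);
    return return_address - SKETCH_MCOUNT_INSN_SIZE;
}
```

Storing the call-site address means a probe address can be compared against the record directly, without each consumer having to undo the offset.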
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Kevin Winchester reported a GART related direct rendering failure against
linux-next-20080611, which shows up via these log entries:
PCI: Using ACPI for IRQ routing
PCI: Cannot allocate resource region 0 of device 0000:00:00.0
agpgart: Detected AGP bridge 0
agpgart: Aperture conflicts with PCI mapping.
agpgart: Aperture from AGP @ e0000000 size 128 MB
agpgart: Aperture conflicts with PCI mapping.
agpgart: No usable aperture found.
agpgart: Consider rebooting with iommu=memaper=2 to get a good aperture.
instead of the expected:
PCI: Using ACPI for IRQ routing
agpgart: Detected AGP bridge 0
agpgart: Aperture from AGP @ e0000000 size 128 MB
Kevin bisected it down to this change in tip/x86/gart:
"x86: checking aperture size order".
The agp check is using request_mem_region(), which could fail if the
region is reserved in e820... Change it back to e820_any_mapped().
Reported-and-bisected-by: "Kevin Winchester" <kjwinchester@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
General Software writes their own VSA2 module for their version
of the Geode BIOS, which returns a different ID than the standard
VSA2. This was causing the framebuffer driver to break for most
GSW boards.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Cc: tglx@linutronix.de
Cc: linux-geode@lists.infradead.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch uses BOOTMEM_EXCLUSIVE for the crashkernel reservation on
i386 as well, and prints an error message on failure.
The patch is still for 2.6.26 since it is only bug fixing. The unification
of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6"
(invalid opcode) and a kernel halt immediately after the
kernel has been uncompressed. The BUG shows EIP pointing
to an rdtsc instruction in native_read_tsc(), invoked from
native_sched_clock().
(This error occurs so early that not even the serial console
can capture it.)
A bisection showed that this bug first occurs in 2.6.26-rc3-git7,
via commit 9ccc906c97:
>x86: distangle user disabled TSC from unstable
>
>tsc_enabled is set to 0 from the command line switch "notsc" and from
>the mark_tsc_unstable code. Seperate those functionalities and replace
>tsc_enable with tsc_disable. This makes also the native_sched_clock()
>decision when to use TSC understandable.
>
>Preparatory patch to solve the sched_clock() issue on 32 bit.
>
>Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The core reason for this bug is that native_sched_clock() gets
called before tsc_init().
Before the commit above, tsc_32.c used a "tsc_enabled" variable
which defaulted to 0 == disabled, and which only got enabled late
in tsc_init(). Thus early calls to native_sched_clock() would skip
the TSC and use jiffies instead.
After the commit above, tsc_32.c uses a "tsc_disabled" variable
which defaults to 0, meaning that the TSC is Ok to use. Early calls
to native_sched_clock() now erroneously try to use the TSC on
!cpu_has_tsc processors, leading to invalid opcode exceptions.
My proposed fix is to initialise tsc_disabled to a "soft disabled"
state distinct from the hard disabled state set up by the "notsc"
kernel option. This fixes the native_sched_clock() problem. It also
allows tsc_init() to be simplified: instead of setting tsc_disabled = 1
on every error return, we just set tsc_disabled = 0 once when all
checks have succeeded.
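The three-state idea can be sketched as below. The numeric values (-1 for soft disabled, 0 for usable, 1 for hard disabled) and all names are assumptions made for illustration; the checks inside the init function are reduced to two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed encoding of the three states. */
enum {
    TSC_SOFT_DISABLED = -1, /* default until tsc init has validated the TSC */
    TSC_USABLE        = 0,
    TSC_HARD_DISABLED = 1   /* set by the "notsc" command line option */
};

struct tsc_sketch {
    int disabled;
    bool cpu_has_tsc;
    bool calibration_ok;
};

void tsc_sketch_boot(struct tsc_sketch *t, bool cpu_has_tsc, bool calibration_ok)
{
    assert(t != NULL);
    t->disabled = TSC_SOFT_DISABLED;
    t->cpu_has_tsc = cpu_has_tsc;
    t->calibration_ok = calibration_ok;
}

void tsc_sketch_notsc(struct tsc_sketch *t)
{
    t->disabled = TSC_HARD_DISABLED;
}

/* What an early sched_clock caller would consult before touching the TSC. */
bool tsc_sketch_may_use(const struct tsc_sketch *t)
{
    return t->disabled == TSC_USABLE;
}

/* Error paths simply return; the flag is cleared once, at the very end. */
void tsc_sketch_init(struct tsc_sketch *t)
{
    if (!t->cpu_has_tsc || t->disabled > 0)
        return;
    if (!t->calibration_ok)
        return;
    t->disabled = TSC_USABLE;
}
```

With this default, a caller that runs before the init function sees a disabled TSC on every machine, so a CPU without the rdtsc instruction never executes it.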
I've verified that this lets my 486 boot again. I've also verified
that a Core2 machine still uses the TSC as clocksource after the patch.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Patrick McHardy reported a crash:
> > I get this oops once a day, its apparently triggered by something
> > run by cron, but the process is a different one each time.
> >
> > Kernel is -git from yesterday shortly before the -rc6 release
> > (last commit is the usb-2.6 merge, the x86 patches are missing),
> > .config is attached.
> >
> > I'll retry with current -git, but the patches that have gone in
> > since I last updated don't look related.
> >
> > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
> > 000001ff
> > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
> > [62060.043009] *pde = 00000000
> > [62060.043009] Oops: 0002 [#1] PREEMPT
Vegard Nossum analyzed it:
> This decodes to
>
> 0: 0f ae 00 fxsave (%eax)
>
> so it's related to the floating-point context. This is the exact
> location of the crash:
>
> $ addr2line -e arch/x86/kernel/process_32.o -i ab0
> include/asm/i387.h:232
> include/asm/i387.h:262
> arch/x86/kernel/process_32.c:595
>
> ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
> Or maybe it never had any other value.
Somehow (as described below) TS_USEDFPU is set but the fpu is not
allocated or freed.
This is another possible FPU preemption issue with the sleazy FPU
optimization, which was benign before but is not anymore with the dynamic
FPU allocation patch.
A new task is being exec'd and it is preempted at the point below.
flush_thread() {
...
/*
* Forget coprocessor state..
*/
clear_fpu(tsk);
<----- Preemption point
clear_used_math();
...
}
Now when it context switches in again, as the used_math() is still set
and fpu_counter can be > 5, we will do a math_state_restore() which sets
the task's TS_USEDFPU. After it continues from the above preemption point
it does clear_used_math() and much later free_thread_xstate().
Now, at the next context switch, it is quite possible that xstate is
NULL, used_math() is not set and TS_USEDFPU is still set. This will
trigger unlazy_fpu(), causing a kernel oops.
Fix this by clearing tsk's fpu_counter before clearing task's fpu.
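The race and the fix can be replayed with a small user-space model. This is a heavily simplified sketch: the struct and function names are hypothetical, the preemption is simulated by an explicit call, and each flag merely stands in for the kernel state named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-task FPU bookkeeping. */
struct task_sketch {
    bool used_math;        /* task has used the FPU at some point */
    bool ts_usedfpu;       /* stand-in for the TS_USEDFPU flag */
    bool xstate_valid;     /* FPU save area is allocated */
    unsigned fpu_counter;  /* consecutive switches with FPU use */
};

/* Switching back to the task: frequent FPU users get an eager restore. */
static void switch_in(struct task_sketch *t)
{
    if (t->used_math && t->fpu_counter > 5)
        t->ts_usedfpu = true;   /* models math_state_restore() */
}

void flush_thread_sketch(struct task_sketch *t, bool clear_counter_first,
                         bool preempted)
{
    assert(t != NULL);

    if (clear_counter_first)
        t->fpu_counter = 0;     /* the fix: no eager restore can follow */
    t->ts_usedfpu = false;      /* models clear_fpu() */
    if (preempted)
        switch_in(t);           /* the preemption point */
    t->used_math = false;       /* models clear_used_math() */
    t->xstate_valid = false;    /* models the later free_thread_xstate() */
}

/* The next context switch saves FPU state if TS_USEDFPU is set. */
bool next_switch_would_oops(const struct task_sketch *t)
{
    return t->ts_usedfpu && !t->xstate_valid;
}
```

In the model, the old ordering combined with a preemption leaves ts_usedfpu set while the save area is gone. Clearing fpu_counter first makes the eager restore impossible, so the flag stays clear.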
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not explicitly marked as asmlinkage, but it is invoked from x86_32
startup code with parameters on the stack.
No other architectures define early_printk and none of them are affected
by this change, since asmlinkage is defined as an empty token there.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Attached is a no-brainer that makes the kernel correctly report
NR_BANKS for MCE. We are right now limited to NR_BANKS==6, but the
warning currently prints the number of banks available in hardware
instead of the defined maximum that is actually used.
For a Nehalem based system it currently prints:
"MCE: warning: using only 9 banks"
while the correct message would be
"MCE: warning: using only 6 banks"
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86, we do early PCI probing to apply some quirks for chipset bugs.
However, in a recent cleanup (7bcbc78dea) a
thinko was introduced that causes us to probe all subfunctions of even single
function devices (a function was factored out of an inner loop and a "break"
became a "return"). Fix that up by making check_dev_quirk() return a value so
we can keep the factored code intact.
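The control-flow problem can be sketched as follows: a helper that was factored out of the loop can no longer "break", so it has to tell the caller when to stop. Everything below is a simplified model with hypothetical names, and the configuration space is faked with an array. The sketch relies on two PCI conventions: a class read of 0xffff means no function is present, and bit 7 of the header type marks a multi-function device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FUNCS 8

/* Fake config-space contents for one function of a device. */
struct fake_func {
    uint16_t class_code;   /* 0xffff: nothing responds here */
    uint8_t header_type;   /* bit 7 set: multi-function device */
};

struct fake_slot {
    struct fake_func func[SKETCH_MAX_FUNCS];
    int probes;            /* how many functions were actually probed */
};

/* Returns -1 when the caller should stop scanning this slot, else 0. */
static int check_dev_quirk_sketch(struct fake_slot *slot, int func)
{
    const struct fake_func *f = &slot->func[func];

    slot->probes++;
    if (f->class_code == 0xffff)
        return -1;         /* no function present */

    /* ... quirks for this function would be applied here ... */

    if (!(f->header_type & 0x80))
        return -1;         /* single-function device: do not go further */
    return 0;
}

/* Scan one slot and report how many functions were probed. */
int probe_slot_sketch(struct fake_slot *slot)
{
    int func;

    assert(slot != NULL);
    slot->probes = 0;
    for (func = 0; func < SKETCH_MAX_FUNCS; func++) {
        if (check_dev_quirk_sketch(slot, func))
            break;         /* the helper's return value restores the break */
    }
    return slot->probes;
}
```

If the helper returned void, a plain "return" inside it would only end the helper and the loop would go on to probe all eight functions of a single-function device, which is the thinko described above.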
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: fixup write combine comment in pci_mmap_resource
x86: PAT export resource_wc in pci sysfs
x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
suspend-vs-iommu: prevent suspend if we could not resume
x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY
pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
PCI: use dev_to_node in pci_call_probe
PCI: Correct last two HP entries in the bfsort whitelist
Recently (around 2.6.25) I noticed that the RTC no longer works for me. It
turned out this is because I use the pnpacpi=off kernel option to work around
the parport_pc bugs. I have always done so, and the RTC used to work fine in
the past, but now it has regressed.
The patch fixes the problem by creating the platform device for the RTC
when PNP is disabled. This may also help when running a PNP-enabled kernel
on older PCs.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ACPI PM: Add possibility to change suspend sequence
There are some systems out there that don't work correctly with
our current suspend/hibernation code ordering. Provide a workaround
for these systems allowing them to pass 'acpi_sleep=old_ordering' in
the kernel command line so that it will use the pre-ACPI 2.0 ("old")
suspend code ordering.
Unfortunately, this requires us to add a platform hook to the
resuming of devices for recovering the platform in case one of the
device drivers' .suspend() routines returns an error code. Namely,
ACPI 1.0 specifies that _PTS should be called before suspending
devices, but _WAK still should be called before resuming them in
order to undo the changes made by _PTS. However, if there is an
error during suspending devices, they are automatically resumed
without returning control to the PM core, so _WAK has to be
called from within device_resume() in that case.
The patch also reorders and refactors the ACPI suspend/hibernation
code to avoid duplication as far as reasonably possible.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Alessandro Suardi reported:
> Recently upgraded my FC6 desktop to Fedora 9; with the
> latest nautilus RPM updates my VNC session went nuts
> with nautilus pegging the CPU for everything that breathed.
>
> I now reverted to an earlier nautilus package, but during
> the peak CPU period my kernel spat this:
>
> [314185.623294] ------------[ cut here ]------------
> [314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
> [314185.623514] Modules linked in: iptable_filter ip_tables x_tables
> sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
> snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
> [314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
> [314185.624021] [<c0115b95>] warn_on_slowpath+0x41/0x7b
> [314185.624021] [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021] [<c0128396>] ? up_read+0x16/0x28
> [314185.624021] [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021] [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
> [314185.624021] [<c012d0a0>] check_flags+0x4c/0x128
> [314185.624021] [<c012fa73>] lock_acquire+0x31/0x7d
> [314185.624021] [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
> [314185.624021] [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
> [314185.624021] [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
> [314185.624021] [<c0128d81>] notify_die+0x2d/0x2f
> [314185.624021] [<c01043b0>] do_int3+0x1f/0x4d
> [314185.624021] [<c02f2d3b>] int3+0x27/0x2c
> [314185.624021] =======================
> [314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
> [314185.624021] possible reason: unannotated irqs-off.
> [314185.624021] irq event stamp: 488879
> [314185.624021] hardirqs last enabled at (488879): [<c0102d67>]
> restore_nocheck+0x12/0x15
> [314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
> work_resched+0x19/0x30
> [314185.624021] softirqs last enabled at (488876): [<c011a1ba>]
> __do_softirq+0xa6/0xac
> [314185.624021] softirqs last disabled at (488865): [<c010476e>]
> do_softirq+0x57/0xa6
>
> I didn't seem to find it with some googling, so here it is.
>
> I was incidentally ltracing that process to try and find out
> what was gulping down that much CPU (sorry, no idea
> whether ltrace and the WARNING happened at the same
> time or which came first) and:
Yeah, this is extremely likely to be the source of the warning.
The warning should be harmless, however.
> Box is my trusty noname K7-800, 512MB RAM; if there's
> anything else useful I might be able to provide, just ask.
It would be interesting to see where the int3 comes from. Too bad,
lockdep doesn't provide the register dump. The stacktrace also doesn't
go further than the int3(), I wonder if this int3 came from userspace?
The ltrace readme says "software breakpoints, like gdb", so I guess
this is the case. Yep, seems like it.
This looks relevant:
| commit fb1dac909d
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Wed Jan 16 09:51:59 2008 +0100
|
| lockdep: more hardirq annotations for notify_die()
I'm attaching a similar-looking patch for this case (DO_VM86_ERROR),
though I suspect it might be missing for the other cases
(DO_ERROR/DO_ERROR_INFO) as well.
Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 6e908947b4.
Németh Márton reported:
| there is a problem in 2.6.26-rc3 which was not there in case of
| 2.6.25: the CPU wakes up ~90,000 times per sec instead of ~60 per sec.
|
| I also "git bisected" the problem, the result is:
|
| 6e908947b4 is first bad commit
| commit 6e908947b4
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Fri Mar 21 14:32:36 2008 +0100
|
| x86: fix ioapic bug again
the original problem is fixed by Maciej W. Rozycki in the tip/x86/apic
branch (confirmed by Márton), but those changes are too intrusive for
v2.6.26 so we'll go for the less intrusive (repeated) revert now.
Reported-and-bisected-by: Németh Márton <nm127@freemail.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> AS arch/x86/kernel/head_32.o
> arch/x86/kernel/head_32.S: Assembler messages:
> arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).
Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.
arch/x86/kernel/head_32.S: Assembler messages:
arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.
Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The checking 'if nmi_watchdog > 0' (ie NMI_NONE) is quite fast but it
has a side effect - it's taken even if nmi_watchdog = NMI_DISABLED.
Nowadays nmi_watchdog is set up to NMI_NONE by default so this condition
is properly taken most of the time, but we had better show this explicitly.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If GART IOMMU is used on an AMD64 system, the northbridge registers
related to it should be restored during resume so that memory is not
corrupted. Make gart_resume() handle that as appropriate.
Ref. http://lkml.org/lkml/2008/5/25/96 and the following thread.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Starting with commit 8d4a430085 (x86:
cleanup PAT cpu validation) the PAT CPU feature flag is not cleared
anymore. Now the error message
"PAT enabled, but CPU feature cleared"
in pat_init() is misleading.
Furthermore the current code does not check for existence of the PAT
CPU feature flag if a CPU is whitelisted in validate_pat_support.
This patch clears pat_wc_enabled if boot CPU has no PAT feature flag
and adapts the paranoia check.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If PAT support is advertised it should just work. No errata known.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allows us to dump PCI space before any kernel changes have been made.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce 'struct pm_ops' and 'struct pm_ext_ops' ('ext' meaning
'extended') representing suspend and hibernation operations for bus
types, device classes, device types and device drivers.
Modify the PM core to use 'struct pm_ops' and 'struct pm_ext_ops'
objects, if defined, instead of the ->suspend(), ->resume(),
->suspend_late(), and ->resume_early() callbacks (the old callbacks
will be considered as legacy and gradually phased out).
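A minimal sketch of the dispatch rule described above: use the new operations object when one is defined, otherwise fall back to the legacy callback. All names here (dev_sketch, pm_ops_sketch, suspend_device and the sample callbacks) are hypothetical and only illustrate the idea; they are not the structures introduced by this patch.

```c
#include <stddef.h>

struct dev_sketch;

/* New-style operations: no state argument is passed to the callback. */
struct pm_ops_sketch {
	int (*suspend)(struct dev_sketch *dev);
};

struct dev_sketch {
	const struct pm_ops_sketch *pm;	/* new ops object, may be NULL */
	int (*legacy_suspend)(struct dev_sketch *dev, int state);
	int last_path;			/* 1 = new ops ran, 2 = legacy ran */
};

/* Prefer the new ops object when it is defined; otherwise fall back. */
static int suspend_device(struct dev_sketch *dev, int state)
{
	if (dev->pm)
		return dev->pm->suspend ? dev->pm->suspend(dev) : 0;
	if (dev->legacy_suspend)
		return dev->legacy_suspend(dev, state);
	return 0;
}

/* Sample callbacks that only record which path was taken. */
static int sample_suspend(struct dev_sketch *dev)
{
	dev->last_path = 1;
	return 0;
}

static int sample_legacy_suspend(struct dev_sketch *dev, int state)
{
	(void)state;
	dev->last_path = 2;
	return 0;
}
```

Note that once a new ops object is present the legacy callback is ignored entirely, which is what allows the old callbacks to be phased out one subsystem at a time.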
The main purpose of doing this is to separate suspend (aka S2RAM and
standby) callbacks from hibernation callbacks in such a way that the
new callbacks won't take arguments and the semantics of each of them
will be clearly specified. This has been requested multiple
times by many people, including Linus himself, and the reason is that
within the current scheme if ->resume() is called, for example, it's
difficult to say why it's been called (ie. is it a resume from RAM or
from hibernation or a suspend/hibernation failure etc.?).
The second purpose is to make the suspend/hibernation callbacks more
flexible so that device drivers can handle more than they can within
the current scheme. For example, some drivers may need to prevent
new children of the device from being registered before their
->suspend() callbacks are executed or they may want to carry out some
operations requiring the availability of some other devices, not
directly bound via the parent-child relationship, in order to prepare
for the execution of ->suspend(), etc.
Ultimately, we'd like to stop using the freezing of tasks for suspend
and therefore the drivers' suspend/hibernation code will have to take
care of the handling of the user space during suspend/hibernation.
That, in turn, would be difficult within the current scheme, without
the new ->prepare() and ->complete() callbacks.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
more unification. Should cause no change in functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpuid(0x05) provides extended information about MWAIT in EDX when bit
0 of ECX is set. Bits 4-7 of EDX determine whether MWAIT is supported
for C1. C1E enabled CPUs have these bits set to 0.
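For illustration, the decision described above can be sketched as a pure function over the register values returned by cpuid(0x05). The function name, the mask names, and the choice to treat a missing extended-info bit as "usable" are assumptions of this sketch, not taken from the patch.

```c
#include <stdint.h>

/* Assumed names for the bits described above. */
#define MWAIT_ECX_EXTENDED_INFO	0x1u	/* ECX bit 0: EDX info is valid */
#define MWAIT_EDX_C1		0xf0u	/* EDX bits 4-7: C1 sub-states */

/*
 * ecx and edx are the values returned by cpuid(0x05).
 * Returns nonzero if MWAIT may be used to enter C1.
 */
static int mwait_c1_usable(uint32_t ecx, uint32_t edx)
{
	/* Assumption: without extended info we cannot tell, so allow MWAIT. */
	if (!(ecx & MWAIT_ECX_EXTENDED_INFO))
		return 1;
	return (edx & MWAIT_EDX_C1) != 0;
}
```

A C1E enabled CPU would report ECX bit 0 set and EDX bits 4-7 clear, so the function returns 0 for it.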
Based on an earlier patch from Andi Kleen.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Simplify code: no need to do a cpuid(1) again. The cpuinfo structure
has all necessary information already.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename the "MSR_K8_ENABLE_C1E" MSR to INT_PENDING_MSG, which is the
name in the data sheet as well. Move the C1E mask to the header file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
default_idle is selected in cpu_idle(), when no other idle routine is
selected. Select it in select_idle_routine() when mwait is not
selected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently arch/x86/kernel/pci-dma.c always adds __GFP_NORETRY
to the allocation flags, because it wants to be reasonably
sure not to deadlock when calling alloc_pages().
But really that should only be done in two cases:
- when allocating memory in the lower 16 MB DMA zone.
If there's no free memory there, waiting or OOM killing is of no use
- when optimistically trying an allocation in the DMA32 zone
when dma_mask < DMA_32BIT_MASK hoping that the allocation
happens to fall within the limits of the dma_mask
Also blindly adding __GFP_NORETRY to the gfp variable might
not be a good idea since we then also use it when calling
dma_ops->alloc_coherent(). Clearing it might also not be a
good idea, dma_alloc_coherent()'s caller might have set it
on purpose. The gfp variable should not be clobbered.
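A rough sketch of the "do not clobber gfp" idea: derive the flags for each page allocation attempt from a local copy, and add the no-retry bit only in the two cases listed above. The flag values and the helper name page_alloc_flags() are hypothetical and chosen for illustration; they are not the kernel's definitions.

```c
/* Hypothetical flag bits, for illustration only; not the kernel's values. */
#define SKETCH_GFP_DMA		0x01u
#define SKETCH_GFP_DMA32	0x02u
#define SKETCH_GFP_NORETRY	0x10u

/*
 * Compute the flags for one page allocation attempt without modifying
 * the caller's gfp value, so the original can still be passed on
 * unchanged (for example to dma_ops->alloc_coherent()).
 *
 * optimistic_dma32 is nonzero when this attempt is the optimistic DMA32
 * try for a device whose dma_mask is below 32 bits.
 */
static unsigned int page_alloc_flags(unsigned int gfp, int optimistic_dma32)
{
	unsigned int flags = gfp;	/* local copy; gfp itself stays intact */

	if ((gfp & SKETCH_GFP_DMA) || optimistic_dma32)
		flags |= SKETCH_GFP_NORETRY;
	return flags;
}
```

Because the function returns a new value instead of writing through a pointer, a no-retry bit that the caller set on purpose is preserved, and one that the caller did not set never leaks into later calls.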
[ mingo@elte.hu: converted to delta patch ontop of previous version. ]
Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
need to call early_reserve_e820() to preallocate mptable for 32bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.
NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a multi-socket test-system with 3 or 4 ioapics, when 4 dualcore cpus or
2 quadcore cpus installed, needs to switch to bigsmp or physflat.
CPU apic id is [4,11] instead of [0,7], and we need to check max apic
id instead of cpu numbers.
also add check for 32 bit when acpi is not compiled in or acpi=off.
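To make the difference between "number of cpus" and "max apic id" concrete, here is a small sketch. The helper name and the threshold constant are assumptions; the threshold of 8 is simply taken from the [0,7] range mentioned above.

```c
#include <stddef.h>

/* Assumed limit: APIC IDs 0..7 are fine for the default mode. */
#define SKETCH_FLAT_APICID_LIMIT 8u

/*
 * Returns nonzero if any APIC ID is at or above the limit, even when
 * the number of CPUs alone would suggest the default mode is enough.
 */
static int needs_bigsmp(const unsigned int *apicid, size_t ncpus)
{
	unsigned int max_apicid = 0;
	size_t i;

	for (i = 0; i < ncpus; i++)
		if (apicid[i] > max_apicid)
			max_apicid = apicid[i];
	return max_apicid >= SKETCH_FLAT_APICID_LIMIT;
}
```

With eight CPUs numbered [4,11] the count is still 8, but the maximum APIC ID is 11, so the check trips.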
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/built-in.o: In function `proc_nmi_enabled':
: undefined reference to `nmi_watchdog_default'
arch/x86/kernel/built-in.o: In function `native_smp_prepare_cpus':
: undefined reference to `nmi_watchdog_default'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch checks whether the CPU is recognized before calling
unreserve(). Since enable_lapic_nmi_watchdog() already has such a
check, it makes sense to do the same here for the sake of code
consistency (but nothing more).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The 64bit bootstrap code sets nmi_watchdog to NMI_NONE
by default, and doing the same in 32bit mode is safe too.
This saves us several #ifdefs.
Btw, my previous commit
commit 19ec673ced
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed May 28 23:00:47 2008 +0400
x86: nmi - fix incorrect NMI watchdog used by default
did not fix the problem completely; moreover, it
introduced an additional bug: nmi_watchdog would be
set to either NMI_LOCAL_APIC or NMI_IO_APIC
_regardless_ of the boot option when enabled through
/proc/sys/kernel/nmi_watchdog. Sorry for that.
Fix it too.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds support for the linked list of struct setup_data on i386.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch extracts the common part of head32.c and head64.c into head.c.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch reserves the EFI memory map with reserve_early(). Because the EFI
memory map is allocated by the bootloader, if it is not reserved by
reserve_early(), it may be overwritten through an address returned by
find_e820_area().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes early reserved highmem pages become reserved
pages. This can be used for highmem pages allocated by bootloader such
as EFI memory map, linked list of setup_data, etc.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch cleans up the reserve_early() family of functions by extracting the
common part of reserve_early(), free_early() and bad_addr() into
find_overlapped_early().
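A sketch of what such a shared overlap helper might look like. The structure, the explicit array parameter, and the half-open [start, end) convention are assumptions made for this example and are not copied from the patch.

```c
#include <stddef.h>
#include <stdint.h>

struct early_res_sketch {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Return the index of the first region overlapping [start, end),
 * or n if there is none.
 */
static size_t find_overlapped(const struct early_res_sketch *res, size_t n,
			      uint64_t start, uint64_t end)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (end > res[i].start && start < res[i].end)
			break;
	return i;
}
```

Each caller then only has to interpret the returned index: a reserve routine can treat a hit as a conflict, a free routine as the entry to remove, and an address checker as the region to step past.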
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If
fix == 0, aper_enabled == 1, gart_fix_e820 == 0
if (!fix && !aper_enabled)
return;
if (gart_fix_e820 && !fix && aper_enabled) {
if (e820_any_mapped(aper_base, aper_base + aper_size,
E820_RAM)) {
/* reserve it, so we can reuse it in second kernel */
printk(KERN_INFO "update e820 for GART\n");
add_memory_region(aper_base, aper_size, E820_RESERVED);
update_e820();
}
return;
}
/* different nodes have different setting, disable them all atfirst*/
we'll fall back here and disable all the settings, even when they were
all consistent.
What about this? (I hope it compiles...)
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Hi!
void __init early_gart_iommu_check(void)
contains
for (num = 24; num < 32; num++) {
if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
continue;
loop, with very similar loop duplicated in
void __init gart_iommu_hole_init(void)
. First copy of a loop seems to be buggy, too. It uses 0 as a "nothing
set" value, which may actually bite us in last_aper_enabled case
(because it may be often zero).
(Beware, it is hard to test this patch, because this code has about
2^8 different code paths, depending on hardware and cmdline settings).
Plus, the second loop does not check for consistency of
aper_enabled. Should it?
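To show why 0 as a "nothing set" value can bite, here is a sketch of both variants of the consistency check over a list of per-northbridge aper_enabled values. The function names and the array-based interface are made up for this example.

```c
#include <stddef.h>

/*
 * Returns 1 if every northbridge reports the same aper_enabled value.
 * A separate have_last flag marks whether a previous value exists,
 * so a legitimate value of 0 is not mistaken for "nothing seen yet".
 */
static int aper_enabled_consistent(const unsigned int *aper_enabled, size_t n)
{
	unsigned int last = 0;
	int have_last = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (have_last && aper_enabled[i] != last)
			return 0;
		last = aper_enabled[i];
		have_last = 1;
	}
	return 1;
}

/* Fragile variant: 0 doubles as "nothing set", so 0 followed by 1 is missed. */
static int aper_enabled_consistent_zero_sentinel(const unsigned int *aper_enabled,
						 size_t n)
{
	unsigned int last = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (last && aper_enabled[i] != last)
			return 0;
		last = aper_enabled[i];
	}
	return 1;
}
```

For the sequence {0, 1} the first function reports an inconsistency, while the sentinel variant reports the settings as consistent.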
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
should use right shift
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU
optimization".
Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().
Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
It's a side effect though. This is the failing scenario:
process 'A' in save_i387_ia32() just after clear_used_math()
Got an interrupt and pre-empted out.
At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()
This results in init_fpu() during the __switch_to()'s math_state_restore()
And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.
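The extra condition can be sketched as follows. The structure and function names are stand-ins invented for this example; only the two conditions, fpu_counter > 5 and tsk_used_math(), come from the description above.

```c
#include <stdbool.h>

/* Minimal stand-in for the two task fields the description refers to. */
struct task_sketch {
	unsigned char fpu_counter;	/* recent switches that used the FPU */
	bool used_math;			/* what tsk_used_math() would report */
};

/*
 * Decide whether the context switch path may restore FPU state eagerly.
 * Checking used_math first keeps a task without valid FPU state off the
 * restore path even when its fpu_counter is still high.
 */
static bool may_preload_fpu(const struct task_sketch *next)
{
	return next->used_math && next->fpu_counter > 5;
}
```

In the failing scenario, process 'A' has a high fpu_counter but has just cleared its used-math state, so the sketch returns false and the restore is skipped.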
Bisected-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume. Prevent system suspend in case we
would be unable to resume.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() needs to be added to the math-emulation path as well
for the FPU area allocation.
math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
for http://bugzilla.kernel.org/show_bug.cgi?id=10613
BIOS bug, APIC version is 0 for CPU#0! fixing up to 0x10. (tell your hw vendor)
v2: fix 64 bit compilation
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this way 32-bit is more similar to 64-bit, and smarter e820 and numa.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It looks good to move bugs_64.c to cpu/bugs_64.c.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
on 64-bit we only get valid max_pfn_mapped after init_memory_mapping().
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.
We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.
XEN PV and lguest may need to assign max_pfn_mapped too.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make mptable to be consistent with acpi routing, so we could:
1. kexec kernel with acpi=off
2. work around BIOSes where acpi routing is working, but mptable is
not right, so can use kernel/kexec to start other OSes that don't have
good acpi support.
command line: update_mptable
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.
also for sparse it will call alloc_bootmem ... before we set up bootmem.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update the UV address macros to better describe the
fields of UV physical addresses. Improve comments
in the header files. Add additional MMR definitions.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Wed, 2008-05-28 at 04:47 +0200, Andi Kleen wrote:
> > So... why not just remove the setting of __GFP_NORETRY? Why is it
> > wrong to oom-kill things in this case?
>
> When the 16MB zone overflows (which can be common in some workloads)
> calling the OOM killer is pretty useless because it has barely any
> real user data [only exception would be the "only 16MB" case Alan
> mentioned]. Killing random processes in this case is bad.
>
> I think for 16MB __GFP_NORETRY is ok because there should be
> nothing freeable in there so looping is useless. Only exception would be the
> "only 16MB total" case again but I'm not sure 2.6 supports that at all
> on x86.
>
> On the other hand d_a_c() does more allocations than just 16MB, especially
> on 64bit and the other zones need different strategies.
Okay, so how about this then ?
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce IRQx_VECTOR on 32-bit, so that #ifdef noise is kept
down. There should be no object code change.
[ mingo@elte.hu: merged to x86/irq not x86/i8259 due to x86/irq having
restructured the vector code into asm-x86/irq_vectors.h, which this
patch touches. ]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements PCI extended configuration space access for
AMD's Barcelona CPUs. It extends the method using CF8/CFC IO
addresses. An x86 capability bit has been introduced that is set for
CPUs supporting PCI extended config space accesses.
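As a sketch of how the CF8 address word can carry a 12-bit register offset: the helper below places bits 8-11 of the offset in bits 24-27 of the address, which the classic encoding leaves unused. The function name is made up, and the bit layout is my understanding of the extended encoding rather than something quoted from this patch.

```c
#include <stdint.h>

/*
 * Build the 32-bit value written to I/O port 0xCF8 for a config access.
 * Register offsets up to 0xFFF become reachable because offset bits
 * 8-11 are moved into address bits 24-27.
 */
static uint32_t conf1_ext_address(uint32_t bus, uint32_t devfn, uint32_t reg)
{
	return 0x80000000u			/* enable bit */
	     | ((reg & 0xF00u) << 16)		/* extended offset bits 8-11 */
	     | ((bus & 0xFFu) << 16)
	     | ((devfn & 0xFFu) << 8)
	     | (reg & 0xFCu);			/* dword-aligned low offset bits */
}
```

For offsets below 0x100 the result is identical to the classic encoding, so existing accesses are unaffected.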
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
on two node system (16g RAM) with numa config I got this crash:
get_memcfg_from_srat: assigning address to rsdp
RSD PTR v0 [ACPIAM]
ACPI: Too big length in RSDT: 92
failed to get NUMA memory information from SRAT table
NUMA - single node, flat memory mode
Node: 0, start_pfn: 0, end_pfn: 153
Setting physnode_map array to node 0 for pfns:
0
...
Pid: 0, comm: swapper Not tainted 2.6.26-rc4 #4
[<80b41289>] hlt_loop+0x0/0x3
[<8011efa0>] ? alloc_remap+0x50/0x70
[<8079e32e>] alloc_node_mem_map+0x5e/0xa0
[<8012e77b>] ? printk+0x1b/0x20
[<80b590f6>] free_area_init_node+0xc6/0x470
[<80b588fc>] ? __alloc_bootmem_node+0x2c/0x50
[<80b58ad8>] ? find_min_pfn_for_node+0x38/0x70
[<8012e77b>] ? printk+0x1b/0x20
[<80b597c4>] free_area_init_nodes+0x254/0x2d0
[<80b544d7>] zone_sizes_init+0x97/0xa0
[<80b48a03>] setup_arch+0x383/0x530
[<8012e77b>] ? printk+0x1b/0x20
[<80b41aa4>] start_kernel+0x64/0x350
[<80b412d8>] i386_start_kernel+0x8/0x10
=======================
this patch increases the acpi table limit to 32.
Also match early_ioremap() with early_iounmap().
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
reserve early numa kva, so it will not clash with new RAMDISK
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
introduce init_pg_table_start, so xen PV could specify the value.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a separate centaur_64.c file in the cpu/ dir for
the useful parts to live in.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Create a separate intel_64.c file in the cpu/ dir for
the useful parts to live in.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Create a separate amd_64.c file in the cpu/ dir for
the useful parts to live in.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are declared in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The commit
commit 4b82b27770
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Sat May 24 19:36:35 2008 +0400
set nmi_watchdog to NMI_IO_APIC by default. This causes hangs on some
machines with buggy watchdogs. Fix it - i.e. restore old behaviour.
Thanks to Sitsofe Wheeler and Adrian Bunk for catching the problem
and Maciej W. Rozycki for explaining what is going on there.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some small cleanups for aperture_64.c; they should not really change
any code.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
remove the extra -1 in the reserve_early() call
panic if space cannot be found for the new RAMDISK
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add pte_flags() to extract the flags from a pte. This is a special
case of pte_val() which is only guaranteed to return the pte's flags
correctly; the page number may be corrupted or missing.
The intent is to allow paravirt implementations to return pte flags
without having to do any translation of the page number (most notably,
Xen).
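A sketch of how a caller that only cares about flag bits would use such an accessor. The mask below (low 12 bits plus bit 63) is an assumed layout for illustration, and pte_flags_sketch() is a stand-in name, not the function added by this patch.

```c
#include <stdint.h>

/* Assumed layout: flags in the low 12 bits plus bit 63 (no-execute). */
#define SKETCH_PTE_FLAGS_MASK ((uint64_t)0xFFF | ((uint64_t)1 << 63))

/* Return only the flag bits of a raw pte; the page number is discarded. */
static uint64_t pte_flags_sketch(uint64_t raw_pte)
{
	return raw_pte & SKETCH_PTE_FLAGS_MASK;
}
```

Since the page number bits are never examined, an implementation is free to skip any translation of them.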
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since cpu_online_map is touched (by for_each_online_cpu)
at a moment when cpu_callin_map is already filled in, we can
get rid of the check altogether.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
apic_write_around will be expanded to apic_write in 64bit mode
anyway. Only a few CPUs (well, old CPUs to be precise) require
such an action. In general it should not hurt and could be cleaned
up for apic_write (just in case)
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
traps_32.c already holds these functions so do the same for traps_64.c
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make the 64bit die_nmi() produce the same message as the 32bit one.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
By slightly changing 32bit mode die_nmi() we may unify the
interface and make it common for both (32/64bit) modes
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
UP builds with LOCAL_APIC=y and IO_APIC=n fail with a missing
reference to mp_bus_not_pci. Disentangle the mpparse code some more and
move the ioapic specific bus check into a separate function.
This code needs some urgent un#ifdef surgery all over the place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This pushes the lock a fair way down and the final kill looks like it
should be an easy project for someone who wants to have a shot at it.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
and make e820_mark_nosave_regions to take limit_pfn to use max_low_pfn
for 32bit and end_pfn for 64bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sitsofe Wheeler reported boot problems on linux-next.
It looks like the same issue as found by Soeren Sandman in 7575217f656a93,
"x86: initialize all fields of mp_irqs[mp_irq_entries]".
But his fix is also not complete, as dstapic is used before it is assigned.
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit "x86: make config_irqsrc not MPspec specific" introduced some uses
of uninitialized fields in mp_config_acpi_legacy_irqs(). I need the
following patch to get sched-devel/master to boot.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the new output is:
MPTABLE: OEM ID: SUN
MPTABLE: Product ID: 4600 M2
MPTABLE: APIC at: 0x
instead of it all in one line with <6> and double Product ID...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
use find_e820_area to find an address for the new RAMDISK, instead of using ram blindly
also print out low ram and bootmap info
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add to the kernels boot memory map 'memmap' entries found in
the EFI memory descriptors passed in from the BIOS.
On EFI systems, up to E820MAX == 128 memory map entries can
be passed via the legacy E820 interface (limited by the size
of the 'zeropage'). These entries can be duplicated in the
EFI descriptors also passed from the BIOS, and possibly more
entries passed by the EFI interface, which does not have the
E820MAX limit on number of memory map entries.
This code doesn't worry about the likely duplicate, overlapping
or (unlikely) conflicting entries between the EFI map and the
E820 map. It just dumps all the EFI entries into the memmap[]
array (which already has the E820 entries) and lets the existing
routine sanitize_e820_map() sort the mess out.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Elaborate on the comment for sanitize_e820_map(), explaining more what
it does, what it inputs, and what it returns. Rearrange the placement of
this comment to fit kernel conventions, before the routine's code rather
than buried inside it.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The map size counter passed into, and back out of, sanitize_e820_map(),
was an eight bit type (char or u8), as derived from its origins in
legacy BIOS E820 structures. This patch changes that type to an 'int',
to allow this sanitize routine to also be used on larger maps (larger
than the 255 entries an 8-bit count can express). The legacy BIOS E820 interface
of course does not change; that remains at 8 bits for this count, holding
up to E820MAX == 128 entries. But the kernel internals can handle more
when those additional memory map entries are passed from the BIOS via
EFI interfaces.
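A tiny sketch of why the narrower type is a problem once maps grow. The helper names are invented for this example.

```c
#include <stdint.h>

/* What an 8-bit counter ends up holding when asked to store nr entries. */
static uint8_t count_as_u8(int nr)
{
	return (uint8_t)nr;	/* wraps modulo 256 */
}

/* The same count kept in an int survives intact. */
static int count_as_int(int nr)
{
	return nr;
}
```

A legacy map of 128 entries fits either way, but a larger map passed via EFI would be silently truncated by the 8-bit type.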
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend internal boot time memory tables to allow for up to
three entries per node, which may be larger than the 128 E820MAX
entries handled by the legacy BIOS E820 interface. The EFI
interface, if present, is capable of passing memory map
entries for these larger node counts.
This patch requires an earlier patch that rewrote code depending
on these array sizes from using E820MAX explicitly to size loops,
to instead using ARRAY_SIZE() of the applicable array.
Another patch following this one will provide the code to pick
up additional memory entries passed via the EFI interface from
the BIOS and insert them in the following, now enlarged, arrays.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is motivated by a subsequent patch which will allow for more
memory map entries on EFI supported systems than can be passed via the x86
legacy BIOS E820 interface. The legacy interface is limited to E820MAX ==
128 memory entries, and that "E820MAX" manifest constant was used as the
size for several arrays and loops over those arrays.
The primary change in this patch is to change code loop sizes over those
arrays from using the constant E820MAX, to using the ARRAY_SIZE() macro
evaluated for the array being looped. That way, a subsequent patch can
change the size of some of these arrays, without breaking this code.
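The loop conversion can be sketched like this. The table, its element type and its length are hypothetical; ARRAY_SIZE() is defined locally in its usual form so that the example stands alone.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct entry_sketch {
	unsigned long long addr;
	unsigned long long size;
};

/* Hypothetical table; its length can change without touching the loop. */
static struct entry_sketch table_sketch[3 * 4];

static size_t clear_table(void)
{
	size_t i;

	/* The bound follows the array, not a separate constant like E820MAX. */
	for (i = 0; i < ARRAY_SIZE(table_sketch); i++) {
		table_sketch[i].addr = 0;
		table_sketch[i].size = 0;
	}
	return i;	/* number of entries visited */
}
```

Note that ARRAY_SIZE() only works on a real array in scope, not on a pointer parameter, which is one reason a routine that receives the array from its caller needs an explicit size argument.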
This patch also adds a parameter to the sanitize_e820_map() routine,
which had an implicit size for the array passed it of E820MAX entries.
This new parameter explicitly passes the size of said array. Once again,
this will allow a subsequent patch to change that array size for some
calls to sanitize_e820_map() without breaking the code.
As part of enhancing the sanitize_e820_map() interface this way, I further
combined the unnecessarily distinct x86_32 and x86_64 declarations for
this routine into a single, commonly used, declaration.
This patch in itself should make no difference to the resulting kernel
binary.
[ mingo@elte.hu: merged to -tip ]
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Standardize a few pointer declarations to not have the
extra space after the '*' character.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
disable the noisy print out.
also use the one the less spare mtrr reg.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is a typo in the mask value: remove the extra 0
to avoid clearing 4 bits.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
otherwise fixed MTRR for family 10h may not be changed.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Loop through mtrr chunk_size and gran_size from 1M to 2G to find out
the optimal value so user does not need to add mtrr_chunk_size and
mtrr_gran_size to the kernel command line.
If no optimal value is found, print out the whole list to help the user
select a less optimal value.
Add mtrr_spare_reg_nr= so the user can set 2 instead of 1, if the card
needs more entries.
v2: find the one with more spare entries
v3: fix hole_basek offset
v4: tighten the comparison between range and range_new
stop the loop at 4g
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Mika Fischer <mika.fischer@zoopnet.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
v9: address format change requests by Ingo
more case handling in range_to_var_with_hole
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
v2: process hole then end_pfn
fix update_memory_range with whole cover comparing
Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When converting the MTRR layout from continuous to discrete, we can sometimes
run out of MTRRs. So add gran_sizek to prevent that by dumping small RAM pieces
smaller than gran_sizek.
The previous trimming could only handle the range from highest_pfn (from the
MTRRs) to end_pfn (from e820). When more than 4g of RAM is installed, there will
be holes below 4g, so we need to check that RAM below 4g is covered properly.
Needs to be applied after
[PATCH] x86: mtrr cleanup for converting continuous to discrete layout v7
Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some BIOSes like to use a continuous MTRR layout, and the X driver cannot add
WB entries for graphics cards when 4g or more RAM is installed.
This patch changes the MTRR layout to discrete.
mtrr_chunk_size= can be used to get smaller continuous blocks to hold holes.
The default is 256m; it can be set according to the size of the graphics card memory.
mtrr_gran_size= can be used to set the smallest MTRR block, to avoid running out of MTRRs.
v2: fix -1 for UC checking
v3: default to disable, and need use enable_mtrr_cleanup to enable this feature
skip the var state change warning.
remove next_basek in range_to_mtrr()
v4: correct warning mask.
v5: CONFIG_MTRR_SANITIZER
v6: fix 1g, 2g, 512 alignment with extra hole
v7: gran_sizek to prevent running out of MTRRs.
v8: fix hole_basek calculation broken when removing next_basek
use gran_sizek when basek is 0.
need to apply
[PATCH] x86: fix trimming e820 with MTRR holes.
right after this one.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The x86_64 code has centralized the memory setup code in
e820_64.c. This patch copies that approach to i386:
- early_param("mem", ...) parsing is moved from
setup_32.c to e820_32.c.
- setup_memory_map() and finish_e820_parsing() are
factored out from setup_arch(), and declarations
are added to e820_32.h.
- print_memory_map() is made static and removed from
e820_32.h.
- user_defined_memmap is marked as __initdata.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/cpu/mtrr/generic.c:216:12: warning: symbol 'lo' shadows an earlier one
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the no longer used handlers for reserved vectors.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Just moved trailing statements to the next line, removed space before
open/close parenthesis, wrapped long lines.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We had better use the flags already defined in processor-flags.h instead
of defining our own.
[>>> object code check >>>]
original
md5sum: 9cfa6dbf045a046bb5dfb85f8bcfe8c4 arch/x86/kernel/head_64.o
   text    data     bss     dec     hex filename
  37361    4432    8192   49985    c341 arch/x86/kernel/head_64.o
patched
md5sum: 9cfa6dbf045a046bb5dfb85f8bcfe8c4 arch/x86/kernel/head_64.o
   text    data     bss     dec     hex filename
  37361    4432    8192   49985    c341 arch/x86/kernel/head_64.o
[<<< object code check <<<]
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch contains the following cleanups:
- make the following needlessly global code static:
- dma_alloc_pages()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are declared in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sparse mutters:
arch/x86/kernel/vsmp_64.c:126:5: warning: symbol 'is_vsmp_box' was not declared. Should it be static?
arch/x86/kernel/vsmp_64.c:145:13: warning: symbol 'vsmp_init' was not declared. Should it be static?
Include the appropriate headers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/tsc_64.c:245:13: warning: constant 0x100000000 is so big it is long
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LAPIC interrupts, which don't go through the generic interrupt handling
code, aren't accounted for in /proc/stat. Hence this patch adds a
mechanism architectures can use to accordingly adjust the statistics.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: prevent PGE flush from interruption/preemption
x86: use explicit copy in vdso_gettimeofday()
namespacecheck: automated fixes
x86/xen: fix arbitrary_virt_to_machine()
x86: don't read maxlvt before checking if APIC is mapped
x86: disable TSC for sched_clock() when calibration failed
x86: distangle user disabled TSC from unstable
x86: fix setup of cyc2ns in tsc_64.c
The leftovers of the i8259 unification have nothing to do with i8259
at all. They contain interrupt init code and the i8259_xx name is just
misleading now.
Rename them to irqinit_32/64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove #ifdefs where the only difference is formatting of comments.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove #ifdefs around includes; including too much should be always
safe.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make conversion of i8259 very mechanical -- i8259 was generated by
diff -D, with too different parts left in i8259_32 and
i8259_64.c. Only "by hand" changes were removal of #ifdef from middle
of the comment (prevented compilation) and removal of one static to
allow splitting into files.
Of course, it will need some cleanups now, and those will follow.
Signed-off-by: Pavel Machek <pavel@suse.cz>
The leftovers of the i8259 unification have nothing to do with i8259
at all. They contain interrupt init code and the i8259_xx name is just
misleading now.
Rename them to initirq_32/64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the Makefile turd and uses the nice CFLAGS_REMOVE macro
in the x86/kernel directory.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David S. Miller noticed the following bug: the -pg instrumentation
function callback is named differently on each platform. On x86 it
is mcount, on sparc it is _mcount. So the export does not make sense
in kernel/trace/ftrace.c - move it to x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
text_poke is sleepable.
The original fix by Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The fault label to jump to on fault of updating the code was misplaced
preventing the fault from being recorded.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
disable the tracer while kexec pulls the rug from under the old
kernel.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch replaces the indirect call to the mcount function
pointer with a direct call that will be patched by the
dynamic ftrace routines.
On boot up, the mcount function calls the ftrace_stub function.
When the dynamic ftrace code is initialized, the ftrace_stub
is replaced with a call to the ftrace_record_ip, which records
the instruction pointers of the locations that call it.
Later, the ftraced daemon will call kstop_machine and patch all
the locations to nops.
When ftrace is enabled, the original calls to mcount will now
be set to call ftrace_caller, which will do a direct call
to the registered ftrace function. This direct call is also patched
when the function that should be called is updated.
All patching is performed by a kstop_machine routine to prevent any
type of race condition associated with modifying code
on the fly.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch moves the memory management of the ftrace
records out of the arch code and into the generic code
making the arch code simpler.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch patches the call to mcount with nops instead
of a jmp over the mcount call.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
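
The byte-level effect of the replacement above can be sketched in userspace as follows. This is only an illustration on a plain byte buffer with hypothetical helper names; the real patching happens on live kernel text under stop_machine. It assumes the standard x86 encodings: e8 for call rel32 and eb for jmp rel8.

```c
#include <stdint.h>
#include <string.h>

#define CALL_INSN_SIZE 5	/* e8 + rel32, as in "call ftrace" */

/*
 * Overwrite the first two bytes of a 5-byte call with "jmp +3"
 * (eb 03), so execution skips the 3 leftover displacement bytes.
 * Returns 0 on success, -1 if the site does not hold a call.
 */
static int demo_call_to_jmp(uint8_t *site)
{
	if (site[0] != 0xe8)
		return -1;
	site[0] = 0xeb;	/* jmp rel8 */
	site[1] = 0x03;	/* jump over the 3 remaining bytes */
	return 0;
}

/* Restore the original call from a saved copy of its 5 bytes. */
static void demo_jmp_to_call(uint8_t *site, const uint8_t *saved)
{
	memcpy(site, saved, CALL_INSN_SIZE);
}
```

Only two bytes change when disabling, which is why the instruction stays 5 bytes wide and can later be turned back into the call.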
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add preempt off timings. A lot of kernel core code is taken from the RT patch
latency trace that was written by Ingo Molnar.
This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers
Now instead of just tracing irqs off, preemption off can be selected
to be recorded.
When this is selected, it shares the same files as irqs off timings.
One can either trace preemption off, irqs off, or one or the other off.
By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
of preempt off only is performed. "irqsoff" will only record the time
irqs are disabled, but "preemptirqsoff" will take the total time irqs
or preemption are disabled. Runtime switching of these options is now
supported by simply echoing the appropriate tracer name into
/debugfs/tracing/current_tracer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value, the ftrace routine will be called every time
we enter a kernel function that is not marked with the "notrace"
attribute.
The ftrace routine will then call a registered function if a function
happens to be registered.
[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]
Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.
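
The dispatch rule in the update above might look roughly like the following userspace sketch. All names here are hypothetical and the fixed-size table is a simplification; it only shows the choice between a direct call and a looping helper.

```c
#include <stddef.h>

typedef void (*demo_trace_fn)(unsigned long ip, unsigned long parent_ip);

#define DEMO_MAX_FUNCS 8

static demo_trace_fn demo_funcs[DEMO_MAX_FUNCS];
static size_t demo_nr_funcs;

/* Default target while nothing is registered. */
static void demo_stub(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
}

/* Used when more than one function is registered: call each in turn. */
static void demo_list_func(unsigned long ip, unsigned long parent_ip)
{
	size_t i;

	for (i = 0; i < demo_nr_funcs; i++)
		demo_funcs[i](ip, parent_ip);
}

/* What the tracing hook calls on every traced function entry. */
static demo_trace_fn demo_trace_function = demo_stub;

/* Register fn; one user is called directly, several go through the loop. */
static int demo_register(demo_trace_fn fn)
{
	if (demo_nr_funcs == DEMO_MAX_FUNCS)
		return -1;
	demo_funcs[demo_nr_funcs++] = fn;
	demo_trace_function = (demo_nr_funcs == 1) ? fn : demo_list_func;
	return 0;
}

/* Two sample callbacks that just count their invocations. */
static unsigned long demo_hits_a, demo_hits_b;

static void demo_count_a(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	demo_hits_a++;
}

static void demo_count_b(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	demo_hits_b++;
}
```

The single-user case avoids the loop entirely, which keeps the common path as cheap as one indirect call.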
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date: Mon May 12 21:21:13 2008 +0200
A check for unmapped apic was added before reading maxlvt but the early
read of maxlvt wasn't removed.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
When the TSC calibration fails then TSC is still used in
sched_clock(). Disable it completely in that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
tsc_enabled is set to 0 from the command line switch "notsc" and from
the mark_tsc_unstable code. Separate those functionalities and replace
tsc_enable with tsc_disable. This also makes the native_sched_clock()
decision on when to use the TSC understandable.
Preparatory patch to solve the sched_clock() issue on 32 bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the TSC is calibrated against the PIT due to the nonavailability
of PMTIMER/HPET or due to SMI interference then the setup of the per
CPU cyc2ns variables is skipped. This is unlikely to happen but it
would definitely render sched_clock() unusable.
This was introduced with commit 53d517cdba
x86: scale cyc_2_nsec according to CPU frequency
Update the per CPU cyc2ns variables in all exit paths of tsc_calibrate.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Unconditionally enable PAT support on Centaur and Transmeta CPUs.
All known models that advertise PAT have no known errata.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] return to old errno choice in mkdir() et.al.
[Patch] fs/binfmt_elf.c: fix wrong return values
[PATCH] get rid of leak in compat_execve()
[Patch] fs/binfmt_elf.c: fix a wrong free
[PATCH] avoid multiplication overflows and signedness issues for max_fds
[PATCH] dup_fd() part 4 - race fix
[PATCH] dup_fd() - part 3
[PATCH] dup_fd() part 2
[PATCH] dup_fd() fixes, part 1
[PATCH] take init_files to fs/file.c
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
The longrun cpufreq module reports a false minimum frequency of 3MHz on
a 300-600MHz Crusoe processor. This may be due to a calculation bug
in the module.
Original patch from Kaz Sasayama <kazssym@hypercore.co.jp>
submitted as http://bugs.debian.org/468149 patch ported to x86
Cc: Kaz Sasayama <kazssym@hypercore.co.jp>
Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: Dave Jones <davej@redhat.com>
The most common error with powernow-k8 is an ACPI _PSS error
caused either by failure to load the ACPI processor module
or a bad parse of the _PSS object. Make the error message
returned to the user in these situations more straightforward
and easier to understand.
-Mark Langsdorf
Operating System Research Center
AMD
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The previous revert of 0c07ee38c9 left
out the mwait disable condition for AMD family 10H/11H CPUs.
Andreas Herrmann said:
It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.
If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.
Thus HLT saves more power than MWAIT here.
It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit
f039b75471 (x86: Don't use MWAIT on AMD
Family 10)
Re-add the AMD families 10H/11H check and disable the mwait usage for
those.
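
A minimal sketch of the re-added check, assuming a simplified CPU description; the struct, enum and function names are hypothetical and do not reproduce the kernel's cpuinfo structures.

```c
/* Hypothetical, simplified CPU description for illustration only. */
enum demo_vendor { DEMO_VENDOR_INTEL, DEMO_VENDOR_AMD, DEMO_VENDOR_OTHER };

struct demo_cpu {
	enum demo_vendor vendor;
	unsigned int family;
	int has_mwait;		/* CPU advertises MWAIT */
};

/* Return 1 if MWAIT should be used for idle, 0 to fall back to HLT. */
static int demo_use_mwait_idle(const struct demo_cpu *c)
{
	if (!c->has_mwait)
		return 0;
	/* Family 0x10/0x11: HLT can reach C1E, MWAIT cannot. */
	if (c->vendor == DEMO_VENDOR_AMD &&
	    (c->family == 0x10 || c->family == 0x11))
		return 0;
	return 1;
}
```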
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Vegard Nossum reports:
| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9
| Date: Wed Jan 30 13:33:16 2008 +0100
|
| x86: use the correct cpuid method to detect MWAIT support for C states
remove the functional effects of this patch and make mwait unconditional.
A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.
Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The user_regset_view table for the 32-bit regsets on the 64-bit build had
the wrong sizes for the FP regsets. This bug had no user-visible effect
(just on kernel modules using the user_regset interfaces and the like).
But the fix is trivial and risk-free.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this symbol export problem:
Building modules, stage 2.
MODPOST 193 modules
ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
This is due to a known weakness of symbol exports: if a symbol's
only in-core user is an EXPORT_SYMBOL from a lib-y section, the
symbol is not linked in.
The solution is to move the export to x8664_ksyms_64.c - but the real
solution would be to fix kbuild.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/setup_64.c:954: warning: passing argument 2 of 'set_bit' from incompatible pointer type
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.
Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3 x86: do not zap_low_mappings
in __smp_prepare_cpus. Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.
Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32. This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do. No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.
(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)
That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock. The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.
So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.
So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.
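
The fallback described above can be sketched with plain bitmasks. This is a simplified userspace model with hypothetical names (one bit per CPU), not the kernel's cpumask API; it only shows when the all-but-self shorthand is considered safe.

```c
#include <stdint.h>

typedef uint32_t demo_cpumask;	/* one bit per CPU, hypothetical */

enum demo_ipi_method { DEMO_IPI_ALLBUTSELF, DEMO_IPI_MASK };

/*
 * Pick how to deliver a cross-CPU call to "target".
 * The all-but-self shorthand also reaches a CPU that is still
 * booting (in callout but not yet online), so it is only used
 * when the two maps agree.
 */
static enum demo_ipi_method demo_pick_ipi(demo_cpumask target,
					  demo_cpumask online,
					  demo_cpumask callout,
					  int self)
{
	demo_cpumask others = online & ~((demo_cpumask)1 << self);

	if (online != callout)
		return DEMO_IPI_MASK;	/* a CPU is mid-boot */
	if (target == others)
		return DEMO_IPI_ALLBUTSELF;
	return DEMO_IPI_MASK;
}
```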
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The latest rev of Intel doc AP-485 details a new cache
descriptor that we don't yet support.
A 6MB 24-way assoc L2 cache.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
use per_cpu for per CPU data.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up gart handling on amd64 a bit: move common code into
enable_gart_translation, and use symbolic register names where
appropriate.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we can use free_bootmem() directly.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. use symbolic register names where appropriate.
2. num to bus or slot changing
3. handle for new opteron for bus other than 0
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because we try to reserve dma32 early, we have a chance to get the aperture
from 64M.
With some sequences, an aperture allocated from RAM could become E820_RESERVED,
and then if doing a kexec with a big kernel whose uncompressed size is above
64M we could have a range conflict with the gart still in use.
So allocate the gart aperture from 512M instead.
Also change fallback_aper_order to 5, because we don't have a chance to get
a 2G or 4G aperture.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some systems use 32M for gart and agp when memory is less than 4G.
The kernel will reject that and try to allocate another 64M that is not needed,
and we will waste 64M of perfectly good RAM.
This patch adds a workaround by checking aper_base/order between the NB and
the agp bridge. If they are the same, and memory size is less than 4G, it
will allow it.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
while looking at Rafael J. Wysocki's system boot log,
I found a funny printout:
Node 0: aperture @ de000000 size 32 MB
Aperture too small (32 MB)
AGP bridge at 00:04:00
Aperture from AGP @ de000000 size 4096 MB (APSIZE 0)
Aperture too small (0 MB)
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
...
agpgart: Detected AGP bridge 20
agpgart: Aperture pointing to RAM
agpgart: Aperture from AGP @ de000000 size 4096 MB
agpgart: Aperture too small (0 MB)
agpgart: No usable aperture found.
agpgart: Consider rebooting with iommu=memaper=2 to get a good aperture.
It means the BIOS allocated the correct gart on the NB and AGP bridge, but
because of a bug in the silicon (the agp bridge reports the wrong order,
it wants 4G instead) the kernel will reject that allocation.
Also, because the size is only 32MB, and we try to get another 64M for gart,
the later fix_northbridge cannot revert that change because it still reads
the wrong size from the agp bridge.
So try to double check the order value from the agp bridge, before calling
aperture_valid().
[ mingo@elte.hu: 32-bit fix. ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move symbolic constants into gart.h, and use them instead of hardcoded
constant.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the 4KSTACKS related code to one place. This allows us to un#ifdef
do_IRQ() and share the execute-on-stack code between the stack overflow
printk and the softirq call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add KERN_WARNING to the printk as this could not be done in the
original patch, which allegedly only moves code around.
Un#ifdef do_IRQ.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previously the reporting printk would run on the process stack, which risks
overflowing an already low stack. Instead execute it on the interrupt stack.
This makes it more likely that the printk actually makes it out.
It adds one more not-taken test/branch to the interrupt path when
stack overflow checking is enabled. We could avoid that by duplicating
more code, but that did not seem worth it.
Based on an observation by Eric Sandeen.
v2: Fix warnings in some configs
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fixes the build error introduced by my FIRST_SYSTEM_VECTOR patch
Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The SGI UV system needs several more system vectors than a vanilla
x86_64 system. Rather than burden the other archs with extra system
vectors that they don't use, change FIRST_SYSTEM_VECTOR to a variable,
so that it can be dynamic.
Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The interrupt vector defines are copied 4 times around with minimal
differences. Move them all into asm-x86/irq_vectors.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eliminate the 6 bank restriction in 64 bit mce reporting code. This
restriction is artificial (due to static creation of sysfs files) and 32
bit code does not have any such restriction.
This change helps in reporting the details of machine checks on a
machine check exception with errors in bank 6 and above on CPUs that
support those banks. Without the patch, machine check errors in those
banks are not reported.
We still have 128 (MCE_EXTENDED_BANK) bank restriction instead of max
256 supported in hardware. That is not changed in the patch below as it
will have some user level mcelog utility dependency, with bank 128 being
used for thermal reporting currently.
The patch below also does not create sysfs control (bankNctl) for banks
higher than 6. That needs some pre-cleanup in the /sysfs mce layout:
removal of the per cpu /sysfs entries for bankctl, as they are really global
system level controls today. That change will follow. This basic change
is critical to report the detailed errors on banks higher than 6.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have a lot of HPET quirks available which might force enable HPET
even when the BIOS does not enable it. Some of those quirks depend on
the command line option "hpet=force".
Andrew pointed out that hoping that the user will find out about this
boot option is not really helpful.
Emit a kernel info which informs the user about the "hpet=force" boot
option when we enter a quirk which depends on this option and the user
did not provide it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add quirk to allow forced usage of HPET on ATI SB400.
I stumbled over machines where HPET is enabled but not reported
by BIOS. This patch configures the HPET base address and makes
it known to the OS.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While reading through the HPET code I realized that the
computation of .mult variables could be done with fewer
lines of code, resulting in a 1.6% text size saving
for hpet.o
So I propose the following patch, which applies against
today's Linus -git tree.
From 0c6507e400e9ca5f7f14331e18f8c12baf75a9d3 Mon Sep 17 00:00:00 2001
From: Carlos R. Mafra <crmafra@ift.unesp.br>
Date: Mon, 5 May 2008 19:38:53 -0300
The computation of clocksource_hpet.mult
tmp = (u64)hpet_period << HPET_SHIFT;
do_div(tmp, FSEC_PER_NSEC);
clocksource_hpet.mult = (u32)tmp;
can be streamlined if we note that it is equal to
clocksource_hpet.mult = div_sc(hpet_period, FSEC_PER_NSEC, HPET_SHIFT);
Furthermore, the computation of hpet_clockevent.mult
uint64_t hpet_freq;
hpet_freq = 1000000000000000ULL;
do_div(hpet_freq, hpet_period);
hpet_clockevent.mult = div_sc((unsigned long) hpet_freq,
NSEC_PER_SEC, hpet_clockevent.shift);
can also be streamlined with the observation that hpet_period and hpet_freq are
inverse to each other (in proper units).
So instead of computing hpet_freq and using (schematically)
div_sc(hpet_freq, 10^9, shift) we use the trick of calling with the
arguments in reverse order, div_sc(10^6, hpet_period, shift).
The different power of ten is due to frequency being in Hertz (1/sec)
and the period being in units of femtosecond. Explicitly,
mult = (hpet_freq * 2^shift)/10^9 (before)
mult = (10^6 * 2^shift)/hpet_period (after)
because hpet_freq = 10^15/hpet_period.
The comments in the code are also updated to reflect the changes.
As a result,
text data bss dec hex filename
2957 425 92 3474 d92 arch/x86/kernel/hpet.o
3006 425 92 3523 dc3 arch/x86/kernel/hpet.o.old
a 1.6% reduction in text size.
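The equivalence of the two forms can be checked in user space. This is a minimal sketch, not the kernel code: div_sc_model() is a hypothetical stand-in for div_sc(), and the period and shift values used to exercise it are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for div_sc(): (ticks << shift) / nsec. */
static uint64_t div_sc_model(uint64_t ticks, uint64_t nsec, int shift)
{
	assert(nsec != 0);
	return (ticks << shift) / nsec;
}

/* Old scheme: derive the frequency in Hz first, then scale it. */
static uint64_t clockevent_mult_old(uint64_t period_fs, int shift)
{
	uint64_t hpet_freq = 1000000000000000ULL / period_fs;

	return div_sc_model(hpet_freq, 1000000000ULL, shift);
}

/* New scheme: a single division with the arguments "reversed". */
static uint64_t clockevent_mult_new(uint64_t period_fs, int shift)
{
	return div_sc_model(1000000ULL, period_fs, shift);
}
```

For an assumed period of 69841279 fs and a shift of 32, the two results agree up to the last few units. The old scheme truncates hpet_freq before scaling, so it can only come out equal to or slightly below the new one.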
Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Polish the ds.h interface and add support for PEBS.
ds.c is meant to be the resource allocator for per-thread and per-cpu
BTS and PEBS recording.
It is used by ptrace/utrace to provide execution tracing of debugged tasks.
It will be used by profilers (e.g. perfmon2).
It may be used by kernel debuggers to provide a kernel execution trace.
Changes in detail:
- guard DS and ptrace by CONFIG macros
- separate DS and BTS more clearly
- simplify field accesses
- add functions to manage PEBS buffers
- add simple protection/allocation mechanism
- added support for Atom
Opens:
- buffer overflow handling
Currently, only circular buffers are supported. This is all we need
for debugging. Profilers would want an overflow notification.
This is planned to be added when perfmon2 is made to use the ds.h
interface.
- utrace intermediate layer
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To allow linker to catch sections overlapping we have to declare
them in appropriate order.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The phys_cpu_present_map is an expected symbol in the SMP harness.
Unfortunately, x86 recently moved this and a few others to
kernel/setup.c where it doesn't quite work because voyager has to
define its own. Use CONFIG_X86_LOCAL_APIC to isolate these
definitions and fix up another area in setup.c where CONFIG_X86_SMP
should be used instead of CONFIG_SMP.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: toralf.foerster@gmx.de
Cc: Mike Travis <travis@sgi.com>
Cc: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rene Herman reported:
> commit 8779f2fc3b
>
> "x86: don't try to allocate from DMA zone at first"
>
> breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
> ISA soundcards are silent following that commit -- no error
> messages, everything appears fine, just silence.
That patch is buggy. We had an implicit assumption that
dev = NULL for ISA devices that require 24bit DMA.
The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
allocation, which is represented by "dev = NULL" and requires 24bit
DMA implicitly.
Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: rdc: leds build/config fix
x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
x86: restrict keyboard io ports reservation to make ipmi driver work
x86: fix fpu restore from sig return
x86: remove spew print out about bus to node mapping
x86: revert printk format warning change which is for linux-next
x86: cleanup PAT cpu validation
x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
x86: GEODE: cache results from geode_has_vsa2() and uninline
x86: revert geode config dependency
On some of our (single board computer) boards (x86) we are using an
IPMI controller that uses I/O ports 0x62 and 0x66 for a KCS (keyboard
controller style) IPMI system interface.
Trying to load the openipmi driver fails, because the ports
(0x62/0x66) are reserved for keyboard. keyboard reserves the full
range 0x60-0x6F while it doesn't need to.
Reserve only ports 0x60 and 0x64 for the legacy PS/2 i8042 keyboard
controller instead of 0x60-0x6F to allow the openipmi driver to work.
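The effect of narrowing the reservation can be sketched with a simple overlap check. This is an illustration only: struct port_range and both helpers are hypothetical and are not the kernel's resource API.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive I/O port range; a hypothetical stand-in for struct resource. */
struct port_range {
	unsigned int start;
	unsigned int end;
};

/* Two inclusive ranges overlap if neither ends before the other starts. */
static bool ranges_overlap(struct port_range a, struct port_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}

/* Returns true if 'port' collides with any of the n reserved ranges. */
static bool port_is_reserved(const struct port_range *reserved, int n,
			     unsigned int port)
{
	struct port_range p = { port, port };
	int i;

	for (i = 0; i < n; i++)
		if (ranges_overlap(reserved[i], p))
			return true;
	return false;
}
```

With a single reservation of 0x60-0x6F, ports 0x62 and 0x66 are reported as taken. With two one-port reservations at 0x60 and 0x64 they are free for the IPMI driver.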
[ tglx: added 64bit fixup ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the task never used fpu, initialize the fpu before restoring the FP
state from the signal handler context. This will allocate the fpu
state, if the task never needed it before.
Reported-and-bisected-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Frederik Deweerdt <deweerdt@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the scattered checks for PAT support to a single function. It is
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.
Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.
Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic when a secondary CPU has PAT disabled but
the primary one has already switched to PAT. We have no way to undo
that.
The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighting CPU errata and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix pcspkr dependencies: make the pcspkr platform
drivers depend on a platform device, and
not the other way around.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
CC: Vojtech Pavlik <vojtech@suse.cz>
CC: Michael Opdenacker <michael-lists@free-electrons.com>
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix x86 setup printk format warning:
next-20080430/arch/x86/kernel/setup.c:172: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ssize_t'
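A user-space analogue of this class of warning is shown below. It is a sketch that assumes the remedy is to use the conversion matching the argument type (%zd for ssize_t); format_size() is a hypothetical helper and not the code from setup.c.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Formats a ssize_t with the matching %zd conversion. Passing the same
 * argument to %lu is what produces a warning like the one quoted above.
 */
static int format_size(char *buf, size_t buflen, ssize_t size)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "size=%zd", size);
}
```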
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
http://bugzilla.kernel.org/show_bug.cgi?id=10547
Newer Dell OptiPlex 745s hang before rebooting after 'sudo reboot'.
A patch for some versions of the OptiPlex was proposed here --
http://lkml.org/lkml/2007/6/5/59 -- and is included in 2.6.23 and
later kernels, according to
http://lxr.linux.no/linux+v2.6.23/arch/i386/kernel/reboot.c . However,
the DMI_BOARD_NAME ("0WF810") is too restrictive. Newer OptiPlex
machines have a DMI_BOARD_NAME of "0RF703". I therefore suggest
adding another clause to reboot.c, similar to the one in the original
patch, but matching a DMI_BOARD_NAME of "0RF703".
On further inspection, it seems that there are other DMI_BOARD_NAMEs
for this same machine. They seem to change from time to time, which
means that the current code is fragile. Moreover, using bios reboot
should not break non-SFF OptiPlex 745s, and so a reasonable fix is to
simply drop the match on DMI_BOARD_NAME.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the needlessly global additional_cpus static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In kernel/acpi/realmode/Makefile use the 'always'
variable to say that wakeup.bin should always
be made.
In acpi/Makefile we then do not need to specify the
requested target and we avoid the message from make:
`arch/x86/kernel/acpi/realmode/wakeup.bin' is up to date.
Add wakeup.lds to the list of targets to avoid rebuilding
wakeup.bin - from Roland McGrath.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the pv_apic_ops are only present if CONFIG_X86_LOCAL_APIC is compiled
in, kvmclock failed to build without this option. This patch fixes this.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
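The override mechanism can be sketched with the GCC/Clang weak attribute, which is what the kernel's __weak macro expands to. model_sys_pipe() and its placeholder descriptor values are hypothetical; the real function creates an actual pipe.

```c
#include <assert.h>

/*
 * Generic default, marked weak. An architecture that needs a different
 * calling convention would provide a non-weak definition of the same
 * symbol in another object file, and the linker would pick that one.
 */
__attribute__((weak)) int model_sys_pipe(int fildes[2])
{
	assert(fildes != 0);
	/* Placeholder descriptors; a real implementation would create a pipe. */
	fildes[0] = 3;
	fildes[1] = 4;
	return 0;
}
```

When no other definition is linked in, callers get this default. No #ifdef is needed at the call site.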
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't warn in read_apic_id() when preemptible but only one CPU online.
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.
Replace .asciz with multiple .ascii directives and terminate with .asciz.
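The difference can be modelled in C by writing out the bytes the assembler would emit in each case. This is only an illustration with made-up strings; the two arrays and visible_length() are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Bytes emitted for
 *     .asciz "first line\n", "second line\n"
 * Each string gets its own terminating NUL.
 */
static const char asciz_bytes[] = "first line\n\0second line\n";

/*
 * Bytes emitted for
 *     .ascii "first line\n"
 *     .asciz "second line\n"
 * Only the last string is terminated.
 */
static const char ascii_then_asciz_bytes[] = "first line\nsecond line\n";

/* Length a C-string consumer (e.g. a "%s" format) would see. */
static size_t visible_length(const char *s)
{
	assert(s != NULL);
	return strlen(s);
}
```

A consumer of the first array stops at the embedded NUL and sees only the first line. The second array is seen in full.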
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
need not be explicitly initialized.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.
also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Al Viro pointed out that there's a missing readl() of timer->hpet_config,
found by Sparse.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
acpi: fix section mismatch warning in pnpacpi
intel_menlo: fix build warning
ACPI: Cleanup: Remove unneeded, multiple local dummy variables
ACPI: video - fix permissions on some proc entries
ACPI: video - properly handle errors when registering proc elements
ACPI: video - do not store invalid entries in attached_array list
ACPI: re-name acpi_pm_ops to acpi_suspend_ops
ACER_WMI/ASUS_LAPTOP: fix build bug
thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
ACPI: check a return value correctly in acpi_power_get_context()
#if 0 acpi/bay.c:eject_removable_drive()
eeepc-laptop: add hwmon fan control
eeepc-laptop: add backlight
eeepc-laptop: add base driver
ACPI: thinkpad-acpi: bump up version to 0.20
ACPI: thinkpad-acpi: fix selects in Kconfig
ACPI: thinkpad-acpi: use a private workqueue
ACPI: thinkpad-acpi: fluff really minor fix
ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
...
Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function. This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.
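The idea can be sketched in user space with C11 atomics standing in for set_bit(). All names and bit values below are hypothetical, and the model assumes that the status word is only ever written by the thread that owns it.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_TIF_RESTORE_SIGMASK 0x1u	/* hypothetical bit values */
#define MODEL_TIF_SIGPENDING	  0x2u
#define MODEL_TS_RESTORE_SIGMASK  0x8u

struct model_thread_info {
	atomic_uint flags;	/* may be modified by other CPUs */
	unsigned int status;	/* only touched by the owning thread */
};

/* Old style: atomic read-modify-write on the shared flags word. */
static void model_set_restore_sigmask_old(struct model_thread_info *ti)
{
	assert(ti != 0);
	atomic_fetch_or(&ti->flags, MODEL_TIF_RESTORE_SIGMASK);
}

/* New style: plain OR into the thread-synchronous status word. */
static void model_set_restore_sigmask(struct model_thread_info *ti)
{
	assert(ti != 0);
	ti->status |= MODEL_TS_RESTORE_SIGMASK;
	/* The pending flag has to be set anyway, so no atomic op is added. */
	atomic_fetch_or(&ti->flags, MODEL_TIF_SIGPENDING);
}
```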
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
x86: add pci=check_enable_amd_mmconf and dmi check
x86: work around io allocation overlap of HT links
acpi: get boot_cpu_id as early for k8_scan_nodes
x86_64: don't need set default res if only have one root bus
x86: double check the multi root bus with fam10h mmconf
x86: multi pci root bus with different io resource range, on 64-bit
x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
x86: get mp_bus_to_node early
x86 pci: remove checking type for mmconfig probe
x86: remove unneeded check in mmconf reject
driver core: try parent numa_node at first before using default
x86: seperate mmconf for fam10h out from setup_64.c
x86: if acpi=off, force setting the mmconf for fam10h
x86_64: check MSR to get MMCONFIG for AMD Family 10h
x86_64: check and enable MMCONFIG for AMD Family 10h
x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
x86: mmconf enable mcfg early
x86: clear pci_mmcfg_virt when mmcfg get rejected
x86: validate against acpi motherboard resources
Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] state info wrong after resume
[CPUFREQ] allow use of the powersave governor as the default one
[CPUFREQ] document the currently undocumented parts of the sysfs interface
[CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
Drop the macro definitions in asm-offsets_*.c and use kbuild.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove proc_root export. Creation and removal work fine if the parent PDE is
supplied as NULL -- it has always worked that way.
So, one useless export is removed and consistency is added: some drivers created
PDEs with &proc_root as parent but removed them with NULL, and so on.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for OLPC XO hardware. Open Firmware on XOs doesn't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also
adds functionality for running EC commands, and a CONFIG_OLPC.
A number of OLPC drivers depend upon CONFIG_OLPC.
olpc_ec_timeout is a hack to work around Embedded Controller bugs.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper extern for late_time_init in include/linux/init.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for __do_softirq() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function detect_vsmp_box is a void function in the PCI case.
Change the !PCI stub to void too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As written, this can never be true.
Spotted by the Sparse checker.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The 64-bit vDSO image is in a special ".vdso" section for no reason
I can determine. Furthermore, the location of the vdso_end symbol
includes some wrongly-calculated padding space in the image, which
is then (correctly) rounded to page size, resulting in an extra page
of zeros in the image mapped in to user processes.
This changes it to put the vdso.so image into normal initdata as we
have always done for the 32-bit vDSO images. The extra padding is
gone, so the user VMA is one page instead of two. The image that
was already copied around at boot time is now in initdata, so we
recover that wasted space after boot.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.
To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
We checked the hardware freq with OS cached freq value in get_cur_freq_on_cpu().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
This bug was introduced in the 2.6.24 i386/x86_64 tree merge, where
MSI-X vector allocation will eventually fail. The cause is the new
bit array tracking used vectors is not getting cleared properly on
IRQ destruction on the 32-bit APIC code.
This can be seen easily using the ixgbe 10 GbE driver on multi-core
systems by simply loading and unloading the driver a few times.
Depending on the number of available vectors on the host system, the
MSI-X allocation will eventually fail, and the driver will only be
able to use legacy interrupts.
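The failure mode can be reproduced with a toy allocator. This is a sketch under stated assumptions: the pool size, the bool array standing in for the bit array, and all model_* names are hypothetical and do not reflect the actual APIC code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_VECTORS 8	/* small, assumed pool size for illustration */

static bool model_used_vectors[MODEL_NR_VECTORS];

/* Returns a free vector number, or -1 if the pool is exhausted. */
static int model_assign_vector(void)
{
	int v;

	for (v = 0; v < MODEL_NR_VECTORS; v++) {
		if (!model_used_vectors[v]) {
			model_used_vectors[v] = true;
			return v;
		}
	}
	return -1;
}

/* IRQ teardown. With clear_bit == false this mimics the reported bug. */
static void model_destroy_irq(int vector, bool clear_bit)
{
	assert(vector >= 0 && vector < MODEL_NR_VECTORS);
	if (clear_bit)
		model_used_vectors[vector] = false;
}
```

Repeated allocate/destroy cycles with clear_bit == false exhaust the pool even though no IRQ is live, which corresponds to loading and unloading the driver a few times.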
I am generating the same patch for both stable trees for 2.6.24 and
2.6.25.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleans up a few MSR-using drivers in the following manner:
- Ensures MSRs are all defined in asm/geode.h, rather than in misc
places
- Makes the naming consistent; cs553[56] ones begin with MSR_,
GX-specific ones start with MSR_GX_, and LX-specific ones start
with MSR_LX_. Also, make the names match the data sheet.
- Use MSR names rather than numbers in source code
- Document the fact that the LX's MSR_PADSEL has the wrong value
in the data sheet. That's, uh, good to note.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
KVM: kill file->f_count abuse in kvm
KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
KVM: SVM: remove selective CR0 comment
KVM: SVM: remove now obsolete FIXME comment
KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
KVM: export kvm_lapic_set_tpr() to modules
KVM: SVM: sync TPR value to V_TPR field in the VMCB
KVM: ppc: PowerPC 440 KVM implementation
KVM: Add MAINTAINERS entry for PowerPC KVM
KVM: ppc: Add DCR access information to struct kvm_run
ppc: Export tlb_44x_hwater for KVM
KVM: Rename debugfs_dir to kvm_debugfs_dir
KVM: x86 emulator: fix lea to really get the effective address
KVM: x86 emulator: fix smsw and lmsw with a memory operand
KVM: x86 emulator: initialize src.val and dst.val for register operands
KVM: SVM: force a new asid when initializing the vmcb
KVM: fix kvm_vcpu_kick vs __vcpu_run race
KVM: add ioctls to save/store mpstate
KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
...
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.
Without it, we can have a random memory location being written
when the guest comes back.
It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)
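The role of the LSB can be sketched as follows. The helpers are hypothetical and only model how the MSR value is composed; they do not perform any MSR access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_KVMCLOCK_ENABLE 1ULL	/* bit 0: host keeps updating the page */

/* Value a guest writes to start receiving time updates at 'phys_addr'. */
static uint64_t model_system_time_enable(uint64_t phys_addr)
{
	assert((phys_addr & MODEL_KVMCLOCK_ENABLE) == 0);	/* bit 0 must be free */
	return phys_addr | MODEL_KVMCLOCK_ENABLE;
}

/* Value written before kexec: LSB clear, so the host stops writing. */
static uint64_t model_system_time_disable(void)
{
	return 0;
}

/* Whether the host would still write to guest memory for this value. */
static bool model_host_will_write(uint64_t msr_value)
{
	return (msr_value & MODEL_KVMCLOCK_ENABLE) != 0;
}
```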
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch allows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops
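The override pattern can be sketched like this. The ops table is cut down and every model_* name is hypothetical; the point is only that a replacement hook does its special work and then calls the normal routine.

```c
#include <assert.h>

/* Cut-down, hypothetical version of the ops table. */
struct model_machine_ops {
	void (*shutdown)(void);
	void (*crash_shutdown)(void);
};

static int model_native_shutdown_calls;
static int model_clock_disabled;

static void model_native_shutdown(void)
{
	model_native_shutdown_calls++;
}

static void model_native_crash_shutdown(void)
{
}

static struct model_machine_ops model_machine_ops = {
	.shutdown	= model_native_shutdown,
	.crash_shutdown	= model_native_crash_shutdown,
};

/* Override: do the special work first, then fall through to the default. */
static void model_kvm_shutdown(void)
{
	model_clock_disabled = 1;
	model_native_shutdown();
}

static void model_install_overrides(void)
{
	model_machine_ops.shutdown = model_kvm_shutdown;
	assert(model_machine_ops.shutdown != 0);
}
```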
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- guest/host split
- fix 32-bit truncation issues
- adjust to mmu_op
- adjust to ->release_*() renamed
- add ->release_pud()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the guest part of the kvm clock implementation.
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthwhile to me to keep
adjusting them.
We do use it, however, for fine-grained adjustment.
Other than that, time comes from the host.
[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
OK, so 25-mm1 gave a lockdep error which made me look into this.
The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561
The problem is that arch idle routines are somewhat inconsistent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.
So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:
idle routines are entered with IRQs disabled
idle routines will exit with IRQs enabled
Nearly all already did this in one form or another.
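The two rules amount to a small contract between the idle loop and each idle routine. A user-space sketch, with a flag standing in for the CPU interrupt state and hypothetical model_* names:

```c
#include <assert.h>
#include <stdbool.h>

static bool model_irqs_enabled = true;

static void model_local_irq_disable(void) { model_irqs_enabled = false; }
static void model_local_irq_enable(void)  { model_irqs_enabled = true; }

/* An idle routine that follows the two rules. */
static void model_idle(void)
{
	/* Rule 1: entered with IRQs disabled. */
	assert(!model_irqs_enabled);

	/* ... wait for work here (e.g. halt) ... */

	/* Rule 2: exit with IRQs enabled. */
	model_local_irq_enable();
}

/* Caller side of the contract, as the idle loop would do it. */
static void model_cpu_idle_once(void)
{
	model_local_irq_disable();
	model_idle();
	assert(model_irqs_enabled);
}
```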
Merge the 32 and 64 bit bits so they no longer have different bugs.
As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so will disable that feature by default, and only enable that via
pci=check_enable_amd_mmconf or for system match with dmi table.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some BIOSes only let the AMD fam 10h handle bus 0, and leave the other
buses to nvidia mcp55/ck804. In that case MCFG will cover all of them,
but with acpi=off we cannot use MCFG. This patch double checks
the busnbits, and if it covers fewer than 256 buses and acpi=off is set,
it forcibly resets the mmconf in the MSR, so we can still use mmconf in the above case.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So we can use MMCONF when MMCONF is not set by the BIOS.
Use the TOP_MEM2 MSR to get the top of memory, and scan the fam10h MMIO routing to
make sure the range does not conflict with any prefetchable MMIO that is above 4G.
(Currently only LinuxBIOS assigns 64-bit MMIO above 4G, for some co-processors.)
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Typical case: a four-socket system where every node has 4g of RAM, and we are using:
memmap=10g$4g
to mask out memory on node1 and node2.
When NUMA is enabled, early_node_mem is used to get node_data and node_bootmap.
If it cannot get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get a buffer from previous nodes.
So check for that and print out some info about it.
We need to move early_res_to_bootmem into every setup_node_bootmem,
and have it take the range that the node has. Otherwise alloc_bootmem could return an
address that was reserved early.
Depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Export linked list of struct setup_data via debugfs.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
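Walking such a list can be sketched as follows. The struct layout follows the boot protocol (next, type, len, data). The model_* helpers are hypothetical and treat next as a plain pointer value, whereas real code has to map the physical address first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as in the boot protocol; 'next' is a physical address there. */
struct setup_data {
	uint64_t next;
	uint32_t type;
	uint32_t len;
	uint8_t data[];
};

/* Model only: interpret 'next' as a pointer instead of a physical address. */
static const struct setup_data *model_next(const struct setup_data *node)
{
	assert(node != NULL);
	return (const struct setup_data *)(uintptr_t)node->next;
}

/* Counts nodes of the given type; the list ends at next == 0. */
static size_t model_count_type(const struct setup_data *head, uint32_t type)
{
	size_t n = 0;

	for (; head != NULL; head = model_next(head))
		if (head->type == type)
			n++;
	return n;
}
```

Because each node carries its own type and len, new kinds of boot data can be added without changing the fixed header layout.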
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes section mismatch warnings in unlock_ExtINT_logic().
WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x12cc9): Section mismatch in reference from the function unlock_ExtINT_logic()
unlock_ExtINT_logic() is only used by __init check_timer(). Annotate unlock_ExtINT_logic() with __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10799): Section mismatch in reference from the function uniq_ioapic_id()
uniq_ioapic_id() is only used by __init mp_register_ioapic(). Annotate uniq_ioapic_id() with __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes section mismatch warnings of __cpuinit
setup_trampoline() on 32-bit host.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h.
This patch removes the one in asm/rio.h.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the third argument of div_sc().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the second argument of clocksource_hz2mult()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The memset and NULL check after alloc_bootmem() are unnecessary,
because it returns zeroed memory and never returns NULL.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use bitmap library for pin_programmed rather than reinvent
bitmaps.
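A user-space sketch of the bitmap operations involved. The model_* helpers mimic the shape of DECLARE_BITMAP(), set_bit() and test_bit() but are hypothetical and non-atomic, and the pin count is an assumed value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_BITS_TO_LONGS(n)	(((n) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)
#define MODEL_MAX_PINS		128	/* assumed size, for illustration only */

static unsigned long model_pin_programmed[MODEL_BITS_TO_LONGS(MODEL_MAX_PINS)];

static void model_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / MODEL_BITS_PER_LONG] |= 1UL << (nr % MODEL_BITS_PER_LONG);
}

static bool model_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / MODEL_BITS_PER_LONG] >> (nr % MODEL_BITS_PER_LONG)) & 1UL;
}

/* Marks a pin as programmed; returns true if it had been programmed before. */
static bool model_mark_pin(unsigned int pin)
{
	bool was_set;

	assert(pin < MODEL_MAX_PINS);
	was_set = model_test_bit(pin, model_pin_programmed);
	model_set_bit(pin, model_pin_programmed);
	return was_set;
}
```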
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using MP_intsrc_info() in mpparse.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use BUILD_BUG_ON() instead of the compile-time error technique that
relies on an extern nonexistent function.
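For reference, the classic negative-array-size form of such a check can be sketched like this. MODEL_BUILD_BUG_ON and struct model_entry are hypothetical names, and the macro is a user-space approximation rather than the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/* A negative array size is a compile error when cond is true. */
#define MODEL_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

struct model_entry {	/* hypothetical structure with a layout requirement */
	uint32_t a;
	uint32_t b;
};

static unsigned int model_entry_size(void)
{
	/* Checked while compiling this function; costs nothing at run time. */
	MODEL_BUILD_BUG_ON(sizeof(struct model_entry) != 8);

	/* Runtime check of the same condition, shown only for contrast. */
	assert(sizeof(struct model_entry) == 8);
	return (unsigned int)sizeof(struct model_entry);
}
```

A call to an undefined extern function typically shows up only at link time, and only if the compiler did not optimize the call away. The array-size form reports the problem at the offending line.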
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace. The generic
compat_ptrace_request code handles these.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the set_fs(USER_DS) call for signal handler setup out of the
three places copying the same code into the one place that calls them
all. There is no change in what it does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the code diddling the TF and DF bits for signal handler setup
out of the several places copying the same code into the one place that
calls them all. There is no change in what it does.
I also separated the recently-added DF bit clearing from the TF diddling.
The compiler turns them back into one instruction anyway. The tossing
in of DF to the same line of code with no new comments was a bit more
arcane than seems wise.
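The separation described above might look roughly like this userspace
sketch. The helper name signal_entry_flags() is hypothetical (the kernel
works on the saved register flags directly); the two constants are the
architectural bit positions of TF and DF in EFLAGS.

```c
/* Architectural EFLAGS bits: TF is bit 8, DF is bit 10. */
#define X86_EFLAGS_TF 0x00000100UL
#define X86_EFLAGS_DF 0x00000400UL

/* Hypothetical helper returning the flags a signal handler starts with. */
static unsigned long signal_entry_flags(unsigned long flags)
{
	/* Handler code expects the direction flag to be clear on entry. */
	flags &= ~X86_EFLAGS_DF;

	/* Kept as a separate statement: do not single-step into the handler. */
	flags &= ~X86_EFLAGS_TF;

	return flags;
}
```

Two statements cost nothing at run time, since the compiler can fold both
masks into a single AND, and each one can carry its own comment.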
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is claimed that NexGen CPUs were never shipped:
http://lkml.org/lkml/2008/4/20/179
Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.
As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.
The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In arch/x86/kernel/setup_64.c, the standard_io_resources array
is needlessly defined as global. This patch makes this variable
static.
This patch was successfully build-tested using the defconfig
for x86_64. Runtime test was performed by booting a 64-bit x86
box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are no users for the function amd_init_cpu() defined in
arch/x86/kernel/cpu/amd.c. This patch removes this routine.
This patch was build-tested using defconfigs for i386 and x86_64,
and a few randconfig instances. Runtime tests were performed by
booting 32- and 64-bit x86 boxen up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At least on my Barcelona, I see MCE log entries after cold boot caused
by BIOS not properly clearing the respective registers. Therefore, this
patch extends the workaround to families 0x10 and 0x11 (the latter just
for completeness, I have nothing to verify this against).
At the same time, provide a way to make these entries visible via the
'mce=bootlog' command line option even on these machines.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
.. since it uses ILL_BADSTK (which is meaningless in the context of
SIGSEGV).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There apparently was an unnoticed conflict between an earlier patch to
this file and mine (d1e084746b), which
I noticed only now. I suppose a change like the one below (untested) is
needed; I didn't get any response on a confirmation request for this from
the submitter of the first patch.
The issue is the writing of the 'checkbit' member at the end of
setup_intel_arch_watchdog(), which my patch made go to intel_arch_wd_ops
rather than wd_ops.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two prior changes resulted in the "ecx" clobber being lost.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
xen: add balloon driver
xen: allow compilation with non-flat memory
xen: fold xen_sysexit into xen_iret
xen: allow set_pte_at on init_mm to be lockless
xen: disable preemption during tlb flush
xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
xen: Add compatibility aliases for frontend drivers
xen: Module autoprobing support for frontend drivers
xen blkfront: Delay wait for block devices until after the disk is added
xen/blkfront: use bdget_disk
xen: Make xen-blkfront write its protocol ABI to xenstore
xen: import arch generic part of xencomm
xen: make grant table arch portable
xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
xen: make include/xen/page.h portable moving those definitions under asm dir
xen: add resend_irq_on_evtchn() definition into events.c
Xen: make events.c portable for ia64/xen support
xen: move events.c to drivers/xen for IA64/Xen support
xen: move features.c from arch/x86/xen/features.c to drivers/xen
xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
...
Add some autogenerated files to various .gitignore files
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up the codepath, remove alignment restrictions and do sanity
checking of the end result, to make sure we patched the right site.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel_text_address returns true even for modules which is not wanted
in text_poke. Use core_kernel_text instead.
This is a regression introduced in e587cadd8f
which caused occasional crashes after suspend/resume.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xen_sysexit and xen_iret were doing essentially the same thing. Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret. This removes
a lot of code and complexity - specifically, another critical region.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use jmp rather than call for the iret fixup, so it's consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
64-bit Xen supports sysenter for 32-bit guests, so support its
use. (sysenter is faster than int $0x80 in 32-on-64.)
sysexit is still not supported, so we fake it up using iret.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.
There are a couple of follow-on changes from this:
- KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS. The
definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
since it can have two different user address-space configurations.
I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
for all of 32/32, 32/64 and 64/64 process configurations.
- USER_PTRS_PER_PGD was also defined and was being used for similar
purposes. Converting its users to KERNEL_PGD_BOUNDARY left it
completely unused, and so I removed it.
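A worked example of how the two macros relate, assuming that
KERNEL_PGD_BOUNDARY is derived from pgd_index(PAGE_OFFSET) (the message does
not spell the definition out). The constants below are illustrative values
for 32-bit non-PAE with a 3G/1G split, and the two accessor functions are
hypothetical.

```c
/* Illustrative values: 32-bit non-PAE paging, 3G/1G user/kernel split. */
#define PAGE_OFFSET   0xC0000000UL
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024

#define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

/* Assumed definition: first pgd slot that maps kernel addresses. */
#define KERNEL_PGD_BOUNDARY pgd_index(PAGE_OFFSET)
/* Number of pgd slots from the boundary to the end of the table. */
#define KERNEL_PGD_PTRS (PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)

/* Hypothetical accessors so the values can be inspected at run time. */
static unsigned long kernel_pgd_boundary(void)
{
	return KERNEL_PGD_BOUNDARY;
}

static unsigned long kernel_pgd_ptrs(void)
{
	return KERNEL_PGD_PTRS;
}
```

With these values the boundary is slot 768, leaving 256 kernel slots.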
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.
[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
This isn't needed, we can just walk the devices in bus order with no
problems at all, as we really want to remove pci_get_device_reverse from
the kernel tree.
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After 2.6.24 there was a plan to make the PM core acquire all device
semaphores during a suspend/hibernation to protect itself from
concurrent operations involving device objects. That proved to be
too heavy-handed and we found a better way to achieve the goal, but
before it happened, we had introduced the functions
device_pm_schedule_removal() and destroy_suspended_device() to allow
drivers to "safely" destroy a suspended device and we had adapted some
drivers to use them. Now that these functions are no longer necessary,
it seems reasonable to remove them and modify their users to use the
normal device unregistration instead.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add /sysfs/firmware/ibft/[initiator|targetX|ethernetX] directories along with
text properties which export the iSCSI Boot Firmware Table (iBFT)
structure.
What is iSCSI Boot Firmware Table? It is a mechanism for the iSCSI tools to
extract from the machine NICs the iSCSI connection information so that they
can automagically mount the iSCSI share/target. Currently the iSCSI
information is hard-coded in the initrd. The /sysfs entries are read-only
one-name-and-value fields.
The usual set of data exposed is:
# for a in `find /sys/firmware/ibft/ -type f -print`; do echo -n "$a: "; cat $a; done
/sys/firmware/ibft/target0/target-name: iqn.2007.com.intel-sbx44:storage-10gb
/sys/firmware/ibft/target0/nic-assoc: 0
/sys/firmware/ibft/target0/chap-type: 0
/sys/firmware/ibft/target0/lun: 00000000
/sys/firmware/ibft/target0/port: 3260
/sys/firmware/ibft/target0/ip-addr: 192.168.79.116
/sys/firmware/ibft/target0/flags: 3
/sys/firmware/ibft/target0/index: 0
/sys/firmware/ibft/ethernet0/mac: 00:11:25:9d:8b:01
/sys/firmware/ibft/ethernet0/vlan: 0
/sys/firmware/ibft/ethernet0/gateway: 192.168.79.254
/sys/firmware/ibft/ethernet0/origin: 0
/sys/firmware/ibft/ethernet0/subnet-mask: 255.255.252.0
/sys/firmware/ibft/ethernet0/ip-addr: 192.168.77.41
/sys/firmware/ibft/ethernet0/flags: 7
/sys/firmware/ibft/ethernet0/index: 0
/sys/firmware/ibft/initiator/initiator-name: iqn.2007-07.com:konrad.initiator
/sys/firmware/ibft/initiator/flags: 3
/sys/firmware/ibft/initiator/index: 0
For full details of the IBFT structure please take a look at:
ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf
[akpm@linux-foundation.org: fix build]
Signed-off-by: Konrad Rzeszutek <konradr@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Here is a simple patch to use an allocated array of cpumasks to
represent cpumask_of_cpu() instead of constructing one on the stack.
It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
currently only set for x86_64 SMP. Otherwise the existing
cpumask_of_cpu() is used but has been changed to produce an lvalue
so a pointer to it can be used.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Modify sched_affinity functions to pass cpumask_t variables by reference
instead of by value.
* Use new set_cpus_allowed_ptr function.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Paul Jackson <pj@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Use new set_cpus_allowed_ptr() function added by previous patch,
which, instead of passing the "newly allowed cpus" cpumask_t arg
by value, passes it by pointer:
-int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
+int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
* Cleanup uses of CPU_MASK_ALL.
* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Use pointers to cpumask_t arguments whenever possible.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
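To illustrate why the pointer variant matters for large NR_CPUS, here is a
userspace sketch with stand-in types. Every name prefixed with example_ is
hypothetical, and the 4096-CPU figure is only an assumption chosen to make
the mask 512 bytes.

```c
#include <stddef.h>

/* Hypothetical large CPU count: 4096 bits make a 512-byte mask. */
#define EXAMPLE_NR_CPUS 4096
#define EXAMPLE_BITS_PER_LONG (sizeof(unsigned long) * 8)

typedef struct {
	unsigned long bits[EXAMPLE_NR_CPUS / EXAMPLE_BITS_PER_LONG];
} example_cpumask_t;

struct example_task {
	example_cpumask_t cpus_allowed;
};

/* Old style: the whole 512-byte mask is copied for the call. */
static int example_set_cpus_allowed(struct example_task *p,
				    example_cpumask_t new_mask)
{
	p->cpus_allowed = new_mask;
	return 0;
}

/* New style: only a pointer crosses the call boundary. */
static int example_set_cpus_allowed_ptr(struct example_task *p,
					const example_cpumask_t *new_mask)
{
	p->cpus_allowed = *new_mask;
	return 0;
}

/* Usage: both variants end up storing the same mask in the task. */
static unsigned long example_demo(void)
{
	static example_cpumask_t mask;	/* static: keeps it off the stack */
	static struct example_task by_value, by_pointer;

	mask.bits[0] = 0x5UL;
	example_set_cpus_allowed(&by_value, mask);
	example_set_cpus_allowed_ptr(&by_pointer, &mask);

	return by_value.cpus_allowed.bits[0] & by_pointer.cpus_allowed.bits[0];
}
```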
Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
and MAXNODES counts.
* In some cases, the cpumask variable was initialized but then overwritten
with another value. This is the case for changes like this:
- cpumask_t oldmask = CPU_MASK_ALL;
+ cpumask_t oldmask;
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
_cpuid4_info *cpuid4_info[NR_CPUS];
_index_kobject *index_kobject[NR_CPUS];
kobject * cache_kobject[NR_CPUS];
* Replace the local NR_CPUS array with a kmalloc'd region in
show_shared_cpu_map().
Also some minor complaints from checkpatch.pl fixed.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes smpboot.c so that it can start slave cpus running
in UV non-unique apicid mode. The SIPI must be sent using a UV-specific
mechanism.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The code in pci-dma_{32,64}.c is now sufficiently
similar, so we merge both files into pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
if the device hasn't provided a mask, abort allocation.
Note that we're using a fallback device now, so it does not cover
the case of a NULL device: just drivers passing NULL masks around.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Just return our allocation if we don't have an mmu. For i386, where this patch
is being applied, we never have one. So the goal is just to make the code look like
x86_64's.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The claim is that i386 does it, but it does not.
So remove it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the same gfp masks for x86_64 and i386.
This involves using HIGHMEM or DMA32 where necessary for the sake
of code compatibility (no real effect), and using the NORETRY
mask for i386.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds code to retry the allocation in case it fails. On its
own, it does not make much sense beyond making the code look like x86_64.
But later patches in this series will make us try to allocate from
zones other than DMA first, which may fail.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If we fail, we'll loop into the allocation again,
and then allocate in the DMA zone.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can use a fallback dev for cases of a NULL device being passed (mostly ISA).
This comes from the x86_64 implementation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can do it here too, in the same way x86_64 does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
virt_to_bus() is deprecated according to the docs, and moreover,
won't return the right thing in i386 if we're dealing with high memory mappings.
So we make our allocation function return a page, and then use page_address() (for
virtual addr) and page_to_phys() (for physical addr) instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We call unmap_single, if available.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It goes to pci-dma.c, and is removed from the arch-specific files.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 implements the declare coherent memory API and x86_64 does not;
this is reflected in pieces of dma_alloc_coherent and dma_free_coherent.
Those pieces are isolated in separate functions that are declared
as empty macros on x86_64. This way we can make the code the same.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They are placed in an ifdef, since they are i386-specific.
The structure definition goes to dma-mapping.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We merge the iommu initialization parameters in pci-dma.c.
The nice thing is that both architectures at least recognize the same
parameters.
The usedac i386 parameter is marked for deprecation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The code for both arches is very similar, so this patch merges them.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch moves the bootmem functions, which are largely
x86_64-specific, into pci-dma.c. The code goes inside an ifdef.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The initcalls that trigger the various possibilities for
the dma subsys are moved to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge pci-base_32.c and pci-nommu_64.c into pci-nommu.c.
Their code was made the same, so now they can be merged.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move dma_ops structure definition to pci-dma.c, where it
belongs.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is done to get the code closer to x86_64.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In the very same way i386 does, we use WARN_ON functions
in map_simple and map_sg.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To make the code usable in i386, where we have high memory mappings,
we drop the virt_to_bus(sg_virt()) construction in favour of sg_phys.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds flush_write_buffers() in some functions of pci-nommu_64.c.
They are added anywhere i386 would also have it. This is not a problem
for x86_64, since flush_write_buffers() is a nop for it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch implements mapping_error for pci-nommu_64.c.
It takes care to keep the same compatible behaviour it already
had. Although this file is not (yet) used for i386, we introduce
the i386 version here. Again, care is taken, even at the expense of
an ifdef, to keep the same behaviour unconditionally.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These functions are now called conditionally on their
existence in the struct. So just delete them, instead
of keeping an empty implementation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch introduces pci-dma.c, a common file for pci dma
between i386 and x86_64. As a start, dma_set_mask() is the same
between architectures, and is placed there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ERROR: "dma_supported" [drivers/ssb/ssb.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic7xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic79xx.ko] undefined!
ERROR: "dma_supported" [drivers/net/pcnet32.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/saa7134/saa7134.ko] undefined!
ERROR: "dma_set_mask" [drivers/media/video/meye.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8802.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8800.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx88-alsa.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx23885/cx23885.ko] undefined!
They just need to be exported like on x86_64.
dma_supported() and dma_set_mask() were previously inlined,
but are now moved to pci-dma_32.c.
Since they're used by various drivers, they need to be
exported.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We provide a map_error function in pci-base_32.c to make
sure i386 keeps the same behaviour it used to have.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's initially 0, since we don't expect any DMA there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is the way x86_64 does it, so this makes them equal. They have
to be extern now in the header, and the extern definition is moved to
the common dma-mapping.h header.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the old i386 implementation is moved to pci-base_32.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 base does not need it, so it gets an empty function.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That's already the name of the game for x86_64. For i386,
we add a pci-base_32.c, that will hold the default operations.
The function call itself goes through dma-mapping.h, the common
header.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A system with 256 GB of RAM crashes in the
following way when NUMA is disabled:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33
Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230
The root cause is that the memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost 4G, and vmemmap_alloc_block will use up the RAM under 4G.
Possible solutions:
1. Make the memmap allocation get memory above 4G.
2. Reserve some dma32 range early, before we try to set up memmap for all,
and release it before pci_iommu_alloc, so that gart or swiotlb can be sure
to get some range under the 4G limit.
This patch uses method 2,
because method 1 may need more code to handle SPARSEMEM and SPARSEMEM_VMEMMAP.
With the patch applied, we get:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Only allocate the FPU area when the application actually uses FPU, i.e., in the
first lazy FPU trap. This could save memory for apps that do not use the FPU.
For example, on my system after boot, there are around 300 processes, with
only 17 using FPU.
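The allocate-on-first-use pattern can be sketched in userspace as follows.
All names here (struct lazy_task, lazy_fpu_first_use(), the 512-byte size)
are hypothetical stand-ins, and calloc() replaces whatever allocator the
kernel actually uses.

```c
#include <stdlib.h>

/* Hypothetical save-area size; the real size depends on the CPU. */
#define EXAMPLE_XSTATE_SIZE 512

struct lazy_task {
	void *fpu_state;	/* NULL until the task first touches the FPU */
};

/*
 * Called from the (simulated) first-use trap: allocate the save area
 * on demand. Returns 0 on success, -1 if the allocation fails.
 */
static int lazy_fpu_first_use(struct lazy_task *tsk)
{
	if (tsk->fpu_state)
		return 0;	/* already allocated by an earlier trap */

	tsk->fpu_state = calloc(1, EXAMPLE_XSTATE_SIZE);
	return tsk->fpu_state ? 0 : -1;
}

/* Usage: a second trap must reuse the area allocated by the first. */
static int lazy_fpu_demo(void)
{
	struct lazy_task t = { NULL };
	void *first;
	int ok;

	if (lazy_fpu_first_use(&t) != 0)
		return 0;
	first = t.fpu_state;
	ok = lazy_fpu_first_use(&t) == 0 && t.fpu_state == first;
	free(t.fpu_state);
	return ok;
}
```

A task that never traps keeps fpu_state at NULL and pays nothing for the
save area.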
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two optimizations:
1) only allocate when the application actually uses FPU, so in the first
lazy FPU trap. This could save memory for non-fpu using apps. Next patch
does this lazy allocation.
2) allocate the right size for the actual cpu rather than 512 bytes always.
Patches enabling xsave/xrstor support (coming shortly) will take advantage
of this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This function doesn't just 'find' the max_pfn - it also has
other side-effects such as registering sparse memory maps.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code forever. This can cause
time jumps in the range of hours.
This was reported in:
http://lkml.org/lkml/2007/8/23/96
and
http://lkml.org/lkml/2008/3/31/23
I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have synchronized
TSCs. The TSCs seem not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremely small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.
CPU 0 updates the clock source variables under xtime/vsyscall lock and
CPU1, where the TSC is slightly behind CPU0, is reading the time right
after the seqlock was unlocked.
The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.
The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.
To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.
I pondered marking the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.
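The arithmetic behind the time jump, and the guard described above, can be
reproduced with a small userspace sketch. The function names are
hypothetical and the cycle values are made up; only the unsigned
subtraction and the "return the reference when it is larger" rule come
from the description.

```c
#include <stdint.h>

typedef uint64_t cycle_t;

/*
 * Generic clocksource math: unsigned subtraction, masked so that
 * wrapping counters (such as the pm timer) still work.
 */
static cycle_t delta_cycles(cycle_t now, cycle_t cycle_last, cycle_t mask)
{
	return (now - cycle_last) & mask;
}

/*
 * TSC-specific guard: never report a value that lies behind the
 * reference value recorded by the last update.
 */
static cycle_t guarded_tsc(cycle_t raw_tsc, cycle_t cycle_last)
{
	return raw_tsc >= cycle_last ? raw_tsc : cycle_last;
}
```

With a reference of 1000 and a readout of 990 on the lagging CPU, the
unguarded delta is 2^64 - 10 cycles, while the guarded readout yields a
delta of 0.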
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements the PR_GET_TSC and PR_SET_TSC prctl()
commands on the x86 platform (both 32 and 64 bit.) These
commands control the ability to read the timestamp counter
from userspace (the RDTSC instruction.)
While the RDTSC instruction is a useful profiling tool,
it is also the source of some non-determinism in ring-3.
For deterministic replay applications it is useful to be
able to trap and emulate (and record the outcome of) this
instruction.
This patch uses code earlier used to disable the timestamp
counter for the SECCOMP framework. A side-effect of this
patch is that the SECCOMP environment will now also disable
the timestamp counter on x86_64 due to the addition of the
TIF_NOTSC define on this platform.
The code which enables/disables the RDTSC instruction during
context switches is in the __switch_to_xtra function, which
already handles other unusual conditions, so normal
performance should not have to suffer from this change.
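A short user-space sketch of how the two commands might be called. The
wrapper names get_tsc_mode and trap_rdtsc are made up for illustration;
PR_TSC_ENABLE and PR_TSC_SIGSEGV are the mode values from the prctl
header. On kernels or architectures without this support the calls are
expected to fail with -1.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Query the RDTSC policy of the calling thread.
 * Returns 0 and stores PR_TSC_ENABLE or PR_TSC_SIGSEGV in *mode,
 * or returns -1 with errno set where the command is unsupported.
 */
static inline int get_tsc_mode(int *mode)
{
	return prctl(PR_GET_TSC, mode, 0, 0, 0);
}

/*
 * Ask the kernel to deliver SIGSEGV on RDTSC, so that a signal handler
 * (for example in a deterministic replay tool) can trap and emulate it.
 */
static inline int trap_rdtsc(void)
{
	return prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0);
}
```

A replay tool would typically call trap_rdtsc() once at startup and
install a SIGSEGV handler that records or replays the counter value.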
Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This annotates NMI functions with notrace. Some tracers may be able
to live with this, but some cannot. The safest is to turn it off,
it's not particularly interesting anyway.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- noexec32 is on by default for years already
- add noexec32 to kernel-parameters and fix noexec typo in there
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix section mismatch warnings which occur on my x86_64 box while compiling
linux-next-20080410:
Warning messages:
WARNING: arch/x86/kernel/built-in.o(.text+0x7bc2): Section mismatch in reference from the function bad_addr() to the
variable .init.data:early_res
The function bad_addr() references
the variable __initdata early_res.
This is often because bad_addr lacks a __initdata
annotation or the annotation of early_res is wrong.
WARNING: arch/x86/kernel/built-in.o(.text+0x7c3b): Section mismatch in reference from the function bad_addr_size() to
the variable .init.data:early_res
The function bad_addr_size() references
the variable __initdata early_res.
This is often because bad_addr_size lacks a __initdata
annotation or the annotation of early_res is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I've made a small investigation about vm86.h inclusion rules and it
looks like everything is more or less ok.
Files that rely on asm/vm86.h symbols are:
- kprobes.c
- process_32.c
- signal_32.c
- traps_32.c
- vm86_32.c
File process_32.c includes vm86.h explicitly. We can remove that
include and it won't break anything.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove old comments that include the old arch/i386 directory.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The ramdisk is reserved via reserve_early() in x86_64_start_kernel();
later, early_res_to_bootmem() converts that into a bootmem reservation,
so there is no need to reserve it again.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make x86 EFI code work when EFI_PAGE_SHIFT != PAGE_SHIFT. The
memrange_efi_to_native() provided in this patch can be used on other
EFI platforms such as IA64 too.
This patch has been tested on Intel x86_64 platform with EFI 64/32
firmware.
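A sketch of the kind of range conversion this needs, assuming 4 KiB EFI
pages and a native page size given as a shift. The helper name
efi_range_to_native and the native_shift parameter are hypothetical;
this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12	/* EFI pages are 4 KiB */

/*
 * Convert the range [*addr, *addr + *npages EFI pages) into the
 * smallest enclosing range of native pages (1 << native_shift bytes).
 * On return *addr is native-page aligned and *npages counts native pages.
 */
static inline void efi_range_to_native(uint64_t *addr, uint64_t *npages,
				       unsigned int native_shift)
{
	uint64_t native_size = (uint64_t)1 << native_shift;
	uint64_t end = *addr + (*npages << EFI_PAGE_SHIFT);
	uint64_t first = *addr >> native_shift;			/* round down */
	uint64_t last = (end + native_size - 1) >> native_shift;	/* round up */

	*npages = last - first;
	*addr = first << native_shift;
}
```

When the two shifts are equal the range is returned unchanged; with
larger native pages the start is rounded down and the end rounded up.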
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
to fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
TF_MASK is no longer defined, use X86_EFLAGS_TF.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kgdb core fixes:
- Check to see that mm->mmap_cache is not null before calling
flush_cache_range(), else on arch=ARM it will cause a fatal
fault.
- Breakpoints should only be restored if they are in the BP_ACTIVE
state.
- Fix a typo in comments to "kgdb_register_io_module"
x86 kgdb fixes:
- Fix the x86 arch handler such that on a kill or detach that the
appropriate cleanup on the single stepping flags gets run.
- Add in the DIE_NMIWATCHDOG call for x86_64
- Touch the nmi watchdog before returning the system to normal
operation after performing any kind of kgdb operation, else
the possibility exists to trigger the watchdog.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add HW breakpoints into the arch specific portion of x86 kgdb. In the
current x86 kernel.org kernels HW breakpoints are changed out in lazy
fashion because there is no infrastructure around changing them when
changing to a kernel task or entering the kernel mode via a system
call. This lazy approach means that if a user process uses HW
breakpoints the kgdb will lose out. This is an acceptable trade off
because the developer debugging the kernel is assumed to know what is
going on system wide and would be aware of this trade off.
There is a minor bug fix to the kgdb core so as to correctly call the
hw breakpoint functions with a valid value from the enum.
There is also a minor change to the x86_64 startup code when using
early HW breakpoints. When the debugger is connected, the cpu startup
code must not zero out the HW breakpoint registers or you cannot hit
the breakpoints you are interested in, in the first place.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes the hang regression with kgdb when the NMI interrupt
comes in while the master core is returning from an exception.
Adjust the NMI logic such that KGDB will not stop NMI exceptions from
occurring by in general returning NOTIFY_DONE. It is not possible to
distinguish the debug NMI sync vs the normal NMI apic interrupt so
kgdb needs to catch the unknown NMI if the debugger was previously
active on one of the cpus.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
based on patch from:
Subject: kgdb: core-lite
From: Jason Wessel <jason.wessel@windriver.com>
[ and countless other authors - see the patch for details. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Move wakeup code to .c, so that video mode setting code can be shared
between boot and wakeup. Remove nasty assembly code in 64-bit case by
re-using trampoline code. Stack setup was fixed to clear high 16bits
of %esp, maybe that fixes some machines.
.c code sharing and morse code was done by H. Peter Anvin. Sam Ravnborg
reviewed kbuild related stuff, and it seems okay to him. Rafael did
some cleanups.
[rjw:
* Made the patch stop breaking compilation on x86-32
* Added arch/x86/kernel/acpi/sleep.h
* Got rid of compiler warnings in arch/x86/kernel/acpi/sleep.c
* Fixed 32-bit compilation on x86-64 systems
* Added include/asm-x86/trampoline.h and fixed the non-SMP
compilation on 64-bit x86
* Removed arch/x86/kernel/acpi/sleep_32.c which was not used
* Fixed some breakage caused by the integration of smpboot.c done
under us in the meantime]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch fixes section mismatch warnings (on x86_64 host) in setup_trampoline(),
which was referencing __initdata variables trampoline_data and trampoline_end.
Warning messages:
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2b6a): Section mismatch in reference from the function setup_trampoline()
to the variable .init.data:trampoline_data
The function __cpuinit setup_trampoline() references
a variable __initdata trampoline_data.
If trampoline_data is only used by setup_trampoline then
annotate trampoline_data with a matching annotation.
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2b71): Section mismatch in reference from the function setup_trampoline()
to the variable .init.data:trampoline_end
The function __cpuinit setup_trampoline() references
a variable __initdata trampoline_end.
If trampoline_end is only used by setup_trampoline then
annotate trampoline_end with a matching annotation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes mismatch warnings in smp_checks() (in arch/x86/kernel/smpboot.c):
WARNING: arch/x86/kernel/built-in.o(.text+0x11922): Section mismatch in reference from the function smp_checks()
to the variable .cpuinit.data:smp_b_stepping
The function smp_checks() references
the variable __cpuinitdata smp_b_stepping.
This is often because smp_checks lacks a __cpuinitdata
annotation or the annotation of smp_b_stepping is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Make sure that we clear the "shutdown status flag" in the CMOS
> > register after each CPU is brought up. This fixes a problem where the
> > "shutdown status flag" may remain set when a CPU is brought up after
> > booting.
>
> btw., what problem does this result in, exactly?
The shutdown status flag set to "0xA", corresponds to "JMP double word
request without INT init".
This JMP at reboot time goes to an unintended location and results in
triple faults in our case.
Though this error at reboot can be safely ignored in a VM environment,
I am not sure what the effect would be on a physical system. Maybe it
will result in a triple fault and an eventual hardware reset, thus
masking this BUG in the kernel.
This fix just makes sure that we reset that status flag after
initialization is done.
Fix paranoia about using BIOS quickboot mechanism.
Make sure that we clear the "shutdown status flag" in the CMOS register
after each CPU is brought up. This fixes a problem where the "shutdown
status flag" may remain set when a CPU is brought up after booting.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Arai <arai@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use cpumask_of_cpu() rather than the pair of cpus_clear() and cpu_set().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No need to clear the memory allocated by alloc_bootmem().
It is already filled with zero.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using ioapic_read_entry() and ioapic_write_entry()
in io_apic_{32,64}.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If one can find an ack pending pin, there is no need to check
the rest of them.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We should call kfree only if we really need it.
Though it's safe to call kfree with a NULL pointer,
in this code we've already tested the pointer and can
eliminate the call.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu pointed out a bug in the previous patches,
fix double-shift of apicid.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up references to the early cpu maps for the non-SMP configuration
and remove some functions called for SMP configurations only.
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
UV supports really big systems. So big, in fact, that the APICID register
does not contain enough bits to contain an APICID that is unique across all
cpus.
The UV BIOS supports 3 APICID modes:
- legacy mode. This mode uses the old APIC mode where
APICID is in bits [31:24] of the APICID register.
- x2apic mode. This mode is whitebox-compatible. APICIDs
are unique across all cpus. Standard x2apic APIC operations
(Intel-defined) can be used for IPIs. The node identifier
fits within the Intel-defined portion of the APICID register.
- x2apic-uv mode. In this mode, the APICIDs on each node have
unique IDs, but IDs on different nodes are not unique. For example,
if each node has 32 cpus, the APICIDs on each node might be
0 - 31. Every node has the same set of IDs.
The UV hub is used to route IPIs/interrupts to the correct node.
Traditional APIC operations WILL NOT WORK.
In x2apic-uv mode, the ACPI tables all contain a full unique ID (note:
exact bit layout still changing but the following is close):
nnnnnnnnnnlc0cch
n = unique node number
l = socket number on board
c = core
h = hyperthread
Only the "lc0cch" bits are written to the APICID register. The remaining bits are
supplied by having the get_apic_id() function "OR" the extra bits into the value
read from the APICID register. (Hmmm.. why not keep the ENTIRE APICID register
in per-cpu data....)
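A sketch of the "OR in the extra bits" step, assuming the 16-bit layout
shown above with the low six bits ("lc0cch") held in the APICID
register and the node number directly above them. Since the exact bit
layout is still changing, the constants and the names
uv_node_extra_bits and full_apic_id are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define UV_LOCAL_BITS	6	/* "lc0cch": the bits held in the register */
#define UV_LOCAL_MASK	((1u << UV_LOCAL_BITS) - 1)

/* Node part of the ID, computed once (for example into per-cpu data). */
static inline uint32_t uv_node_extra_bits(uint32_t node)
{
	return node << UV_LOCAL_BITS;
}

/* Combine the value read from the APICID register with the node bits. */
static inline uint32_t full_apic_id(uint32_t apicid_reg, uint32_t extra_bits)
{
	return (apicid_reg & UV_LOCAL_MASK) | extra_bits;
}
```

Two cpus on different nodes that read the same local value from the
register end up with different full IDs because their node bits differ.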
The x2apic-uv mode is recognized by the MADT table containing:
oem_id = "SGI"
oem_table_id = "UV-X"
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add kernel support for new ACPI "sapic" tables that contain 16-bit APICIDs.
This patch simply adds parsing of an optional SAPIC table if present.
Otherwise, the traditional local APIC table is used.
Note: the SAPIC table is not a new ACPI table - it exists on other architectures
but is not currently recognized by x86_64.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Increase the number of bits in an apicid from 8 to 32.
By default, MP_processor_info() gets the APICID from the
mpc_config_processor structure. However, this structure limits
the size of APICID to 8 bits. This patch allows the caller of
MP_processor_info() to optionally pass a larger APICID that will
be used instead of the one in the mpc_config_processor struct.
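One way such an optional override could look, as a stand-alone sketch.
The struct mp_cpu_entry is a simplified stand-in for
mpc_config_processor, and passing the wide APICID through a pointer
that may be NULL is an assumption made for illustration; the patch may
pass it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for mpc_config_processor: only 8 bits of APICID. */
struct mp_cpu_entry {
	uint8_t apicid;
};

/*
 * Pick the APICID to register for a processor: the caller-supplied
 * wide value if there is one, otherwise the 8-bit value from the table.
 */
static inline uint32_t pick_apicid(const struct mp_cpu_entry *m,
				   const uint32_t *wide_apicid)
{
	return wide_apicid ? *wide_apicid : m->apicid;
}
```

Existing callers would keep passing no override and see the old 8-bit
behaviour, while callers that know a larger APICID pass it explicitly.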
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add functions that can be used to determine if an x86_64
system is a SGI "UV" system. UV systems come in 3 types and
are identified by the OEM ID in the MADT.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a function to read the local APIC_ID.
This change is in preparation for additional changes to
the APICID functions that will come in a later patch.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch renames VM_MASK to X86_VM_MASK (which is
in turn defined as an alias of X86_EFLAGS_VM) to better
distinguish it from virtual memory flags. We can't just
use X86_EFLAGS_VM instead because it is also used
for conditional compilation.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The memory resource is also used for main memory, and we need it to
allocate physical addresses for memory hotplug. Knobbling io space is
enough to get the job done anyway.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Report when microcode was successfully updated. It used to be there but
now with DEBUG unset it becomes very silent. Also some cosmetic fixes.
Signed-off-by: Ben Castricum <lk08@bencastricum.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Upcoming 64 bit processors from Centaur can use sysenter.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Jesse Ahrens <jahrens@centtech.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
By including processor-flags.h we can use the predefined
macros instead of keeping our own.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On AMD, SMM protected memory is part of the address map, but handled
internally like an MTRR. That leads to large pages getting split
internally, which has some performance implications. Check for the
AMD TSEG MSR and split the large page mapping on that area
explicitly if it is part of the direct mapping.
There is also SMM ASEG, but it is in the first 1MB and already covered by
the earlier split first page patch.
Idea for this came from an earlier patch by Andreas Herrmann
On a RevF dual Socket Opteron system kernbench shows a clear
improvement from this:
(together with the earlier patches in this series, especially the
split first 2MB patch)
[lower is better]
                no split  stddev      split    stddev      delta
Elapsed Time    87.146    (0.727516)  84.296   (1.09098)   -3.2%
User Time       274.537   (4.05226)   273.692  (3.34344)   -0.3%
System Time     34.907    (0.42492)   34.508   (0.26832)   -1.1%
Percent CPU     322.5     (38.3007)   326.5    (44.5128)   +1.2%
=> About 3.2% improvement in elapsed time for kernbench.
With GB pages on AMD Fam1h the impact of splitting is much higher of course,
since it would split two full GB pages (together with the first
1MB split patch) instead of two 2MB pages. I could not benchmark
a clear difference in kernbench on gbpages, so I kept it disabled
for that case
That was only limited benchmarking of course, so if someone
is interested in running more tests for the gbpages case,
that could be revisited (contributions welcome).
I didn't bother implementing this for 32bit because it is very
unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB
and the 2MB low split is already handled for both.
[ mingo@elte.hu: do it on gbpages kernels too, there's no clear reason
why it shouldn't help there. ]
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel recommends not using large pages for the first 1MB
of the physical memory because there are fixed size MTRRs there
which cause splitups in the TLBs.
On AMD doing so is also a good idea.
The implementation is a little different between 32bit and 64bit.
On 32bit I just taught the initial page table set up about this
because it was very simple to do. This also has the advantage
that the risk of a prefetch ever seeing the page even
if it only exists for a short time is minimized.
On 64bit that is not quite possible, so use set_memory_4k() a little
later (in check_bugs) instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When end_pfn is not aligned to 2MB (or 1GB) then the kernel might
map more memory than end_pfn. Account this in max_pfn_mapped.
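The rounding involved can be sketched as follows, assuming 4 KiB base
pages. The helper name mapped_end_pfn is hypothetical, and the sketch
only shows the arithmetic, not the patch's actual bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB base pages */

/*
 * First pfn past the region that actually ends up mapped when
 * [0, end_pfn) is covered with large pages of 1 << large_shift bytes
 * (21 for 2MB pages, 30 for 1GB pages).
 */
static inline uint64_t mapped_end_pfn(uint64_t end_pfn, unsigned int large_shift)
{
	uint64_t pfns_per_large = (uint64_t)1 << (large_shift - PAGE_SHIFT);

	return (end_pfn + pfns_per_large - 1) & ~(pfns_per_large - 1);
}
```

For an unaligned end_pfn the result is larger than end_pfn, and that
larger value is what max_pfn_mapped would need to reflect.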
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>