This reverts commit 13a229abc2.
Quoth Andi:
"After some consideration and feedback from various people it turns
out this wasn't that good an idea. It has some problems and needs
more work. Since it was only an optimization anyways it's best to
just back it out again for now."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 9ec4b1f356 made kprobes not compile
without module support, so just make that clear in the Kconfig file.
Also, since it's marked EXPERIMENTAL, make that dependency explicit too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The commit e2c0388866 added
setup_additional_cpus to setup.c but this is only defined if
CONFIG_HOTPLUG_CPU is set. This patch changes the #ifdef to reflect that.
Signed-off-by: Brian Magnuson <magnuson@rcn.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous experiment for using apicmaintimer on ATI systems didn't
work out very well. In particular laptops with C2/C3 support often
don't let it tick during idle, which makes it useless. There were also
some other bugs that made the apicmaintimer often not used at all.
I tried some other experiments - running timer over RTC and some other
things but they didn't really work well neither.
I rechecked the specs now and it turns out this simple change is
actually enough to avoid the double ticks on the ATI systems. We just
turn off IRQ 0 in the 8254 and only route it directly using the IO-APIC.
I tested it on a few ATI systems and it worked there. In fact it worked
on all chipsets (NVidia, Intel, AMD, ATI) I tried it on.
According to the ACPI spec routing should always work through the
IO-APIC so I think it's the correct thing to do anyways (and most of the
old gunk in check_timer should be thrown away for x86-64).
But for 2.6.16 it's best to do a fairly minimal change:
- Use the known to be working everywhere-but-ATI IRQ0 both over 8254
and IO-APIC setup everywhere
- Except on ATI disable IRQ0 in the 8254
- Remove the code to select apicmaintimer on ATI chipsets
- Add some boot options to allow to override this (just paranoia)
In 2.6.17 I hope to switch the default over to this for everybody.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SMP time selection originally ran after all CPUs were brought up because
it needed to know the number of CPUs to decide if it needs an MP safe
timer or not.
This is not needed anymore because we know present CPUs early.
This fixes a couple of problems:
- apicmaintimer didn't always work because it relied on state that was
set up time_init_gtod too late.
- The output for the used timer in early kernel log was misleading
because time_init_gtod could actually change it later. Now always
print the final timer choice
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It didn't set up the CPU possible map early enough, so the
option didn't actually work.
Noticed by Heiko Carstens
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[description from AK]
Old check for the IO-APIC watchdog during the timer check was wrong -
it obviously should only drop into this if the IO-APIC watchdog is used.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes x86-64 use the common X86_PM_TIMER Kconfig entry in drivers/acpi
And since PM timer is needed for correct timing on a lot of systems
now (e.g. AMD dual cores) and we often get bug reports from people
who forgot to set it make it depend on CONFIG_EMBEDDED. x86-64 had
this change before and it's a good thing.
I also fixed the description slightly to make this more clear.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Big Unisys systems have multiple clusters too, but they have an
synchronized TSC.
I'm using the SMBIOS to check for vendor == IBM.
Cc: Chris McDermott <lcm@us.ibm.com>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In previous versions of pci-gart.c, no_iommu was used to determine if IOMMU was
disabled in the GART DMA mapping functions. This changed in 2.6.16 and now
gart_xxx() functions are only called if gart is enabled. Therefore, uses of
no_iommu in the GART code are no longer necessary and can be removed.
Also, it removes double deceleration of no_iommu and force_iommu in pci.h and
proto.h, by removing the deceleration in pci.h.
Lastly, end_pfn off by one error.
Tested (along with patch 1/2) on dual opteron with gart enabled, iommu=soft,
and iommu=off.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 9c869edac5 moved the i386 topology.c
file. That change broke x86-64 compiles, as it uses the same file.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Undo setting of CONFIG_DEBUG_INFO in the previous defconfig update. It
will make every build much slower and need more disk space and isn't a good
default.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't print KERN_INFO in the middle of a printk line.
printk(KERN_INFO "OEM ID: %s ",str);
is just above this. This is already fixed up in i386 copy.
Signed-off-by: Martin J. Bligh <mbligh@google.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously the numa hash code would be confused by holes in the node space
and stop early. This is the first part of the fix for the non boot issue
with empty nodes on Opterons.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Code was refusing good SRATs because about 12K got lost somewhere.
Allow less than 1MB of difference before rejecting it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
But do it after everything else to risk less from recursive
crashes.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Many laptops have problems with ticking the local APIC timer in C2/C3.
The code added earlier to use it by default on ATI didn't really work
for them. Don't enable it when the system supports C2/C3.
This doesn't fix the problem fully, but at least it's not worse than before.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This caused a sigreturn with bad argument on a preemptible kernel
to complain with
Debug: sleeping function called from invalid context at /home/lsrc/quilt/linux/include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1
Call Trace: {__might_sleep+190} {profile_task_exit+21}
{__do_exit+34} {do_wait+0}
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Along with that, also suppress the memory touching altogether when the
watchdog is not running, to eliminate needless crosstalk. Plus ad a call
to it to make things consistent (one could also consider removing the call
in enable_timer_nmi_watchdog()).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The early initialization of cpu_to_node code as it is now only updates the
cpu_to_node array, and does not update cpu_pda()->nodemember. This will
cause numa_node_id() to return 0 on systems where CPU 0 is not on Node 0.
This leads to a kernel panic in slab.c.
I've tested the patch below on a 16 processor x86_64 ES7000-600 server, and
no longer see the panic I saw with the original 2.6.16-rc3.
Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't touch the non DMA members in the sg list in dma_map_sg in the IOMMU
Some drivers (in particular ST) ran into problems because they reused the sg
lists after passing them to pci_map_sg(). The merging procedure in the K8
GART IOMMU corrupted the state. This patch changes it to only touch the dma*
entries during merging, but not the other fields. Approach suggested by Dave
Miller.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We found a problem with x86_64 kernels with preemption enabled, where
having multiple tasks doing ptrace singlesteps around the same time will
cause the system to 'oops'. The problem seems that a task can get
preempted out of the do_debug() processing while it is running on the
DEBUG_STACK stack. If another task on that same cpu then enters do_debug()
and uses the same per-cpu DEBUG_STACK stack, the previous preempted tasks's
stack contents can be corrupted, and the system will oops when the
preempted task is context switched back in again.
The typical oops looks like the following:
Unable to handle kernel paging request at ffffffffffffffae RIP: <ffffffff805452a1>{thread_return+34}
PGD 103027 PUD 102429067 PMD 0
Oops: 0002 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 3786, comm: ssdd Not tainted 2.6.15.2 #1
RIP: 0010:[<ffffffff805452a1>] <ffffffff805452a1>{thread_return+34}
RSP: 0018:ffffffff80824058 EFLAGS: 000136c2
RAX: ffff81017e12cea0 RBX: 0000000000000000 RCX: 00000000c0000100
RDX: 0000000000000000 RSI: ffff8100f7856e20 RDI: ffff81017e12cea0
RBP: 0000000000000046 R08: ffff8100f68a6000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff81017e12cea0 R12: ffff81000c2d53e8
R13: ffff81017f5b3be8 R14: ffff81000c0036e0 R15: 000001056cbfc899
FS: 00002aaaaaad9b00(0000) GS:ffffffff80883800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffae CR3: 00000000f6fcf000 CR4: 00000000000006e0
Process ssdd (pid: 3786, threadinfo ffff8100f68a6000, task ffff8100f7856e20)
Stack: ffffffff808240d8 ffffffff8012a84a ffff8100055f6c00 0000000000000020
0000000000000001 ffff81000c0036e0 ffffffff808240b8 0000000000000000
0000000000000000 0000000000000000
Call Trace: <#DB>
<ffffffff8012a84a>{try_to_wake_up+985}
<ffffffff8012c0d3>{kick_process+87}
<ffffffff8013b262>{signal_wake_up+48}
<ffffffff8013b5ce>{specific_send_sig_info+179}
<ffffffff80546abc>{_spin_unlock_irqrestore+27}
<ffffffff8013b67c>{force_sig_info+159}
<ffffffff801103a0>{do_debug+289} <ffffffff80110278>{sync_regs+103}
<ffffffff8010ed9a>{paranoid_userspace+35}
Unable to handle kernel paging request at 00007fffffb7d000 RIP: <ffffffff8010f2e4>{show_trace+465}
PGD f6f25067 PUD f6fcc067 PMD f6957067 PTE 0
Oops: 0000 [2] PREEMPT SMP
This patch disables preemptions for the task upon entry to do_debug(), before
interrupts are reenabled, and then disables preemption before exiting
do_debug(), after disabling interrupts. I've noticed that the task can be
preempted either at the end of an interrupt, or on the call to
force_sig_info() on the spin_unlock_irqrestore() processing. It might be
better to attempt to code a fix in entry.S around the code that calls
do_debug().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[description from AK]
The IBM Summit 3 chipset doesn't implement the HPET timer replacement
option. Since the current Linux code relies on it use a mixed mode with
both PIT for the interrupt and HPET counters for the time keeping. That
was already implemented, but didn't work properly because it was still
using the last interrupt offset in HPET. This resulted in x460 not
booting. Fix this up by using the free running HPET counter.
Shouldn't affect any other machine because they either use full HPET mode
or no HPET at all.
TBD needs a similar 32bit fix.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Bob Picco <bob.picco@hp.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The *at patches introduced fstatat and, due to inusfficient research, I
used the newfstat functions generally as the guideline. The result is that
on 32-bit platforms we don't have all the information needed to implement
fstatat64.
This patch modifies the code to pass up 64-bit information if
__ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make
this clear. Other archs will continue to use the existing code. On x86-64
the compat code is implemented using a new sys32_ function. this is what
is done for the other stat syscalls as well.
This patch might break some other archs (those which define
__ARCH_WANT_STAT64 and which already wired up the syscall). Yet others
might need changes to accomodate the compatibility mode. I really don't
want to do that work because all this stat handling is a mess (more so in
glibc, but the kernel is also affected). It should be done by the arch
maintainers. I'll provide some stand-alone test shortly. Those who are
eager could compile glibc and run 'make check' (no installation needed).
The patch below has been tested on x86 and x86-64.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, x86_64 and ia64 arches do not clear the corresponding bits in
the node's cpumask when a cpu goes down or cpu bring up is cancelled. This
is buggy since there are pieces of common code where the cpumask is checked
in the cpu down code path to decide on things (like in the slab down path).
PPC does the right thing, but x86_64 and ia64 don't (This was the reason
Sonny hit upon a slab bug during cpu offline on ppc and could not reproduce
on other arches). This patch fixes it for x86_64. I won't attempt ia64 as
I cannot test it.
Credit for spotting this should go to Alok.
(akpm: this was applied, then reverted. But it's OK now because we now use
for_each_cpu() in the right places).
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 10f4dc8b27.
Quoth Andi Kleen:
"Kiran decided that it makes the problem worse than it was before.
Fixing it fully requires more work which is too much for 2.6.16. So
please revert that commit for now."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains a printk reorder to remove the current problem of
displaying "PCI-DMA: Disabling IOMMU." and then "PCI-DMA: using GART
IOMMU" 20 lines later in dmesg.
It also constains a printk reorder in swiotlb to state swiotlb
enablement prior to describing the location of the bounce buffers, and a
printk reorder to state gart enablement prior to describing the
aperature.
Also constains a whitespace cleanup in arch/x86_64/kernel/setup.c
Tested (along with patch 2/2) on dual opteron with gart enabled,
iommu=soft, and iommu=off.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hack for 2.6.16. In 2.6.17 all code that uses NR_CPUs should
be audited and changed to only touch possible CPUs.
Don't mark the reference per cpu data init data (so it stays
around after boot) and point all impossible CPUs to it. This way
they reference some valid - although shared memory. Usually
this is only initialization like INIT_LIST_HEADs and there
won't be races because these CPUs never run. Still somewhat hackish.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's bad juju to touch the APIC when it hasn't been enabled.
I also moved ack_bad_irq for x86-64 out of line following i386.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Checking of the validity of pointers should be consistently done before
dereferencing the pointer.
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Conditionalize two unwind directives to match other similarly
conditional code.
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Cc: Jim Houston <jim.houston@ccur.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On some broken motherboards (at least one NForce3 based AMD64 laptop)
the PIT timer runs at a incorrect frequency. This patch adds a new
option "apicpmtimer" that allows to use the APIC timer and calibrate it
using the PMTimer. It requires the earlier patch that allows to run the
main timer from the APIC.
Specifying apicpmtimer implies apicmaintimer.
The option defaults to off for now.
I tested it on a few systems and the resulting APIC timer frequencies
were usually a bit off, but always <1%, which should be tolerable.
TBD figure out heuristic to enable this automatically on the affected
systems TBD perhaps do it on all NForce3s or using DMI?
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobes cannot deal with the funny calling conventions when it
runs on a different stack when it returns. If someone wants
to instrument context switch they can add a probe to schedule()
instead.
Cc: jkenisto@us.ibm.com, prasanna@in.ibm.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Align the start of the per-cpu section to the configured number of bytes in a
cache line. This stops a BUG_ON() from triggering in load_module() when
DEFINE_PER_CPU() is used in a module and the section isn't cacheline-aligned.
Rusty also found this and sent a patch in a while ago
(http://lkml.org/lkml/2004/10/19/17), I don't know what came of that.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[ AK: I redid Kevin's fix to be simpler, but the idea and original
analysis of the problem is from Kevin]
This avoid allocation failures on some SATA systems like Nvidia CK8
when the IOMMU gets fragmented. Modern SATA devices have quite large queues
(128 entries) and the FS with ext2/3 is good enough now that it often
passes whole 128 page sg lists down to the driver. These require
512K of continuous free space in the IOMMU aperture to map when merged.
When the IOMMU is fragmented this could lead to spurious IO errors
due to failing mappings.
Short term fix is to just try to map the SG list again unmerged
page by page - this way fragmentation doesn't matter anymore.
The code for that was already there, but it just wasn't enabled for the
merge case.
According to Kevin at least the Nvidia device doesn't seem to benefit
from merging much anyways, so the only slowdown is from trying
to do an unnecessary merge attempt.
Kevin plans to implement better fragmentation avoidance in the future,
but that wouldn't be 2.6.16 material.
TBD: should add some statistic counters to count how often that really
happens.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I broke this earlier when moving the patch from i386 to x86-64.
Need to return the virtual address here, not the physical address.
This fixes some boot time crashes on x86-64.
Cc: gregkh@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Check if the processor/memory affinity entries are long enough
according to the ACPI 3.0 spec.
- Ignore memory affinity entries that define a zero length region.
All based on BIOS issues found in the field @)
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
attached patch is 2 more cases i found via running the reference_init.pl
script. These were easy to spot just knowing the file names. There is
one another about init/main.c that i cant exactly zero in. (partly
because i dont know how to interpret the data thats spewed out of the tool).
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has been enabled by default for some time now and is cheap enough
so it doesn't matter anyways.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, x86_64 and ia64 arches do not clear the corresponding bits
in the node's cpumask when a cpu goes down or cpu bring up is cancelled.
This is buggy since there are pieces of common code where the cpumask is
checked in the cpu down code path to decide on things (like in the slab
down path). PPC does the right thing, but x86_64 and ia64 don't (This
was the reason Sonny hit upon a slab bug during cpu offline on ppc and
could not reproduce on other arches). This patch fixes it for x86_64.
I won't attempt ia64 as I cannot test it.
Credit for spotting this should go to Alok.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
They cause quite bad performance regressions on Netburst
This is temporary until we can get new optimized functions
for these CPUs.
This undoes changes that were done in 2.6.15 and in 2.6.16-rc1,
essentially bringing the code back to 2.6.14 level. Only change
is I renamed the X86_FEATURE_K8_C flag to X86_FEATURE_REP_GOOD
and fixed the check for the flag and also fixed some comments.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This avoids BUG_ONs in the low level allocator when an illegal
GFP mask is added.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>