Commit Graph

1986 Commits

Author SHA1 Message Date
Ingo Molnar
9b610fda0d Merge branch 'linus' into timers/nohz 2008-07-18 19:53:16 +02:00
Thomas Gleixner
b8f8c3cf0a nohz: prevent tick stop outside of the idle loop
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:

	scheduler switch to idle task
	enable interrupts

Window starts here

	----> interrupt happens (does not set NEED_RESCHED)
	      	irq_exit() stops the tick

	----> interrupt happens (does set NEED_RESCHED)

	return from schedule()
	
	cpu_idle(): preempt_disable();

Window ends here

The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.

The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.

Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.

cpu_idle()
{
	preempt_disable();

	while(1) {
		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
		 			          are in the idle loop

		 while (!need_resched())
		       halt();

		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
		 preempt_enable_no_resched();
		 schedule();
		 preempt_disable();
	}
}

In hindsight we should have done this forever, but ... 

/me grabs a large brown paperbag.

Debugged-by: Jack Ren <jack.ren@marvell.com>, 
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-18 18:10:28 +02:00
Linus Torvalds
2b04be7e8a Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix asm/e820.h for userspace inclusion
  x86: fix numaq_tsc_disable
  x86: fix kernel_physical_mapping_init() for large x86 systems
2008-07-17 10:38:59 -07:00
Linus Torvalds
bdec6cace4 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ftrace: do not trace library functions
  ftrace: do not trace scheduler functions
  ftrace: fix lockup with MAXSMP
  ftrace: fix merge buglet
2008-07-17 10:37:10 -07:00
Yinghai Lu
9354094a95 x86: fix numaq_tsc_disable
fix:

 arch/x86/kernel/numaq_32.c: In function ‘numaq_tsc_disable’:
 arch/x86/kernel/numaq_32.c:99: warning: ‘return’ with a value, in function returning void

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-17 19:27:08 +02:00
Ingo Molnar
8e9509c827 ftrace: fix merge buglet
-tip testing found a bootup hang here:

  initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
  calling  acpi_event_init+0x0/0x57

the bootup should have continued with:

  initcall acpi_event_init+0x0/0x57 returned 0 after 45 msecs

but it hung hard there instead.

bisection led to this commit:

| commit 5806b81ac1
| Merge: d14c8a6... 6712e29...
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Mon Jul 14 16:11:52 2008 +0200
|     Merge branch 'auto-ftrace-next' into tracing/for-linus

turns out that i made this mistake in the merge:

  ifdef CONFIG_FTRACE
  # Do not profile debug utilities
  CFLAGS_REMOVE_tsc_64.o = -pg
  CFLAGS_REMOVE_tsc_32.o = -pg

those two files got unified meanwhile - so the dont-profile annotation
got lost. The proper rule is:

  CFLAGS_REMOVE_tsc.o = -pg

i guess this could have been caught sooner if the CFLAGS_REMOVE* kbuild
rule aborted the build if it met a target that does not exist anymore?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-17 13:26:50 +02:00
Linus Torvalds
dc7c65db28 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
  Revert "x86/PCI: ACPI based PCI gap calculation"
  PCI: remove unnecessary volatile in PCIe hotplug struct controller
  x86/PCI: ACPI based PCI gap calculation
  PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
  PCI PM: Fix pci_prepare_to_sleep
  x86/PCI: Fix PCI config space for domains > 0
  Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
  PCI: Simplify PCI device PM code
  PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
  PCI ACPI: Rework PCI handling of wake-up
  ACPI: Introduce new device wakeup flag 'prepared'
  ACPI: Introduce acpi_device_sleep_wake function
  PCI: rework pci_set_power_state function to call platform first
  PCI: Introduce platform_pci_power_manageable function
  ACPI: Introduce acpi_bus_power_manageable function
  PCI: make pci_name use dev_name
  PCI: handle pci_name() being const
  PCI: add stub for pci_set_consistent_dma_mask()
  PCI: remove unused arch pcibios_update_resource() functions
  PCI: fix pci_setup_device()'s sprinting into a const buffer
  ...

Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
2008-07-16 17:25:46 -07:00
Zhao Yakui
da5e09a1b3 ACPI : Create "idle=nomwait" bootparam
"idle=nomwait" disables the use of the MWAIT
instruction from both C1 (C1_FFH) and deeper (C2C3_FFH)
C-states.

When MWAIT is unavailable, the BIOS and OS generally
negotiate to use the HALT instruction for C1,
and use IO accesses for deeper C-states.

This option is useful for power and performance
comparisons, and also to work around BIOS bugs
where broken MWAIT support is advertised.

http://bugzilla.kernel.org/show_bug.cgi?id=10807
http://bugzilla.kernel.org/show_bug.cgi?id=10914

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:05 +02:00
Zhao Yakui
c1e3b377ad ACPI: Create "idle=halt" bootparam
"idle=halt" limits the idle loop to using
the halt instruction.  No MWAIT, no IO accesses,
no C-states deeper than C1.

If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:05 +02:00
Zhao Yakui
5b53496a5a ACPI: Disable the C2C3_FFH access mode HW has no MWAIT support
991528d734
(ACPI: Processor native C-states using MWAIT)
started passing C2C3_FFH to _PDC to tell the BIOS
that Linux supports MWAIT for deep C-states.

However, we should first double check with the hardware
that it actually supports MWAIT before potentially exposing
a BIOS bug of an MWAIT _CST on HW that doesn't support MWAIT.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:04 +02:00
Linus Torvalds
fafa3a3f16 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix TSC build error on 32bit
2008-07-15 16:29:18 -07:00
Thomas Gleixner
431ceb83f7 x86: fix TSC build error on 32bit
Dave Hansen reported a build error on 32bit which went unnoticed
as newer gcc versions seem to optimize unused static functions
away before compiling them.

Make vread_tsc() depend on CONFIG_X86_64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-15 22:46:47 +02:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Linus Torvalds
da6e88f496 Merge branch 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: add PCI ID for 6300ESB force hpet
  x86: add another PCI ID for ICH6 force-hpet
  kernel-paramaters: document pmtmr= command line option
  acpi_pm clccksource: fix printk format warning
  nohz: don't stop idle tick if softirqs are pending.
  pmtmr: allow command line override of ioport
  nohz: reduce jiffies polling overhead
  hrtimer: Remove unused variables in ktime_divns()
  hrtimer: remove warning in hres_timers_resume
  posix-timers: print RT watchdog message
2008-07-15 10:39:57 -07:00
Linus Torvalds
af5329cdf5 Merge branch 'core/stacktrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/stacktrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic-ipi: powerpc/generic-ipi tree build failure
  stacktrace: fix build failure on sparc64
  stacktrace: export save_stack_trace[_tsk]
  stacktrace: fix modular build, export print_stack_trace and save_stack_trace
  backtrace: replace timer with tasklet + completions
  stacktrace: add saved stack traces to backtrace self-test
  stacktrace: print_stack_trace() cleanup
  debugging: make stacktrace independent from DEBUG_KERNEL
  stacktrace: don't crash on invalid stack trace structs
2008-07-15 10:31:35 -07:00
Ingo Molnar
91d0322bef Merge branch 'linus' into x86/urgent 2008-07-15 13:45:59 +02:00
Linus Torvalds
5a86102248 Merge branch 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6
* 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6: (64 commits)
  firmware: convert sb16_csp driver to use firmware loader exclusively
  dsp56k: use request_firmware
  edgeport-ti: use request_firmware()
  edgeport: use request_firmware()
  vicam: use request_firmware()
  dabusb: use request_firmware()
  cpia2: use request_firmware()
  ip2: use request_firmware()
  firmware: convert Ambassador ATM driver to request_firmware()
  whiteheat: use request_firmware()
  ti_usb_3410_5052: use request_firmware()
  emi62: use request_firmware()
  emi26: use request_firmware()
  keyspan_pda: use request_firmware()
  keyspan: use request_firmware()
  ttusb-budget: use request_firmware()
  kaweth: use request_firmware()
  smctr: use request_firmware()
  firmware: convert ymfpci driver to use firmware loader exclusively
  firmware: convert maestro3 driver to use firmware loader exclusively
  ...

Fix up trivial conflicts with BKL removal in drivers/char/dsp56k.c and
drivers/char/ip2/ip2main.c manually.
2008-07-14 16:54:07 -07:00
David Woodhouse
751851af7a Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Conflicts:

	sound/pci/Kconfig
2008-07-14 15:51:11 -07:00
Linus Torvalds
d18bb9a548 Merge branch 'core/rodata' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/rodata' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  move BUG_TABLE into RODATA
2008-07-14 15:28:10 -07:00
Linus Torvalds
4bb0057f99 Merge branch 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, generic: mark early_printk as asmlinkage
  printk: export console_drivers
  printk: remember the message level for multi-line output
  printk: refactor processing of line severity tokens
  printk: don't prefer unsuited consoles on registration
  printk: clean up recursion check related static variables
  namespacecheck: more kernel/printk.c fixes
  namespacecheck: fix kernel printk.c
2008-07-14 15:27:43 -07:00
Linus Torvalds
e18425a0ab Merge branch 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits)
  ftrace: build fix for ftraced_suspend
  ftrace: separate out the function enabled variable
  ftrace: add ftrace_kill_atomic
  ftrace: use current CPU for function startup
  ftrace: start wakeup tracing after setting function tracer
  ftrace: check proper config for preempt type
  ftrace: trace schedule
  ftrace: define function trace nop
  ftrace: move sched_switch enable after markers
  ftrace: prevent ftrace modifications while being kprobe'd, v2
  fix "ftrace: store mcount address in rec->ip"
  mmiotrace broken in linux-next (8-bit writes only)
  ftrace: avoid modifying kprobe'd records
  ftrace: freeze kprobe'd records
  kprobes: enable clean usage of get_kprobe
  ftrace: store mcount address in rec->ip
  ftrace: build fix with gcc 4.3
  namespacecheck: fixes
  ftrace: fix "notrace" filtering priority
  ftrace: fix printout
  ...
2008-07-14 14:49:54 -07:00
Linus Torvalds
d1794f2c5b Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6
* 'bkl-removal' of git://git.lwn.net/linux-2.6: (146 commits)
  IB/umad: BKL is not needed for ib_umad_open()
  IB/uverbs: BKL is not needed for ib_uverbs_open()
  bf561-coreb: BKL unneeded for open()
  Call fasync() functions without the BKL
  snd/PCM: fasync BKL pushdown
  ipmi: fasync BKL pushdown
  ecryptfs: fasync BKL pushdown
  Bluetooth VHCI: fasync BKL pushdown
  tty_io: fasync BKL pushdown
  tun: fasync BKL pushdown
  i2o: fasync BKL pushdown
  mpt: fasync BKL pushdown
  Remove BKL from remote_llseek v2
  Make FAT users happier by not deadlocking
  x86-mce: BKL pushdown
  vmwatchdog: BKL pushdown
  vmcp: BKL pushdown
  via-pmu: BKL pushdown
  uml-random: BKL pushdown
  uml-mmapper: BKL pushdown
  ...
2008-07-14 14:48:31 -07:00
Jonathan Corbet
2fceef397f Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
Linus Torvalds
a3da5bf84a Merge branch 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (821 commits)
  x86: make 64bit hpet_set_mapping to use ioremap too, v2
  x86: get x86_phys_bits early
  x86: max_low_pfn_mapped fix #4
  x86: change _node_to_cpumask_ptr to return const ptr
  x86: I/O APIC: remove an IRQ2-mask hack
  x86: fix numaq_tsc_disable calling
  x86, e820: remove end_user_pfn
  x86: max_low_pfn_mapped fix, #3
  x86: max_low_pfn_mapped fix, #2
  x86: max_low_pfn_mapped fix, #1
  x86_64: fix delayed signals
  x86: remove conflicting nx6325 and nx6125 quirks
  x86: Recover timer_ack lost in the merge of the NMI watchdog
  x86: I/O APIC: Never configure IRQ2
  x86: L-APIC: Always fully configure IRQ0
  x86: L-APIC: Set IRQ0 as edge-triggered
  x86: merge dwarf2 headers
  x86: use AS_CFI instead of UNWIND_INFO
  x86: use ignore macro instead of hash comment
  x86: use matching CFI_ENDPROC
  ...
2008-07-14 13:43:24 -07:00
Linus Torvalds
7daf705f36 Start using the new '%pS' infrastructure to print symbols
This simplifies the code significantly, and was the whole point of the
exercise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-14 12:12:53 -07:00
H. Peter Anvin
065cb3dfe2 x86, suspend, acpi: correct and add comments about Big Real Mode
Explain that we set up the descriptors for Big Real Mode, and why we
do so.  In particular, one system that is known to fail without it is
the Lenovo X61.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-14 11:44:26 -07:00
H. Peter Anvin
3bf2e77453 x86, suspend, acpi: enter Big Real Mode
The explanation for recent video BIOS suspend quirk failures is that
the VESA BIOS expects to be entered in Big Real Mode (*.limit = 0xffffffff)
instead of ordinary Real Mode (*.limit = 0xffff).

This patch changes the segment descriptors to Big Real Mode instead.

The segment descriptor registers (what Intel calls "segment cache") is
always active.  The only thing that changes based on CR0.PE is how it is
*loaded* and the interpretation of the CS flags.

The segment descriptor registers contain of the following sub-registers:
selector (the "visible" part), base, limit and flags.  In protected mode
or long mode, they are loaded from descriptors (or fs.base or gs.base can
be manipulated directly in long mode.)  In real mode, the only thing
changed by a segment register load is the selector and the base, where the
base <- selector << 4.  In particular, *the limit and the flags are not
changed*.

As far as the handling of the CS flags: a code segment cannot be writable
in protected mode, whereas it is "just another segment" in real mode, so
there is some kind of quirk that kicks in for this when CR0.PE <- 0.  I'm
not sure if this is accomplished by actually changing the cs.flags register
or just changing the interpretation; it might be something that is
CPU-specific.  In particular, the Transmeta CPUs had an explicit "CS is
writable if you're in real mode" override, so even if you had loaded CS
with an execute-only segment it'd be writable (but not readable!) on return
to real mode.  I'm not at all sure if that is how other CPUs behave.

Signed-off-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 18:16:09 +02:00
Joe Buehler
4c2a997c34 x86: add PCI ID for 6300ESB force hpet
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
00:1f.0 Class 0601: 8086:25a1 (rev 02)

kernel: pci 0000:00:1f.0: Force enabled HPET at 0xfed00000
kernel: hpet clockevent registered
kernel: hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
kernel: hpet0: 3 64-bit timers, 14318180 Hz

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-14 17:28:40 +02:00
Krzysztof Oledzki
1c776bf87c x86: add another PCI ID for ICH6 force-hpet
Tested on Asus P5GDC-V

$ lspci -n -n |grep ISA
00:1f.0 ISA bridge [0601]: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge [8086:2640] (rev 03)

Force enabled HPET at base address 0xfed00000
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-14 17:28:18 +02:00
Ingo Molnar
5806b81ac1 Merge branch 'auto-ftrace-next' into tracing/for-linus
Conflicts:

	arch/x86/kernel/entry_32.S
	arch/x86/kernel/process_32.c
	arch/x86/kernel/process_64.c
	arch/x86/lib/Makefile
	include/asm-x86/irqflags.h
	kernel/Makefile
	kernel/sched.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 16:11:52 +02:00
Yinghai Lu
2387ce57a8 x86: make 64bit hpet_set_mapping to use ioremap too, v2
keep the one for VSYSCALL_HPET

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 09:24:17 +02:00
Yinghai Lu
87a1c441e1 x86: get x86_phys_bits early
when try to make hpet_enable use io_remap instead fixmap got

ioremap: invalid physical address fed00000
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:161 __ioremap_caller+0x8c/0x2f3()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.26-rc9-tip-01873-ga9827e7-dirty #358

Call Trace:
 [<ffffffff8026615e>] warn_on_slowpath+0x6c/0xa7
 [<ffffffff802e2313>] ? __slab_alloc+0x20a/0x3fb
 [<ffffffff802d85c5>] ? mpol_new+0x88/0x17d
 [<ffffffff8022a4f4>] ? mcount_call+0x5/0x31
 [<ffffffff8022a4f4>] ? mcount_call+0x5/0x31
 [<ffffffff8024b0d2>] __ioremap_caller+0x8c/0x2f3
 [<ffffffff80e86dbd>] ? hpet_enable+0x39/0x241
 [<ffffffff8022a4f4>] ? mcount_call+0x5/0x31
 [<ffffffff8024b466>] ioremap_nocache+0x2a/0x40
 [<ffffffff80e86dbd>] hpet_enable+0x39/0x241
 [<ffffffff80e7a1f6>] hpet_time_init+0x21/0x4e
 [<ffffffff80e730e9>] start_kernel+0x302/0x395
 [<ffffffff80e722aa>] x86_64_start_reservations+0xb9/0xd4
 [<ffffffff80e722fe>] ? x86_64_init_pda+0x39/0x4f
 [<ffffffff80e72400>] x86_64_start_kernel+0xec/0x107

---[ end trace a7919e7f17c0a725 ]---

it seems for amd system that is set later...
try to move setting early in early_identify_cpu.
and remove same code for intel and centaur.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 09:24:16 +02:00
Yinghai Lu
32b23e9a73 x86: max_low_pfn_mapped fix #4
only add direct mapping for aperture

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 09:24:16 +02:00
Mike Travis
11369f356b x86: change _node_to_cpumask_ptr to return const ptr
* Strengthen the return type for the _node_to_cpumask_ptr to be
    a const pointer.  This adds compiler checking to insure that
    node_to_cpumask_map[] is not changed inadvertently.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 19:11:58 +02:00
Maciej W. Rozycki
ce8b06b985 x86: I/O APIC: remove an IRQ2-mask hack
Now that IRQ2 is never made available to the I/O APIC, there is no need
to special-case it and mask as a workaround for broken systems.  Actually,
because of the former, mask_IO_APIC_irq(2) is a no-op already.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 11:43:48 +02:00
Yinghai Lu
3d88cca708 x86: fix numaq_tsc_disable calling
got this on a test-system:

 calling  numaq_tsc_disable+0x0/0x39
 NUMAQ: disabling TSC
 initcall numaq_tsc_disable+0x0/0x39 returned 0 after 0 msecs

that's because we should not be using arch_initcall to call numaq_tsc_disable.

need to call it in setup_arch before time_init()/tsc_init()
and call it in init_intel() to make the cpu feature bits right.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:45 +02:00
Yinghai Lu
7b479becdb x86, e820: remove end_user_pfn
end_user_pfn used to modify the meaning of the e820 maps.

Now that all e820 operations are cleaned up, unified, tightened up,
the e820 map always get updated to reality, we don't need to keep
this secondary mechanism anymore.

If you hit this commit in bisection it means something slipped through.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:40 +02:00
Yinghai Lu
965194c15d x86: max_low_pfn_mapped fix, #2
tighten the boundary checks around max_low_pfn_mapped - dont overmap
nor undermap into holes.

also print out tseg for AMD cpus, for diagnostic purposes.
(this is an SMM area, and we split up any big mappings around that area)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:16 +02:00
Yinghai Lu
7ab073b6e0 x86: max_low_pfn_mapped fix, #1
fix crash on Ingo's big box:

calling  pci_iommu_init+0x0/0x17
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ d0000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
BUG: unable to handle kernel paging request at ffff88000003be88
IP: [<ffffffff8026d377>] __alloc_pages_internal+0xc3/0x3f2
PGD 202063 PUD 206063 PMD 22fc00163 PTE 3b162
Oops: 0000 [1] SMP

and e820 is:

 BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
 BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
 BIOS-e820: 000000007ff70000 - 000000007ff86000 (ACPI data)
 BIOS-e820: 000000007ff86000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 0000000080000000 - 00000000cfe00000 (usable)
 BIOS-e820: 00000000cfe00000 - 00000000d0000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000830000000 (usable)

system has 32 GB RAM installed.

max_low_pfn_mapped is 0xcfe00, and GART aperture is not mapped.

So try to use init_memory_mapping to map that area, because the iommu
thinks that area is ram ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:15 +02:00
Ingo Molnar
ae94b8075a Merge branch 'linus' into x86/core
Conflicts:

	arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:29:02 +02:00
Roland McGrath
eca91e7838 x86_64: fix delayed signals
On three of the several paths in entry_64.S that call
do_notify_resume() on the way back to user mode, we fail to properly
check again for newly-arrived work that requires another call to
do_notify_resume() before going to user mode.  These paths set the
mask to check only _TIF_NEED_RESCHED, but this is wrong.  The other
paths that lead to do_notify_resume() do this correctly already, and
entry_32.S does it correctly in all cases.

All paths back to user mode have to check all the _TIF_WORK_MASK
flags at the last possible stage, with interrupts disabled.
Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
set any time after we entered do_notify_resume().  More work flags
can be set (or left set) synchronously inside do_notify_resume(), as
TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
(which then send an asynchronous interrupt).

There are many different scenarios that could hit this bug, most of
them races.  The simplest one to demonstrate does not require any
race: when one signal has done handler setup at the check before
returning from a syscall, and there is another signal pending that
should be handled.  The second signal's handler should interrupt the
first signal handler before it actually starts (so the interrupted PC
is still at the handler's entry point).  Instead, it runs away until
the next kernel entry (next syscall, tick, etc).

This test behaves correctly on 32-bit kernels, and fails on 64-bit
(either 32-bit or 64-bit test binary).  With this fix, it works.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/ucontext.h>

    #ifndef REG_RIP
    #define REG_RIP REG_EIP
    #endif

    static sig_atomic_t hit1, hit2;

    static void
    handler (int sig, siginfo_t *info, void *ctx)
    {
      ucontext_t *uc = ctx;

      if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
        {
          if (sig == SIGUSR1)
            hit1 = 1;
          else
            hit2 = 1;
        }

      printf ("%s at %#lx\n", strsignal (sig),
              uc->uc_mcontext.gregs[REG_RIP]);
    }

    int
    main (void)
    {
      struct sigaction sa;
      sigset_t set;

      sigemptyset (&sa.sa_mask);
      sa.sa_flags = SA_SIGINFO;
      sa.sa_sigaction = &handler;

      if (sigaction (SIGUSR1, &sa, NULL)
          || sigaction (SIGUSR2, &sa, NULL))
        return 2;

      sigemptyset (&set);
      sigaddset (&set, SIGUSR1);
      sigaddset (&set, SIGUSR2);
      if (sigprocmask (SIG_BLOCK, &set, NULL))
        return 3;

      printf ("main at %p, handler at %p\n", &main, &handler);

      raise (SIGUSR1);
      raise (SIGUSR2);

      if (sigprocmask (SIG_UNBLOCK, &set, NULL))
        return 4;

      if (hit1 + hit2 == 1)
        {
          puts ("PASS");
          return 0;
        }

      puts ("FAIL");
      return 1;
    }

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:11:10 +02:00
Rafael J. Wysocki
da1f29f5df x86: remove conflicting nx6325 and nx6125 quirks
We have two conflicting DMA-based quirks in there for the same set of
boxes (HP nx6325 and nx6125) and one of them actually breaks my box.

So remove the extra code.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: =?iso-8859-1?q?T=F6r=F6k_Edwin?= <edwintorok@gmail.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 06:44:58 +02:00
Ingo Molnar
6c82a000a2 Merge branch 'x86/generalize-visws' into x86/core 2008-07-11 21:22:18 +02:00
Maciej W. Rozycki
5b4d2386c2 x86: Recover timer_ack lost in the merge of the NMI watchdog
In the course of the recent unification of the NMI watchdog an assignment
to timer_ack to switch off unnecesary POLL commands to the 8259A in the
case of a watchdog failure has been accidentally removed.  The statement
used to be limited to the 32-bit variation as since the rewrite of the
timer code it has been relevant for the 82489DX only.  This change brings
it back.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:03 +02:00
Maciej W. Rozycki
af174783b9 x86: I/O APIC: Never configure IRQ2
There is no such entity as ISA IRQ2.  The ACPI spec does not make it
explicitly clear, but does not preclude it either -- all it says is ISA
legacy interrupts are identity mapped by default (subject to overrides),
but it does not state whether IRQ2 exists or not.  As a result if there is
no IRQ0 override, then IRQ2 is normally initialised as an ISA interrupt,
which implies an edge-triggered line, which is unmasked by default as this
is what we do for edge-triggered I/O APIC interrupts so as not to miss an
edge.

To the best of my knowledge it is useless, as IRQ2 has not been in use
since the PC/AT as back then it was taken by the 8259A cascade interrupt
to the slave, with the line position in the slot rerouted to newly-created
IRQ9.  No device could thus make use of this line with the pair of 8259A
chips.  Now in theory INTIN2 of the I/O APIC may be usable, but the
interrupt of the device wired to it would not be available in the PIC mode
at all, so I seriously doubt if anybody decided to reuse it for a regular
device.

However there are two common uses of INTIN2.  One is for IRQ0, with an
ACPI interrupt override (or its equivalent in the MP table).  But in this
case IRQ2 is gone entirely with INTIN0 left vacant.  The other one is for
an 8959A ExtINTA cascade.  In this case IRQ0 goes to INTIN0 and if ACPI is
used INTIN2 is assumed to be IRQ2 (there is no override and ACPI has no
way to report ExtINTA interrupts).  This is where a problem happens.

The problem is INTIN2 is configured as a native APIC interrupt, with a
vector assigned and the mask cleared.  And the line may indeed get active
and inject interrupts if the master 8959A has its timer interrupt enabled
(it might happen for other interrupts too, but they are normally masked in
the process of rerouting them to the I/O APIC).  There are two cases where
it will happen:

* When the I/O APIC NMI watchdog is enabled.  This is actually a misnomer
  as the watchdog pulses are delivered through the 8259A to the LINT0
  inputs of all the local APICs in the system.  The implication is the
  output of the master 8259A goes high and low repeatedly, signalling
  interrupts to INTIN2 which is enabled too!

  [The origin of the name is I think for a brief period during the
  development we had a capability in our code to configure the watchdog to
  use an I/O APIC input; that would be INTIN2 in this scenario.]

* When the native route of IRQ0 via INTIN0 fails for whatever reason -- as
  it happens with the system considered here.  In this scenario the timer
  pulse is delivered through the 8259A to LINT0 input of the local APIC of
  the bootstrap processor, quite similarly to how is done for the watchdog
  described above.  The result is, again, INTIN2 receives these pulses
  too.  Rafael's system used to escape this scenario, because an incorrect
  IRQ0 override would occupy INTIN2 and prevent it from being unmasked.

My conclusion is IRQ2 should be excluded from configuration in all the
cases and the current exception for ACPI systems should be lifted.  The
reason being the exception not only being useless, but harmful as well.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:03 +02:00
Maciej W. Rozycki
c88ac1df48 x86: L-APIC: Always fully configure IRQ0
Unlike the 32-bit one, the 64-bit variation of the LVT0 setup code for
the "8259A Virtual Wire" through the local APIC timer configuration does
not fully configure the relevant irq_chip structure.  Instead it relies on
the preceding I/O APIC code to have set it up, which does not happen if
the I/O APIC variants have not been tried.

The patch includes corresponding changes to the 32-bit variation too
which make them both the same, barring a small syntactic difference
involving sequence of functions in the source.  That should work as an aid
with the upcoming merge.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:02 +02:00
Maciej W. Rozycki
1baea6e2fe x86: L-APIC: Set IRQ0 as edge-triggered
IRQ0 is edge-triggered, but the "8259A Virtual Wire" through the local
APIC configuration in the 32-bit version uses the "fasteoi" handler
suitable for level-triggered APIC interrupt.  Rewrite code so that the
"edge" handler is used.  The 64-bit version uses different code and is
unaffected.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:54:02 +02:00
Glauber Costa
557d7d4e29 x86: use matching CFI_ENDPROC
The RING0_INT_FRAME macro defines a CFI_STARTPROC.
So we should really be using CFI_ENDPROC after it.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 20:49:28 +02:00
Jeremy Fitzhardinge
8d28aab59f x86_64: add pseudo-features for 32-bit compat syscall
Add pseudo-feature bits to describe whether the CPU supports sysenter
and/or syscall from ia32-compat userspace.  This removes a hardcoded
test in vdso32-setup.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:44:57 +02:00
Ingo Molnar
3d0decc4f4 x86: fix tsc unification buglet with ftrace and stackprotector
Yinghai Lu reported crashes on 64-bit x86:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 IP: [<ffffffff80253b17>] hrtick_start_fair+0x89/0x173
 [...]

And with a long session of debugging and a lot of difficulty, tracked it down
to this commit:

 --------------->
 8fbbc4b45c is first bad commit
 commit 8fbbc4b45c
 Author: Alok Kataria <akataria@vmware.com>
 Date:   Tue Jul 1 11:43:34 2008 -0700

     x86: merge tsc_init and clocksource code
 <--------------

The problem is that the TSC unification missed these Makefile rules
in arch/x86/kernel/Makefile:

  # Do not profile debug and lowlevel utilities
  CFLAGS_REMOVE_tsc_64.o = -pg
  CFLAGS_REMOVE_tsc_32.o = -pg
  ...
  CFLAGS_tsc_64.o         := $(nostackp)
  ...

which rules make sure that various instrumentation and debugging
facilities are disabled for code that might end up in a VDSO - such as
the TSC code.

Reported-and-bisected-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Conflicts:

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:09:15 +02:00