Commit Graph

1461 Commits

Author SHA1 Message Date
Pekka Enberg
54e63f3a42 x86: ifdef 32-bit specific setup in init_memory_mapping()
Impact: cleanup

Enabling NX, PSE, and PGE are only required on 32-bit so ifdef them
in both versions of the function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-5-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:12 +01:00
Pekka Enberg
e7179853e7 x86: move pgd_base out of init_memory_mapping()
Impact: cleanup

This patch moves pgd_base out of init_memory_mapping() to reduce
the diff between the 32-bit version and the 64-bit version of the
function.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-4-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:12 +01:00
Pekka Enberg
49a2bf7303 x86: find_early_table_space() unification
Impact: cleanup

There are some minor differences between the 32-bit and 64-bit
find_early_table_space() functions. This patch wraps those
differences under CONFIG_X86_32 to make the function identical
on both configurations.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-3-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:11 +01:00
Pekka Enberg
4bbd4fa038 x86: add gbpages support to 32-bit init_memory_mapping()
Impact: cleanup

To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), add gbpages support to the 32-bit version.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-2-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:11 +01:00
Pekka Enberg
c3f5d2d8b5 x86: init_memory_mapping() trivial cleanups
Impact: cleanup

To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), fix up all trivial issues.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-1-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05 14:17:10 +01:00
Yinghai Lu
fc5efe3941 x86: fix bootmem cross node for 32bit numa, cleanup
Impact: clean up

Simplify the code, reuse some lines.
Remove min_low_pfn reference, it is always 0

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AEE2C4.2030602@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 22:09:59 +01:00
Pekka Enberg
731ddea636 x86: move free_initrd_mem() to common mm/init.c
Impact: cleanup

The function is identical on 32-bit and 64-bit configurations so move it to the
common mm/init.c file.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236158020.29024.28.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:59:26 +01:00
Yinghai Lu
b68adb16f2 x86: make 32-bit init_memory_mapping range change more like 64-bit
Impact: cleanup

make code more readable and more like 64-bit

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE48B4.8010907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:55:03 +01:00
Yinghai Lu
a71edd1f46 x86: fix bootmem cross node for 32bit numa
Impact: fix panic on system 2g x4 sockets

Found one system with 4 sockets and every sockets has 2g can not boot
with numa32 because boot mem is crossing nodes.

So try to have numa version of setup_bootmem_allocator().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE485B.8000902@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:55:03 +01:00
Ingo Molnar
6d2e91bf80 Merge branch 'x86/urgent' into x86/mm
Conflicts:
	arch/x86/include/asm/fixmap_64.h
Semantic merge:
	arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 20:20:10 +01:00
Pekka Enberg
6298e719cf x86: set_highmem_pages_init() cleanup, #2
Impact: cleanup

The zones are set up at this stage so there's a highmem zone
available even for the UMA case.

The only difference there is that for machines that have
CONFIG_HIGHMEM enabled but don't have any highmem available,
->zone_start_pfn is zero whereas highstart_pfn is non-zero).

The field is left zeroed because of the !size test in
free_area_init_core() but shouldn't be a problem because
add_highpages_with_active_regions() handles empty ranges just
fine.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Mel Gorman <mel@csn.ul.ie>
LKML-Reference: <1236154567.29024.23.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 19:00:51 +01:00
Pekka Enberg
540aca06b7 x86: move devmem_is_allowed() to common mm/init.c
Impact: cleanup

The function is identical on 32-bit and 64-bit configurations so move
it to the common mm/init.c file.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236160001.29024.29.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 11:40:04 +01:00
Ingo Molnar
a1be621dfa Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core 2009-03-04 11:14:47 +01:00
Jeremy Fitzhardinge
f254f3909e x86: un-__init fill_pud/pmd/pte
They are used by __set_fixmap->set_pte_vaddr_pud, which can
be used by arch_setup_additional_pages(), and so is used
after init.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-04 02:29:36 +01:00
Ingo Molnar
91d75e209b Merge branch 'x86/core' into core/percpu 2009-03-04 02:29:19 +01:00
Ingo Molnar
8b0e5860cb Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core 2009-03-04 02:22:31 +01:00
Linus Torvalds
3024e4a997 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: oprofile: don't set counter width from cpuid on Core2
  x86: fix init_memory_mapping() to handle small ranges
2009-03-03 14:32:55 -08:00
Linus Torvalds
f2a4165526 Merge branch 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86 mmiotrace: fix race with release_kmmio_fault_page()
  x86 mmiotrace: improve handling of secondary faults
  x86 mmiotrace: split set_page_presence()
  x86 mmiotrace: fix save/restore page table state
  x86 mmiotrace: WARN_ONCE if dis/arming a page fails
  x86: add far read test to testmmiotrace
  x86: count errors in testmmiotrace.ko
2009-03-03 14:32:37 -08:00
Ingo Molnar
03787ceed8 x86: set_highmem_pages_init() cleanup, fix !CONFIG_NUMA && CONFIG_HIGHMEM=y
Impact: build fix

 arch/x86/mm/highmem_32.c:187: error: static declaration of 'set_highmem_pages_init' follows non-static declaration
 arch/x86/include/asm/numa_32.h:8: error: previous declaration of 'set_highmem_pages_init' was here

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 15:32:24 +01:00
Pekka Enberg
867c5b5292 x86: set_highmem_pages_init() cleanup
Impact: cleanup

This patch moves set_highmem_pages_init() to arch/x86/mm/highmem_32.c.

The declaration of the function is kept in asm/numa_32.h because
asm/highmem.h is included only if CONFIG_HIGHMEM is enabled so we
can't put the empty static inline function there.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 13:13:15 +01:00
Pekka Enberg
e5b2bb5527 x86: unify free_init_pages() and free_initmem()
Impact: unification

This patch introduces a common arch/x86/mm/init.c and moves the identical
free_init_pages() and free_initmem() functions to the file.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236078906.2675.18.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:21:18 +01:00
Pekka Enberg
e087edd8c0 x86: make sure initmem is writable on 64-bit
Impact: unification

This patch ports commit 3c1df68b84 ("x86: make
sure initmem is writable") to the 64-bit version to unify implementations of
free_init_pages().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <1236078904.2675.17.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:21:18 +01:00
Pekka Enberg
05f209e7b9 x86: add sanity checks to init_32.c
Impact: unification

This patch adds sanity checks that are already in init_64.c to init_32.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236078902.2675.16.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:21:17 +01:00
Pekka Enberg
fd578f9c0a x86: use roundup() instead of PAGE_ALIGN() in find_early_table_space()
Impact: cleanup

This patch changes find_early_table_space() to use roundup() for rounding up
tables to page size to unify the common parts of the 32-bit and 64-bit
implementations.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236077705.2675.6.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:07:00 +01:00
Pekka Enberg
2b688dfd0a x86: move __VMALLOC_RESERVE to pgtable_32.c
Impact: cleanup

The __VMALLOC_RESERVE global variable is not used in init_32.c. Move that to
pgtable_32.c to reduce the diff between init_32.c and init_64.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236077704.2675.4.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 12:06:59 +01:00
Yinghai Lu
0fc59d3a01 x86: fix init_memory_mapping() to handle small ranges
Impact: fix failed EFI bootup in certain circumstances

Ying Huang found init_memory_mapping() has problem with small ranges
less than 2M when he tried to direct map the EFI runtime code out of
max_low_pfn_mapped.

It turns out we never considered that case and didn't check the range...

Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Maly <bmaly@redhat.com>
LKML-Reference: <49ACDDED.1060508@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03 08:50:22 +01:00
Ingo Molnar
fdfa66ab45 Merge branches 'tracing/ftrace', 'tracing/mmiotrace' and 'linus' into tracing/core 2009-03-02 22:37:35 +01:00
Pekka Paalanen
340430c572 x86 mmiotrace: fix race with release_kmmio_fault_page()
There was a theoretical possibility to a race between arming a page in
post_kmmio_handler() and disarming the page in
release_kmmio_fault_page():

cpu0                             cpu1
------------------------------------------------------------------
mmiotrace shutdown
enter release_kmmio_fault_page
                                 fault on the page
                                 disarm the page
disarm the page
                                 handle the MMIO access
                                 re-arm the page
put the page on release list
remove_kmmio_fault_pages()
                                 fault on the page
                                 page not known to mmiotrace
                                 fall back to do_page_fault()
                                 *KABOOM*

(This scenario also shows the double disarm case which is allowed.)

Fixed by acquiring kmmio_lock in post_kmmio_handler() and checking
if the page is being released from mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:37 +01:00
Stuart Bennett
3e39aa156a x86 mmiotrace: improve handling of secondary faults
Upgrade some kmmio.c debug messages to warnings.
Allow secondary faults on probed pages to fall through, and only log
secondary faults that are not due to non-present pages.

Patch edited by Pekka Paalanen.

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:37 +01:00
Pekka Paalanen
0b700a6a25 x86 mmiotrace: split set_page_presence()
From 36772dcb6ffbbb68254cbfc379a103acd2fbfefc Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sat, 28 Feb 2009 21:34:59 +0200

Split set_page_presence() in kmmio.c into two more functions set_pmd_presence()
and set_pte_presence(). Purely code reorganization, no functional changes.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:36 +01:00
Pekka Paalanen
5359b585fb x86 mmiotrace: fix save/restore page table state
From baa99e2b32449ec7bf147c234adfa444caecac8a Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sun, 22 Feb 2009 20:02:43 +0200

Blindly setting _PAGE_PRESENT in disarm_kmmio_fault_page() overlooks the
possibility, that the page was not present when it was armed.

Make arm_kmmio_fault_page() store the previous page presence in struct
kmmio_fault_page and use it on disarm.

This patch was originally written by Stuart Bennett, but Pekka Paalanen
rewrote it a little different.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:36 +01:00
Stuart Bennett
e9d54cae8f x86 mmiotrace: WARN_ONCE if dis/arming a page fails
Print a full warning once, if arming or disarming a page fails.

Also, if initial arming fails, do not handle the page further. This
avoids the possibility of a page failing to arm and then later claiming
to have handled any fault on that page.

WARN_ONCE added by Pekka Paalanen.

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:35 +01:00
Pekka Paalanen
5ff93697fc x86: add far read test to testmmiotrace
Apparently pages far into an ioremapped region might not actually be
mapped during ioremap(). Add an optional read test to try to trigger a
multiply faulting MMIO access. Also add more messages to the kernel log
to help debugging.

This patch is based on a patch suggested by
Stuart Bennett <stuart@freedesktop.org>
who discovered bugs in mmiotrace related to normal kernel space faults.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:35 +01:00
Pekka Paalanen
fab852aaf7 x86: count errors in testmmiotrace.ko
Check the read values against the written values in the MMIO read/write
test. This test shows if the given MMIO test area really works as
memory, which is a prerequisite for a successful mmiotrace test.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:34 +01:00
Ingo Molnar
55f2b78995 Merge branch 'x86/urgent' into x86/pat 2009-03-01 12:47:58 +01:00
Ingo Molnar
f5c1aa1537 Revert "gpu/drm, x86, PAT: PAT support for io_mapping_*"
This reverts commit 17581ad812.

Sitsofe Wheeler reported that /dev/dri/card0 is MIA on his EeePC 900
and bisected it to this commit.

Graphics card is an i915 in an EeePC 900:

 00:02.0 VGA compatible controller [0300]:
   Intel Corporation Mobile 915GM/GMS/910GML
     Express Graphics Controller [8086:2592] (rev 04)

( Most likely the ioremap() of the driver failed and hence the card
  did not initialize. )

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-01 12:47:49 +01:00
Ingo Molnar
92b9af9e4f x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()
Impact: build fix

Theodore Ts reported that the i915 driver needs these symbols:

 ERROR: "pgprot_writecombine" [drivers/gpu/drm/i915/i915.ko] undefined!
 ERROR: "is_io_mapping_possible" [drivers/gpu/drm/i915/i915.ko] undefined!

Reported-by: Theodore Ts'o <tytso@mit.edu> wrote:
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 14:22:44 +01:00
Gustavo F. Padovan
fd862dde18 x86, fixmap: define reserve_top_address for x86_64
Impact: new interface (not yet use)

Define reserve_top_address for x86_64; only for later x86 integration.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Ingo Molnar
ecc25fbd6b Merge branches 'x86/apic', 'x86/defconfig', 'x86/memtest', 'x86/mm' and 'linus' into x86/core 2009-02-26 06:31:32 +01:00
Ingo Molnar
801c0be814 Merge branches 'x86/urgent' and 'x86/pat' into x86/core
Conflicts:
	arch/x86/include/asm/pat.h
2009-02-26 06:31:23 +01:00
Ingo Molnar
13b2eda64d Merge branch 'x86/urgent' into x86/core
Conflicts:
	arch/x86/mach-voyager/voyager_smp.c
2009-02-26 06:30:42 +01:00
Ingo Molnar
13093cb0e5 gpu/drm, x86, PAT: PAT support for io_mapping_*, export symbols for modules
Impact: build fix

 ERROR: "reserve_io_memtype_wc" [drivers/gpu/drm/i915/i915.ko] undefined!
 ERROR: "free_io_memtype" [drivers/gpu/drm/i915/i915.ko] undefined!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 03:43:53 +01:00
Ingo Molnar
2e31add2a7 Merge branch 'x86/urgent' into x86/pat 2009-02-25 16:40:10 +01:00
Venkatesh Pallipadi
17581ad812 gpu/drm, x86, PAT: PAT support for io_mapping_*
Make io_mapping_create_wc and io_mapping_free go through PAT to make sure
that there are no memory type aliases.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:52 +01:00
Venkatesh Pallipadi
4ab0d47d0a gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t
io_mapping_create_wc should take a resource_size_t parameter in place of
unsigned long. With unsigned long, there will be no way to map greater than 4GB
address in i386/32 bit.

On x86, greater than 4GB addresses cannot be mapped on i386 without PAE. Return
error for such a case.

Patch also adds a structure for io_mapping, that saves the base, size and
type on HAVE_ATOMIC_IOMAP archs, that can be used to verify the offset on
io_mapping_map calls.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
Venkatesh Pallipadi
7880f74645 gpu/drm, x86, PAT: routine to keep identity map in sync
Add a function to check and keep identity maps in sync, when changing
any memory type. One of the follow on patches will also use this
routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
Andreas Herrmann
63823126c2 x86: memtest: add additional (regular) test patterns
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:47 +01:00
Andreas Herrmann
bfb4dc0da4 x86: memtest: wipe out test pattern from memory
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:46 +01:00
Andreas Herrmann
570c9e69aa x86: memtest: adapt log messages
- print test pattern instead of pattern number,
- show pattern as stored in memory,
- use proper priority flags,
- consistent use of u64 throughout the code

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:46 +01:00
Andreas Herrmann
7dad169e57 x86: memtest: cleanup memtest function
Impact: code cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:45 +01:00
Andreas Herrmann
6d74171bf7 x86: memtest: introduce array to select memtest patterns
Impact: code cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:45 +01:00
Andreas Herrmann
40823f737e x86: memtest: reuse test patterns when memtest parameter exceeds number of available patterns
Impact: fix unexpected behaviour when pattern number is out of range

Current implementation provides 4 patterns for memtest. The code doesn't
check whether the memtest parameter value exceeds the maximum pattern number.

Instead the memtest code pretends to test with non-existing patterns, e.g.
when booting with memtest=10 I've observed the following

  ...
  early_memtest: pattern num 10
  0000001000 - 0000006000 pattern 0
  ...
  0000001000 - 0000006000 pattern 1
  ...
  0000001000 - 0000006000 pattern 2
  ...
  0000001000 - 0000006000 pattern 3
  ...
  0000001000 - 0000006000 pattern 4
  ...
  0000001000 - 0000006000 pattern 5
  ...
  0000001000 - 0000006000 pattern 6
  ...
  0000001000 - 0000006000 pattern 7
  ...
  0000001000 - 0000006000 pattern 8
  ...
  0000001000 - 0000006000 pattern 9
  ...

But in fact Linux didn't test anything for patterns > 4 as the default
case in memtest() is to leave the function.

I suggest to use the memtest parameter as the number of tests to be
performed and to re-iterate over all existing patterns.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:44 +01:00
Ingo Molnar
0edcf8d692 Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu
Conflicts:
	arch/x86/include/asm/pgtable.h
2009-02-24 21:52:45 +01:00
Ingo Molnar
a852cbfaaf Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core 2009-02-24 21:50:43 +01:00
Tejun Heo
458a3e644c x86: update populate_extra_pte() and add populate_extra_pmd()
Impact: minor change to populate_extra_pte() and addition of pmd flavor

Update populate_extra_pte() to return pointer to the pte_t for the
specified address and add populate_extra_pmd() which only populates
till the pmd and returns pointer to the pmd entry for the address.

For 64bit, pud/pmd/pte fill functions are separated out from
set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
populate_extra_{pte|pmd}().

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Ingo Molnar
fc6fc7f1b1 Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic conflict resolution:
	arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:05:19 +01:00
Ingo Molnar
b319eed0aa x86, mm: fault.c, simplify kmmio_fault(), cleanup
Clarify the kmmio_fault() comment.

Acked-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 10:24:18 +01:00
Hannes Eder
2366c298b5 x86: numa_32.c: fix sparse warning: Using plain integer as NULL pointer
Fix this sparse warning:

  arch/x86/mm/numa_32.c:197:24: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 09:27:12 +01:00
Ingo Molnar
f8eeb2e6be x86, mm: fault.c, update copyrights
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 23:13:36 +01:00
Ingo Molnar
cd1b68f08f x86, mm: fault.c, give another attempt at prefetch handing before SIGBUS
Impact: extend prefetch handling on 64-bit

Currently there's an extra is_prefetch() check done in do_sigbus(),
which we only do on 32 bits.

This is a last-ditch check before we terminate a task, so it's worth
giving prefetch instructions another chance - should none of our
existing quirks have caught a prefetch instruction related spurious
fault.

The only risk is if a prefetch causes a real sigbus, in that case
we'll not OOM but try another fault. But this code has been on
32-bit for a long time, so it should be fine in practice.

So do this on 64-bit too - and thus remove one more #ifdef.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:46 +01:00
Ingo Molnar
7c178a26d3 x86, mm: fault.c, remove #ifdef from fault_in_kernel_space()
Impact: cleanup

Removal of an #ifdef in fault_in_kernel_space(), by making
use of the new TASK_SIZE_MAX symbol which is now available
on 32-bit too.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:45 +01:00
Ingo Molnar
d951734654 x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX
Impact: cleanup

Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
define on 32-bit too. (mapped to TASK_SIZE)

This allows 32-bit code to make use of the (former-) TASK_SIZE64
symbol as well, in a clean way.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar
c3731c6866 x86, mm: fault.c, remove #ifdef from do_page_fault()
Impact: cleanup

do_page_fault() has this ugly #ifdef in its prototype:

  #ifdef CONFIG_X86_64
  asmlinkage
  #endif
  void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)

Replace it with 'dotraplinkage' which maps to exactly the above
construct: nothing on 32-bit and asmlinkage on 64-bit.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar
1cc99544dd x86, mm: fault.c, unify oops handling
Impact: add oops-recursion check to 32-bit

Unify the oops state-machine, to the 64-bit version. It is
slightly more careful in that it does a recursion check
in oops_begin(), and is thus more likely to show the relevant
oops.

It also means that 32-bit will print one more line at the
end of pagefault triggered oopses:

 	printk(KERN_EMERG "CR2: %016lx\n", address);

Which is generally good information to be seen in partial-dump
digital-camera jpegs ;-)

The downside is the somewhat more complex critical path. Both
variants have been tested well meanwhile by kernel developers
crashing their boxes so i dont think this is a practical worry.

This removes 3 ugly #ifdefs from no_context() and makes the
function a lot nicer read.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar
8f7661496c x86, mm: fault.c, unify oops printing
Impact: refine/extend page fault related oops printing on 64-bit

 - honor the pause_on_oops logic on 64-bit too
 - print out NX fault warnings on 64-bit as well
 - factor out the NX fault message to make it git-greppable and readable

Note that this means that we do the PF_INSTR check on 32-bit non-PAE
as well where it should not occur ... normally. Cannot hurt.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:43 +01:00
Ingo Molnar
f2f13a8535 x86, mm: fault.c, reorder functions
Impact: cleanup

Avoid a couple more #ifdefs by moving fundamentally non-unifiable
functions into a single #ifdef 32-bit / #else / #endif block in
fault.c: vmalloc*(), dump_pagetable(), check_vm8086_mode().

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:43 +01:00
Ingo Molnar
b18018126f x86, mm, kprobes: fault.c, simplify notify_page_fault()
Impact: cleanup

Remove an #ifdef from notify_page_fault(). The function still
compiles to nothing in the !CONFIG_KPROBES case.

Introduce kprobes_built_in() and kprobe_fault_handler() helpers
to allow this - they returns 0 if !CONFIG_KPROBES.

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:42 +01:00
Ingo Molnar
b814d41f09 x86, mm: fault.c, simplify kmmio_fault()
Impact: cleanup

Remove an #ifdef from kmmio_fault() - we can do this by
providing default implementations for is_kmmio_active()
and kmmio_handler(). The compiler optimizes it all away
in the !CONFIG_MMIOTRACE case.

Also, while at it, clean up mmiotrace.h a bit:

 - standard header guards
 - standard vertical spaces for structure definitions

No code changed (both with mmiotrace on and off in the config):

   text	   data	    bss	    dec	    hex	filename
   2947	     12	     12	   2971	    b9b	fault.o.before
   2947	     12	     12	   2971	    b9b	fault.o.after

Cc: Pekka Paalanen <pq@iki.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:42 +01:00
Ingo Molnar
121d5d0a7e x86, mm: fault.c, enable PF_RSVD checks on 32-bit too
Impact: improve page fault handling robustness

The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a
relatively recent addition to x86 CPUs, so the 32-bit do_fault()
implementation never had it. This flag gets set when the CPU
detects nonzero values in any reserved bits of the page directory
entries.

Extend the existing 64-bit check for PF_RSVD in do_page_fault()
to 32-bit too. If we detect such a fault then we print a more
informative oops and the pagetables.

This unifies the code some more, removes an ugly #ifdef and improves
the 32-bit page fault code robustness a bit. It slightly increases
the 32-bit kernel text size.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:41 +01:00
Ingo Molnar
8c938f9fae x86, mm: fault.c, factor out the vm86 fault check
Impact: cleanup

Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check
in do_page_fault(), put it into the check_v8086_mode() helper
function and merge it with an existing #ifdef.

Also, simplify the code flow a tiny bit in the helper.

No code changed:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   2711	     12	     12	   2735	    aaf	fault.o.before
   2711	     12	     12	   2735	    aaf	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:41 +01:00
Ingo Molnar
107a03678c x86, mm: fault.c, refactor/simplify the is_prefetch() code
Impact: no functionality changed

Factor out the opcode checker into a helper inline.

The code got a tiny bit smaller:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

And it got cleaner / easier to review as well.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:40 +01:00
Ingo Molnar
2d4a71676f x86, mm: fault.c cleanup
Impact: cleanup, no code changed

Clean up various small details, which can be correctness checked
automatically:

 - tidy up the include file section
 - eliminate unnecessary includes
 - introduce show_signal_msg() to clean up code flow
 - standardize the code flow
 - standardize comments and other style details
 - more cleanups, pointed out by checkpatch

No code changed on either 32-bit nor 64-bit:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4632	     32	     24	   4688	   1250	fault.o.after

the md5 changed due to a change in a single instruction:

   2e8a8241e7f0d69706776a5a26c90bc0  fault.o.before.asm
   c5c3d36e725586eb74f0e10692f0193e  fault.o.after.asm

Because a __LINE__ reference in a WARN_ONCE() has changed.

On 32-bit a few stack offsets changed - no code size difference
nor any functionality difference.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:39 +01:00
Steven Rostedt
1623963097 ftrace, x86: make kernel text writable only for conversions
Impact: keep kernel text read only

Because dynamic ftrace converts the calls to mcount into and out of
nops at run time, we needed to always keep the kernel text writable.

But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
the kernel code to writable before ftrace modifies the text, and converts
it back to read only afterward.

The kernel text is converted to read/write, stop_machine is called to
modify the code, then the kernel text is converted back to read only.

The original version used SYSTEM_STATE to determine when it was OK
or not to change the code to rw or ro. Andrew Morton pointed out that
using SYSTEM_STATE is a bad idea since there is no guarantee to what
its state will actually be.

Instead, I moved the check into the set_kernel_text_* functions
themselves, and use a local variable to determine when it is
OK to change the kernel text RW permissions.

[ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20 14:30:06 -05:00
Ingo Molnar
c9e1585b1b Merge branch 'tip/x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into x86/mm 2009-02-20 18:51:43 +01:00
Ingo Molnar
7a5714e018 x86, pat: add large-PAT check to split_large_page()
Impact: future-proof the split_large_page() function

Linus noticed that split_large_page() is not safe wrt. the
PAT bit: it is bit 12 on the 1GB and 2MB page table level
(_PAGE_BIT_PAT_LARGE), and it is bit 7 on the 4K page
table level (_PAGE_BIT_PAT).

Currently it is not a problem because we never set
_PAGE_BIT_PAT_LARGE on any of the large-page mappings - but
should this happen in the future the split_large_page() would
silently lift bit 12 into the lowlevel 4K pte and would start
corrupting the physical page frame offset. Not fun.

So add a debug warning, to make sure if something ever sets
the PAT bit then this function gets updated too.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 17:48:49 +01:00
Steven Rostedt
3c3e5694ad x86: check PMD in spurious_fault handler
Impact: fix to prevent hard lockup on bad PMD permissions

If the PMD does not have the correct permissions for a page access,
but the PTE does, the spurious fault handler will mistake the fault
as a lazy TLB transaction. This will result in an infinite loop of:

 fault -> spurious_fault check (pass) -> return to code -> fault

This patch adds a check and a warn on if the PTE passes the permissions
but the PMD does not.

[ Updated: Ingo Molnar suggested using WARN_ONCE with some text ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20 11:44:47 -05:00
Ingo Molnar
3b6f7b9beb Merge branch 'x86/urgent' into x86/core 2009-02-20 17:40:43 +01:00
Ingo Molnar
07a66d7c53 x86: use the right protections for split-up pagetables
Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().

The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.

The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.

With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.

Also update the comment.

This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.

[ Huang Ying also experienced problems in this area when writing
  the EFI code, but the real bug in split_large_page() was not
  realized back then. ]

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 08:35:03 +01:00
Tejun Heo
11124411aa x86: convert to the new dynamic percpu allocator
Impact: use new dynamic allocator, unified access to static/dynamic
        percpu memory

Convert to the new dynamic percpu allocator.

* implement populate_extra_pte() for both 32 and 64
* update setup_per_cpu_areas() to use pcpu_setup_static()
* define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
* define config HAVE_DYNAMIC_PER_CPU_AREA

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:09 +09:00
KAMEZAWA Hiroyuki
f2dbcfa738 mm: clean up for early_pfn_to_nid()
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
Linus Torvalds
35010334aa Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vm86: fix preemption bug
  x86, olpc: fix model detection without OFW
  x86, hpet: fix for LS21 + HPET = boot hang
  x86: CPA avoid repeated lazy mmu flush
  x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context
  x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption
  x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem
  x86/cpa: make sure cpa is safe to call in lazy mmu mode
  x86, ptrace, mm: fix double-free on race
2009-02-17 14:27:39 -08:00
Ingo Molnar
e641f5f525 x86, apic: remove duplicate asm/apic.h inclusions
Impact: cleanup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:44 +01:00
Ingo Molnar
7b6aa335ca x86, apic: remove genapic.h
Impact: cleanup

Remove genapic.h and remove all references to it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:44 +01:00
Ingo Molnar
b233969eaa Merge branch 'x86/untangle2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/headers
Conflicts:
	arch/x86/include/asm/page.h
	arch/x86/include/asm/pgtable.h
	arch/x86/mach-voyager/voyager_smp.c
	arch/x86/mm/fault.c
2009-02-13 13:09:00 +01:00
Ingo Molnar
7032e86967 Merge branches 'x86/paravirt', 'x86/pat', 'x86/setup-v2', 'x86/subarch', 'x86/uaccess' and 'x86/urgent' into x86/core 2009-02-13 09:47:32 +01:00
Ingo Molnar
f268fe7333 Merge branch 'x86/mm' into x86/core 2009-02-13 09:47:24 +01:00
Ingo Molnar
a56cdcb662 Merge branches 'x86/acpi', 'x86/asm', 'x86/cpudetect', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/header-fixes', 'x86/headers' and 'x86/minor-fixes' into x86/core 2009-02-13 09:46:36 +01:00
Ingo Molnar
ab639f3593 Merge branch 'core/percpu' into x86/core 2009-02-13 09:45:09 +01:00
Ingo Molnar
f8a6b2b9ce Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/kernel/acpi/boot.c
	arch/x86/mm/fault.c
2009-02-13 09:44:22 +01:00
Thomas Gleixner
7ad9de6ac8 x86: CPA avoid repeated lazy mmu flush
Impact: Flush the lazy MMU only once

Pending mmu updates only need to be flushed once to bring the
in-memory pagetable state up to date.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-02-12 23:11:58 +01:00
Ingo Molnar
d88316c243 x86, 32-bit: refactor find_low_pfn_range()
Impact: cleanup

Make the max_low_pfn logic a bit more standard between
lowmem_pfn_init() and highmem_pfn_init().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 15:21:17 +01:00
Ingo Molnar
4769843bc2 x86, 32-bit: clean up find_low_pfn_range()
Impact: cleanup

Split find_low_pfn_range() into two functions:

 - lowmem_pfn_init()
 - highmem_pfn_init()

The former gets called if all of RAM fits into lowmem,
otherwise we call highmem_pfn_init().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 15:21:16 +01:00
Ingo Molnar
3023533de4 x86: fix warning in find_low_pfn_range()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 15:21:15 +01:00
Suresh Siddha
be03d9e802 x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem
Jeff Mahoney reported:

> With Suse's hwinfo tool, on -tip:
> WARNING: at arch/x86/mm/pat.c:637 reserve_pfn_range+0x5b/0x26d()

reserve_pfn_range() is not tracking the memory range below 1MB
as non-RAM and as such is inconsistent with similar checks in
reserve_memtype() and free_memtype()

Rename the pagerange_is_ram() to pat_pagerange_is_ram() and add the
"track legacy 1MB region as non RAM" condition.

And also, fix reserve_pfn_range() to return -EINVAL, when the pfn
range is RAM. This is to be consistent with this API design.

Reported-and-tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 08:27:27 +01:00
Jeremy Fitzhardinge
4f06b0436b x86/cpa: make sure cpa is safe to call in lazy mmu mode
Impact: fix race leading to crash under KVM and Xen

The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update.  In this
case, the in-memory pagetable state may be out of date due to pending
queued updates.  We need to flush any pending updates before inspecting
the page table.  Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-12 08:27:26 +01:00
Jaswinder Singh Rajput
7651194fb7 x86: mm/init_32.c fix compilation warning
arch/x86/mm/init_32.c: In function ‘find_low_pfn_range’:
 arch/x86/mm/init_32.c:696: warning: format ‘%u’ expects type ‘unsigned int’, but

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-11 21:00:47 +01:00
Jeremy Fitzhardinge
9049a11de7 Merge commit 'remotes/tip/x86/paravirt' into x86/untangle2
* commit 'remotes/tip/x86/paravirt': (175 commits)
  xen: use direct ops on 64-bit
  xen: make direct versions of irq_enable/disable/save/restore to common code
  xen: setup percpu data pointers
  xen: fix 32-bit build resulting from mmu move
  x86/paravirt: return full 64-bit result
  x86, percpu: fix kexec with vmlinux
  x86/vmi: fix interrupt enable/disable/save/restore calling convention.
  x86/paravirt: don't restore second return reg
  xen: setup percpu data pointers
  x86: split loading percpu segments from loading gdt
  x86: pass in cpu number to switch_to_new_gdt()
  x86: UV fix uv_flush_send_and_wait()
  x86/paravirt: fix missing callee-save call on pud_val
  x86/paravirt: use callee-saved convention for pte_val/make_pte/etc
  x86/paravirt: implement PVOP_CALL macros for callee-save functions
  x86/paravirt: add register-saving thunks to reduce caller register pressure
  x86/paravirt: selectively save/restore regs around pvops calls
  x86: fix paravirt clobber in entry_64.S
  x86/pvops: add a paravirt_ident functions to allow special patching
  xen: move remaining mmu-related stuff into mmu.c
  ...

Conflicts:
	arch/x86/mach-voyager/voyager_smp.c
	arch/x86/mm/fault.c
2009-02-11 11:52:22 -08:00
Ingo Molnar
5d96218b4a Merge branch 'x86/uaccess' into core/percpu 2009-02-10 00:40:48 +01:00
Ingo Molnar
249d51b53a Merge commit 'v2.6.29-rc4' into core/percpu
Conflicts:
	arch/x86/mach-voyager/voyager_smp.c
	arch/x86/mm/fault.c
2009-02-09 14:58:11 +01:00
Brian Gerst
44581a28e8 x86: fix abuse of per_cpu_offset
Impact: bug fix

Don't use per_cpu_offset() to determine if it valid to access a
per-cpu variable for a given cpu number.  It is not a valid assumption
on x86-64 anymore. Use cpu_possible() instead.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-09 10:30:30 +01:00
Ingo Molnar
0464ac9ebd Merge branch 'linus' into x86/mm
Conflicts:
	arch/x86/mm/fault.c
2009-02-06 14:42:54 +01:00
Masami Hiramatsu
9be260a646 prevent kprobes from catching spurious page faults
Prevent kprobes from catching spurious faults which will cause infinite
recursive page-fault and memory corruption by stack overflow.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05 17:01:50 -08:00
Hiroshi Shimamoto
0973a06cde x86: mm: introduce helper function in fault.c
Impact: cleanup

Introduce helper function fault_in_kernel_address() to make editors happy.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-04 16:16:55 -08:00
Ingo Molnar
8f47e16348 x86: update copyrights
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-31 04:21:18 +01:00
Peter Zijlstra
010060741a x86: add might_sleep() to do_page_fault()
Impact: widen debug checks

VirtualBox calls do_page_fault() from an atomic context but runs into a
might_sleep() way pas this point, cure that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-29 16:03:34 +01:00
Ingo Molnar
3e5095d152 x86: replace CONFIG_X86_SMP with CONFIG_SMP
The x86/Voyager subarch used to have this distinction between
 'x86 SMP support' and 'Voyager SMP support':

 config X86_SMP
	bool
	depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)

This is a pointless distinction - Voyager can (and already does) use
smp_ops to implement various SMP quirks it has - and it can be extended
more to cover all the specialities of Voyager.

So remove this complication in the Kconfig space.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-29 14:17:00 +01:00
Ingo Molnar
d53e2f2855 x86, smp: remove mach_ipi.h
Move mach_ipi.h definitions into genapic.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-29 14:16:49 +01:00
Ingo Molnar
dac5f4121d x86, apic: untangle the send_IPI_*() jungle
Our send_IPI_*() methods and definitions are a twisted mess: the same
symbol is defined to different things depending on .config details,
in a non-transparent way.

 - spread out the quirks into separately named per apic driver methods

 - prefix the standard PC methods with default_

 - get rid of wrapper macro obfuscation

 - clean up various details

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-28 23:20:31 +01:00
Ingo Molnar
74b6eb6b93 Merge branches 'x86/asm', 'x86/cleanups', 'x86/cpudetect', 'x86/debug', 'x86/doc', 'x86/header-fixes', 'x86/mm', 'x86/paravirt', 'x86/pat', 'x86/setup-v2', 'x86/subarch', 'x86/uaccess' and 'x86/urgent' into x86/core 2009-01-28 23:13:53 +01:00
Ingo Molnar
4369f1fb7c Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu
Conflicts:
	arch/x86/kernel/setup_percpu.c

Semantic conflict:

	arch/x86/kernel/cpu/common.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-27 12:03:24 +01:00
Ingo Molnar
3ddeb51d9c Merge branch 'linus' into core/percpu
Conflicts:
	arch/x86/kernel/setup_percpu.c
2009-01-27 12:01:51 +01:00
Brian Gerst
6470aff619 x86: move 64-bit NUMA code
Impact: Code movement, no functional change.

Move the 64-bit NUMA code from setup_percpu.c to numa_64.c

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-27 12:56:47 +09:00
Linus Torvalds
810ee58de2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
  xen: unitialised return value in xenbus_write_transaction
  x86: fix section mismatch warning
  x86: unmask CPUID levels on Intel CPUs, fix
  x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
  x86: use standard PIT frequency
  xen: handle highmem pages correctly when shrinking a domain
  x86, mm: fix pte_free()
  xen: actually release memory when shrinking domain
  x86: unmask CPUID levels on Intel CPUs
  x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
  x86: fix PTE corruption issue while mapping RAM using /dev/mem
  x86: mtrr fix debug boot parameter
  x86: fix page attribute corruption with cpa()
  Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
  x86: use early clobbers in usercopy*.c
  x86: remove kernel_physical_mapping_init() from init section
  fix: crash: IP: __bitmap_intersects+0x48/0x73
  cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
  work_on_cpu: Use our own workqueue.
  work_on_cpu: don't try to get_online_cpus() in work_on_cpu.
  ...
2009-01-26 09:47:28 -08:00
Eric Anholt
ef5fa0ab24 x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
In the absence of PAT, PAGE_KERNEL_WC ends up mapping to a memory type that
gets UC behavior even in the presence of a WC MTRR covering the area in
question.  By swapping to PAGE_KERNEL_UC_MINUS, we can get the actual
behavior the caller wanted (WC if you can manage it, UC otherwise).

This recovers the 40% performance improvement of using WC in the DRM
to upload vertex data.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-26 11:14:27 +01:00
H. Peter Anvin
75a048119e x86: handle PAT more like other CPU features
Impact: Cleanup

When PAT was originally introduced, it was handled specially for a few
reasons:

- PAT bugs are hard to track down, so we wanted to maintain a
  whitelist of CPUs.
- The i386 and x86-64 CPUID code was not yet unified.

Both of these are now obsolete, so handle PAT like any other features,
including ordinary feature blacklisting due to known bugs.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-23 18:07:45 -08:00
Hiroshi Shimamoto
fe40c0af3c x86: uaccess: introduce try and catch framework
Impact: introduce new uaccess exception handling framework

Introduce {get|put}_user_try and {get|put}_user_catch as new uaccess exception
handling framework.
{get|put}_user_try begins exception block and {get|put}_user_catch(err) ends
the block and gets err if an exception occured in {get|put}_user_ex() in the
block. The exception is stored thread_info->uaccess_err.

The example usage of this framework is below;
int func()
{
	int err = 0;

	get_user_try {
		get_user_ex(...);
		get_user_ex(...);
		:
	} get_user_catch(err);

	return err;
}

Note: get_user_ex() is not clear the value when an exception occurs, it's
different from the behavior of __get_user(), but I think it doesn't matter.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-23 17:17:36 -08:00
venkatesh.pallipadi@intel.com
d639bab8da x86 PAT: ioremap_wc should take resource_size_t parameter
Impact: fix/extend ioremap_wc() beyond 4GB aperture on 32-bit

ioremap_wc() was taking in unsigned long parameter, where as it should take
64-bit resource_size_t parameter like other ioremap variants.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-22 11:53:42 +01:00
Johannes Weiner
fb746d0e13 x86: optimise page fault entry, cleanup
tsk is already assigned to current, drop the redundant second
assignment.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 21:36:54 +01:00
Suresh Siddha
9597134218 x86: fix PTE corruption issue while mapping RAM using /dev/mem
Beschorner Daniel reported:
> hwinfo problem since 2.6.28, showing this in the oops:
>	Corrupted page table at address 7fd04de3ec00

Also, PaX Team reported a regression with this commit:

>	commit 9542ada803
>	Author: Suresh Siddha <suresh.b.siddha@intel.com>
>	Date:   Wed Sep 24 08:53:33 2008 -0700
>
>	    x86: track memtype for RAM in page struct

This commit breaks mapping any RAM page through /dev/mem, as the
reserve_memtype() was not initializing the return attribute type and as such
corrupting the PTE entry that was setup with the return attribute type.

Because of this bug, application mapping this RAM page through /dev/mem
will die with "Corrupted page table at address xxxx" message in the kernel
log and also the kernel identity mapping which maps the underlying RAM
page gets converted to UC.

Fix this by initializing the return attribute type before calling
reserve_ram_pages_type()

Reported-by: PaX Team <pageexec@freemail.hu>
Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com>
Tested-and-Acked-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 18:42:32 +01:00
Suresh Siddha
a1e46212a4 x86: fix page attribute corruption with cpa()
Impact: fix sporadic slowdowns and warning messages

This patch fixes a performance issue reported by Linus on his
Nehalem system. While Linus reverted the PAT patch (commit
58dab916df) which exposed the issue,
existing cpa() code can potentially still cause wrong(page attribute
corruption) behavior.

This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
various people reported.

In 64bit kernel, kernel identity mapping might have holes depending
on the available memory and how e820 reports the address range
covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
in the address range that is not listed by e820 entries, kernel identity
mapping will have a corresponding hole in its 1-1 identity mapping.

If cpa() happens on the kernel identity mapping which falls into these holes,
existing code fails like this:

	__change_page_attr_set_clr()
		__change_page_attr()
			returns 0 because of if (!kpte). But doesn't
			set cpa->numpages and cpa->pfn.
		cpa_process_alias()
			uses uninitialized cpa->pfn (random value)
			which can potentially lead to changing the page
			attribute of kernel text/data, kernel identity
			mapping of RAM pages etc. oops!

This bug was easily exposed by another PAT patch which was doing
cpa() more often on kernel identity mapping holes (physical range between
max_low_pfn_mapped and 4GB), where in here it was setting the
cache disable attribute(PCD) for kernel identity mappings aswell.

Fix cpa() to handle the kernel identity mapping holes. Retain
the WARN() for cpa() calls to other not present address ranges
(kernel-text/data, ioremap() addresses)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 12:24:54 +01:00
Ingo Molnar
198030782c Merge branch 'x86/mm' into core/percpu
Conflicts:
	arch/x86/mm/fault.c
2009-01-21 10:39:51 +01:00
Ingo Molnar
4ec71fa2d2 x86: uv cleanup, build fix
Fix:

 arch/x86/mm/srat_64.c: In function ‘acpi_numa_processor_affinity_init’:
 arch/x86/mm/srat_64.c:141: error: implicit declaration of function ‘get_uv_system_type’
 arch/x86/mm/srat_64.c:141: error: ‘UV_X2APIC’ undeclared (first use in this function)
 arch/x86/mm/srat_64.c:141: error: (Each undeclared identifier is reported only once
 arch/x86/mm/srat_64.c:141: error: for each function it appears in.)

A couple of UV definitions were moved to asm/uv/uv.h, but srat_64.c did
not include that header. Add it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 10:24:27 +01:00
Ingo Molnar
55f4949f57 x86, mm: move tlb.c to arch/x86/mm/
Impact: cleanup

Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
to arch/x86/mm/, where it belongs logically.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 10:16:19 +01:00
Nick Piggin
92181f190b x86: optimise x86's do_page_fault (C entry point for the page fault path)
Impact: cleanup, restructure code to improve assembly

gcc isn't _all_ that smart about spilling registers to stack or reusing
stack slots, even with branch annotations. do_page_fault contained a lot
of functionality, so split unlikely paths into their own functions, and
mark them as noinline just to be sure. I consider this actually to be
somewhat of a cleanup too: the main function now contains about half
the number of lines so the normal path is easier to read, while the error
cases are also nicely split away.

Also, ensure the order of arguments to functions is always the same: regs,
addr, error_code. This can reduce code size a tiny bit, and just looks neater
too.

And add a couple of branch annotations.

Before:
  do_page_fault:
          subq    $360, %rsp      #,

After:
  do_page_fault:
          subq    $56, %rsp       #,

bloat-o-meter:
  add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
  function                                     old     new   delta
  __bad_area_nosemaphore                         -     506    +506
  no_context                                     -     474    +474
  vmalloc_fault                                  -     424    +424
  spurious_fault                                 -     358    +358
  mm_fault_error                                 -     272    +272
  bad_area_access_error                          -      89     +89
  bad_area                                       -      89     +89
  bad_area_nosemaphore                           -      10     +10
  do_page_fault                               2464     784   -1680

Yes, the total size increases by 542 bytes, due to the extra function calls.
But these will very rarely be called (except for vmalloc_fault) in a normal
workload. Importantly, do_page_fault is less than 1/3rd it's original size,
and touches far less stack.

Existing gotos and branch hints did move a lot of the infrequently used text
out of the fastpath, but that's even further improved after this patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-20 13:14:23 +01:00
Gary Hade
f5495506c3 x86: remove kernel_physical_mapping_init() from init section
Impact: fix crash with memory hotplug enabled

kernel_physical_mapping_init() is called during memory hotplug
so it does not belong in the init section.

If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
the make command line, arch/x86/mm/init_64.c is compiled with
the -fno-inline-functions-called-once gcc option defeating
inlining of kernel_physical_mapping_init() within init_memory_mapping().

When kernel_physical_mapping_init() is not inlined it is placed
in the .init.text section according to the __init in it's current
declaration.  A later call to kernel_physical_mapping_init() during
a memory hotplug operation encounters an int3 trap because the
.init.text section memory has been freed.

This patch eliminates the crash caused by the int3 trap by moving the
non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-20 00:31:43 +01:00
Ingo Molnar
b2b062b816 Merge branch 'core/percpu' into stackprotector
Conflicts:
	arch/x86/include/asm/pda.h
	arch/x86/include/asm/system.h

Also, moved include/asm-x86/stackprotector.h to arch/x86/include/asm.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 18:37:14 +01:00
Jan Beulich
a3c6018e56 x86: fix assumed to be contiguous leaf page tables for kmap_atomic region (take 2)
Debugging and original patch from Nick Piggin <npiggin@suse.de>

The early fixmap pmd entry inserted at the very top of the KVA is causing the
subsequent fixmap mapping code to not provide physically linear pte pages over
the kmap atomic portion of the fixmap (which relies on said property to
calculate pte addresses).

This has caused weird boot failures in kmap_atomic much later in the boot
process (initial userspace faults) on a 32-bit PAE system with a larger number
of CPUs (smaller CPU counts tend not to run over into the next page so don't
show up the problem).

Solve this by attempting to clear out the page table, and copy any of its
entries to the new one. Also, add a bug if a nonlinear condition is encountered
and can't be resolved, which might save some hours of debugging if this fragile
scheme ever breaks again...

Once we have such logic, we can also use it to eliminate the early ioremap
trickery around the page table setup for the fixmap area. This also fixes
potential issues with FIX_* entries sharing the leaf page table with the early
ioremap ones getting discarded by early_ioremap_clear() and not restored by
early_ioremap_reset(). It at once eliminates the temporary (and configuration,
namely NR_CPUS, dependent) unavailability of early fixed mappings during the
time the fixmap area page tables get constructed.

Finally, also replace the hard coded calculation of the initial table space
needed for the fixmap area with a proper one, allowing kernels configured for
large CPU counts to actually boot.

Based-on: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-16 13:47:04 +01:00
Linus Torvalds
b5db0e3865 Revert "x86 PAT: remove CPA WARN_ON for zero pte"
This reverts commit 58dab916df, which
makes my Nehalem come to a nasty crawling almost-halt.  It looks like it
turns off caching of regular kernel RAM, with the understandable
slowdown of a few orders of magnitude as a result.

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:25:09 -08:00
Ingo Molnar
7f268f4352 Merge branches 'cpus4096', 'x86/cleanups' and 'x86/urgent' into x86/percpu 2009-01-15 13:18:57 +01:00
Suresh Siddha
5cca0cf15a x86, pat: fix reserve_memtype() for legacy 1MB range
Thierry Vignaud reported:
> http://bugzilla.kernel.org/show_bug.cgi?id=12372
>
> On P4 with an SiS motherboard (video card is a SiS 651)
> X server fails to start with error:
> xf86MapVidMem: Could not mmap framebuffer (0x00000000,0x2000) (Invalid
> argument)

Here X is trying to map first 8KB of memory using /dev/mem. Existing
code treats first 0-4KB of memory as non-RAM and 4KB-8KB as RAM. Recent
code changes don't allow to map memory with different attributes
at the same time.

Fix this by treating the first 1MB legacy region as special and always
track the attribute requests with in this region using linear linked
list (and don't bother if the range is RAM or non-RAM or mixed)

Reported-and-tested-by: Thierry Vignaud <tvignaud@mandriva.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 20:14:45 +01:00
venkatesh.pallipadi@intel.com
58dab916df x86 PAT: remove CPA WARN_ON for zero pte
Impact: reduce scope of debug check - avoid warnings

The logic to find whether identity map exists or not using
high_memory or max_low_pfn_mapped/max_pfn_mapped are not complete
as the memory withing the range may not be mapped if there is a
unusable hole in e820.

Specifically, on my test system I started seeing these warnings with
tools like hwinfo, acpidump trying to map ACPI region.

[   27.400018] ------------[ cut here ]------------
[   27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
[   27.400821] Hardware name: X7DB8
[   27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
[   27.401569] Modules linked in:
[   27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec #586
[   27.402141] Call Trace:
[   27.402488]  [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
[   27.402749]  [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
[   27.403028]  [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
[   27.403333]  [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
[   27.403628]  [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
[   27.403883]  [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
[   27.404172]  [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
[   27.404512]  [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
[   27.404766]  [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
[   27.405026]  [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
[   27.405292]  [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
[   27.405590]  [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
[   27.405844]  [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
[   27.406097]  [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
[   27.406427]  [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
[   27.406686]  [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
[   27.406940]  [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
[   27.407209]  [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
[   27.407523]  [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
[   27.407776]  [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
[   27.408034]  [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
[   27.408339]  [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
[   27.408614] ---[ end trace 4b16ad70c09a602d ]---
[   27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000

This is wih track_pfn_vma_new trying to keep identity map in sync.
The address cff6a000 is the ACPI region according to e820.

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
[    0.000000]  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
[    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)

And is not mapped as per init_memory_mapping.

[    0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
[    0.000000] init_memory_mapping: 0000000100000000-0000000230000000

We can add logic to check for this. But, there can also be other holes in
identity map when we have 1GB of aligned reserved space in e820.

This patch handles it by removing the WARN_ON and returning a specific
error value (EFAULT) to indicate that the address does not have any
identity mapping.

The code that tries to keep identity map in sync can ignore
this error, with other callers of cpa still getting error here.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:02 +01:00
venkatesh.pallipadi@intel.com
cdecff6864 x86 PAT: return compatible mapping to remap_pfn_range callers
Impact: avoid warning message, potentially solve 3D performance regression

Change x86 PAT code to return compatible memtype if the exact memtype that
was requested in remap_pfn_rage and friends is not available due to some
conflict.

This is done by returning the compatible type in pgprot parameter of
track_pfn_vma_new(), and the caller uses that memtype for page table.

Note that track_pfn_vma_copy() which is basically called during fork gets the
prot from existing page table and should not have any conflict. Hence we use
strict memtype check there and do not allow compatible memtypes.

This patch fixes the bug reported here:

  http://marc.info/?l=linux-kernel&m=123108883716357&w=2

Specifically the error message:

  X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
  got write-combining

Should go away.

Reported-and-bisected-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:02 +01:00
venkatesh.pallipadi@intel.com
e4b866ed19 x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
Impact: cleanup

Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
Subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
No fuctionality change in this patch.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:01 +01:00
Andi Kleen
f313e12308 x86: avoid theoretical vmalloc fault loop
Ajith Kumar noticed:

 I was going through the vmalloc fault handling for x86_64 and am unclear
 about the following lines in the vmalloc_fault() function.

 pgd = pgd_offset(current->mm ?: &init_mm, address);
 pgd_ref = pgd_offset_k(address);

 Here the intention is to get the pgd corresponding to the current process
 and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
 However, for kernel threads current->mm is NULL and hence pgd =
 pgd_offset(init_mm, address) = pgd_ref which means the fault handler
 returns without setting the pgd entry in the MM structure in the context
 of which the kernel thread has faulted.  This could lead to never-ending
 faults and busy looping of kernel threads like pdflush.  So, shouldn't the
 pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
 pgd_offset(current->active_mm ?: &init_mm, address);

We can use active_mm unconditionally because it should be always set.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 19:24:21 +01:00
Ingo Molnar
1de8cd3cb9 Merge branch 'linus' into x86/cleanups 2009-01-10 23:56:42 +01:00
Linus Torvalds
3d14bdad40 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
  x86: fix section mismatch warnings in mcheck/mce_amd_64.c
  x86: offer frame pointers in all build modes
  x86: remove duplicated #include's
  x86: k8 numa register active regions later
  x86: update Alan Cox's email addresses
  x86: rename all fields of mpc_table mpc_X to X
  x86: rename all fields of mpc_oemtable oem_X to X
  x86: rename all fields of mpc_bus mpc_X to X
  x86: rename all fields of mpc_cpu mpc_X to X
  x86: rename all fields of mpc_intsrc mpc_X to X
  x86: rename all fields of mpc_lintsrc mpc_X to X
  x86: rename all fields of mpc_iopic mpc_X to X
  x86: irqinit_64.c init_ISA_irqs should be static
  Documentation/x86/boot.txt: payload length was changed to payload_length
  x86: setup_percpu.c fix style problems
  x86: irqinit_64.c fix style problems
  x86: irqinit_32.c fix style problems
  x86: i8259.c fix style problems
  x86: irq_32.c fix style problems
  x86: ioport.c fix style problems
  ...
2009-01-10 06:13:09 -08:00
Harvey Harrison
9b4778f680 trivial: replace last usages of __FUNCTION__ in kernel
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 15:48:54 -08:00
Arjan van de Ven
e8de1481fd resource: allow MMIO exclusivity for device drivers
Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.

This patch adds the option for device drivers to cause their reserved
regions to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.

In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose, in the field,
drivers issues from userspace.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:32 -08:00
Jaswinder Singh Rajput
dacf733357 x86: smp.h move zap_low_mappings declartion to tlbflush.h
Impact: cleanup, moving NON-SMP stuff from smp.h

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-07 13:51:20 +01:00
Gary Hade
c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Nick Piggin
1c0fe6e3bd mm: invoke oom-killer from page fault
Rather than have the pagefault handler kill a process directly if it gets
a VM_FAULT_OOM, have it call into the OOM killer.

With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
oom killing throttling, oom priority adjustment or selective disabling,
panic on oom, etc), it's silly to unconditionally kill the faulting
process at page fault time.  Create a hook for pagefault oom path to call
into instead.

Only converted x86 and uml so far.

[akpm@linux-foundation.org: make __out_of_memory() static]
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Yinghai Lu
40bcc69b39 x86: k8 numa register active regions later
Impact: cleanup

don't register early, so we don't need to clear actived regions if it fail
to get node hash shift or wild set in nb config.

also remove nodeids array that is not needed

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 13:21:21 +01:00
Linus Torvalds
b840d79631 Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
  x86: export vector_used_by_percpu_irq
  x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
  sched: nominate preferred wakeup cpu, fix
  x86: fix lguest used_vectors breakage, -v2
  x86: fix warning in arch/x86/kernel/io_apic.c
  sched: fix warning in kernel/sched.c
  sched: move test_sd_parent() to an SMP section of sched.h
  sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  sched: activate active load balancing in new idle cpus
  sched: bias task wakeups to preferred semi-idle packages
  sched: nominate preferred wakeup cpu
  sched: favour lower logical cpu number for sched_mc balance
  sched: framework for sched_mc/smt_power_savings=N
  sched: convert BALANCE_FOR_xx_POWER to inline functions
  x86: use possible_cpus=NUM to extend the possible cpus allowed
  x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  x86: update io_apic.c to the new cpumask code
  x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
  x86: xen: use smp_call_function_many()
  x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
  ...

Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02 11:44:09 -08:00
Ingo Brueckl
e8e3232627 Fix compiler warning in arch/x86/mm/init_32.c
Signed-off-by: Ingo Brueckl <ib@wupperonline.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:27:32 -08:00
Ingo Molnar
a9de18eb76 Merge branch 'linus' into stackprotector
Conflicts:
	arch/x86/include/asm/pda.h
	kernel/fork.c
2008-12-31 08:31:57 +01:00
Linus Torvalds
5f34fe1cfc Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  stacktrace: provide save_stack_trace_tsk() weak alias
  rcu: provide RCU options on non-preempt architectures too
  printk: fix discarding message when recursion_bug
  futex: clean up futex_(un)lock_pi fault handling
  "Tree RCU": scalable classic RCU implementation
  futex: rename field in futex_q to clarify single waiter semantics
  x86/swiotlb: add default swiotlb_arch_range_needs_mapping
  x86/swiotlb: add default phys<->bus conversion
  x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
  x86: add swiotlb allocation functions
  swiotlb: consolidate swiotlb info message printing
  swiotlb: support bouncing of HighMem pages
  swiotlb: factor out copy to/from device
  swiotlb: add arch hook to force mapping
  swiotlb: allow architectures to override phys<->bus<->phys conversions
  swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
  rcu: fix rcutorture behavior during reboot
  resources: skip sanity check of busy resources
  swiotlb: move some definitions to header
  swiotlb: allow architectures to override swiotlb pool allocation
  ...

Fix up trivial conflicts in
  arch/x86/kernel/Makefile
  arch/x86/mm/init_32.c
  include/linux/hardirq.h
as per Ingo's suggestions.
2008-12-30 16:10:19 -08:00
Linus Torvalds
b0f4b285d7 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
  sched, trace: update trace_sched_wakeup()
  tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
  Revert "x86: disable X86_PTRACE_BTS"
  ring-buffer: prevent false positive warning
  ring-buffer: fix dangling commit race
  ftrace: enable format arguments checking
  x86, bts: memory accounting
  x86, bts: add fork and exit handling
  ftrace: introduce tracing_reset_online_cpus() helper
  tracing: fix warnings in kernel/trace/trace_sched_switch.c
  tracing: fix warning in kernel/trace/trace.c
  tracing/ring-buffer: remove unused ring_buffer size
  trace: fix task state printout
  ftrace: add not to regex on filtering functions
  trace: better use of stack_trace_enabled for boot up code
  trace: add a way to enable or disable the stack tracer
  x86: entry_64 - introduce FTRACE_ frame macro v2
  tracing/ftrace: add the printk-msg-only option
  tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
  x86, bts: correctly report invalid bts records
  ...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
2008-12-28 12:21:10 -08:00
Linus Torvalds
be9c5ae4ee Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (246 commits)
  x86: traps.c replace #if CONFIG_X86_32 with #ifdef CONFIG_X86_32
  x86: PAT: fix address types in track_pfn_vma_new()
  x86: prioritize the FPU traps for the error code
  x86: PAT: pfnmap documentation update changes
  x86: PAT: move track untrack pfnmap stubs to asm-generic
  x86: PAT: remove follow_pfnmap_pte in favor of follow_phys
  x86: PAT: modify follow_phys to return phys_addr prot and return value
  x86: PAT: clarify is_linear_pfn_mapping() interface
  x86: ia32_signal: remove unnecessary declaration
  x86: common.c boot_cpu_stack and boot_exception_stacks should be static
  x86: fix intel x86_64 llc_shared_map/cpu_llc_id anomolies
  x86: fix warning in arch/x86/kernel/microcode_amd.c
  x86: ia32.h: remove unused struct sigfram32 and rt_sigframe32
  x86: asm-offset_64: use rt_sigframe_ia32
  x86: sigframe.h: include headers for dependency
  x86: traps.c declare functions before they get used
  x86: PAT: update documentation to cover pgprot and remap_pfn related changes - v3
  x86: PAT: add pgprot_writecombine() interface for drivers - v3
  x86: PAT: change pgprot_noncached to uc_minus instead of strong uc - v3
  x86: PAT: implement track/untrack of pfnmap regions for x86 - v3
  ...
2008-12-28 12:07:57 -08:00
Ingo Molnar
79a66b96c3 Merge branches 'x86/pat2' and 'x86/fpu'; commit 'v2.6.28' into x86/core 2008-12-25 11:50:41 +01:00
H. Peter Anvin
c1c15b65ec x86: PAT: fix address types in track_pfn_vma_new()
Impact: cleanup, fix warning

This warning:

 arch/x86/mm/pat.c: In function track_pfn_vma_copy:
 arch/x86/mm/pat.c:701: warning: passing argument 5 of follow_phys from incompatible pointer type

Triggers because physical addresses are resource_size_t, not u64.

This really matters when calling an interface like follow_phys() which
takes a pointer to a physical address -- although on x86, being
littleendian, it would generally work anyway as long as the memory region
wasn't completely uninitialized.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-24 10:40:19 +01:00
Ingo Molnar
fa623d1b02 Merge branches 'x86/apic', 'x86/cleanups', 'x86/cpufeature', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/detect-hyper', 'x86/doc', 'x86/dumpstack', 'x86/early-printk', 'x86/fpu', 'x86/idle', 'x86/io', 'x86/memory-corruption-check', 'x86/microcode', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/pat2', 'x86/pci-ioapic-boot-irq-quirks', 'x86/ptrace', 'x86/quirks', 'x86/reboot', 'x86/setup-memory', 'x86/signal', 'x86/sparse-fixes', 'x86/time', 'x86/uv' and 'x86/xen' into x86/core 2008-12-23 16:27:23 +01:00
venkatesh.pallipadi@intel.com
982d789ab7 x86: PAT: remove follow_pfnmap_pte in favor of follow_phys
Impact: Cleanup - removes a new function in favor of a recently modified older one.

Replace follow_pfnmap_pte in pat code with follow_phys. follow_phys lso
returns protection eliminating the need of pte_pgprot call. Using follow_phys
also eliminates the need for pte_pa.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-19 15:40:30 -08:00
venkatesh.pallipadi@intel.com
2520bd3123 x86: PAT: add pgprot_writecombine() interface for drivers - v3
Impact: New mm functionality.

Add pgprot_writecombine. pgprot_writecombine will be aliased to
pgprot_noncached when not supported by the architecture.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 13:30:16 -08:00
venkatesh.pallipadi@intel.com
5899329b19 x86: PAT: implement track/untrack of pfnmap regions for x86 - v3
Impact: New mm functionality.

Hookup remap_pfn_range and vm_insert_pfn and corresponding copy and free
routines with reserve and free tracking.

reserve and free here only takes care of non RAM region mapping. For RAM
region, driver should use set_memory_[uc|wc|wb] to set the cache type and
then setup the mapping for user pte. We can bypass below
reserve/free in that case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 13:30:16 -08:00
Jeremy Fitzhardinge
cfb80c9eae x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
swiotlb on 32 bit will be used by Xen domain 0 support.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 18:58:19 +01:00
Mike Travis
168ef543a4 x86: prepare for cpumask iterators to only go to nr_cpu_ids
Impact: cleanup, futureproof

In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in various
places.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 17:40:58 -08:00
Jan Beulich
beeb4195cb x86, 32-bit: add some compile time checks to mem_init()
Some of the inconsistencies checked for at run time can be detected at
build time already, so duplicate the checks done at run time to also be
done at build time.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:42:51 +01:00
Jan Beulich
d6be89ad66 x86, 32-bit: simplify alloc_low_page()
Impact: cleanup

Neither of the callers really needs the physical address this function
returns, so eliminate the pointless argument.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:41:37 +01:00
Ingo Molnar
8808500f26 x86: soften multi-BAR mapping sanity check warning message
Impact: make debug warning less scary

The ioremap() time multi-BAR map warning has been causing false
positives:

  http://lkml.org/lkml/2008/12/10/432
  http://lkml.org/lkml/2008/12/11/136

So make it less scary by making it once-per-boot, by making it KERN_INFO
and by adding this text:

  "Info: mapping multiple BARs. Your kernel is fine."

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 09:22:26 +01:00
Ingo Molnar
c0515566f3 Merge commit 'v2.6.28-rc7' into x86/cleanups 2008-12-04 11:05:26 +01:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Ingo Molnar
dfdc5437bd Merge commit 'v2.6.28-rc7'; branch 'x86/dumpstack' into tracing/ftrace
Merge x86/dumpstack into tracing/ftrace because upcoming ftrace changes
depend on cleanups already in x86/dumpstack.

Also merge to latest upstream -rc.
2008-12-03 08:55:34 +01:00
Ingo Molnar
a0a70c735e Merge branches 'tracing/profiling', 'tracing/options' and 'tracing/urgent' into tracing/core 2008-11-23 09:10:32 +01:00
Ingo Molnar
90accd6fab Merge branch 'linus' into x86/memory-corruption-check 2008-11-20 09:03:38 +01:00
James Morris
2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells
350b4da71f CRED: Wrap task credential accesses in the x86 arch
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:40 +11:00
Rafael J. Wysocki
97a70e548b x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set
Impact: fix crash during hibernation on 32-bit NUMA

The NUMA code on x86_32 creates special memory mapping that allows
each node's pgdat to be located in this node's memory.  For this
purpose it allocates a memory area at the end of each node's memory
and maps this area so that it is accessible with virtual addresses
belonging to low memory.  As a result, if there is high memory,
these NUMA-allocated areas are physically located in high memory,
although they are mapped to low memory addresses.

Our hibernation code does not take that into account and for this
reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
with high memory present.  Fix this by adding a special mapping for
the NUMA-allocated memory areas to the temporary page tables created
during the last phase of resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:28:51 +01:00
Ingo Molnar
e0cb4ebcd9 Merge branch 'tracing/urgent' into tracing/ftrace
Conflicts:
	kernel/trace/trace.c
2008-11-11 09:40:18 +01:00
Ingo Molnar
895e031707 Merge branch 'linus' into x86/cleanups 2008-11-08 20:23:02 +01:00
Ingo Molnar
a6b0786f7f Merge branches 'tracing/ftrace', 'tracing/fastboot', 'tracing/nmisafe' and 'tracing/urgent' into tracing/core 2008-11-08 09:34:35 +01:00
Linus Torvalds
a15a82f42c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  Revert "x86: default to reboot via ACPI"
  x86: align DirectMap in /proc/meminfo
  AMD IOMMU: fix lazy IO/TLB flushing in unmap path
  x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
  x86: remove VISWS and PARAVIRT around NR_IRQS puzzle
  x86: mention ACPI in top-level Kconfig menu
  x86: size NR_IRQS on 32-bit systems the same way as 64-bit
  x86: don't allow nr_irqs > NR_IRQS
  x86/docs: remove noirqbalance param docs
  x86: don't use tsc_khz to calculate lpj if notsc is passed
  x86, voyager: fix smp_intr_init() compile breakage
  AMD IOMMU: fix detection of NP capable IOMMUs
2008-11-06 15:57:24 -08:00
Hugh Dickins
b9c3bfc24e x86: align DirectMap in /proc/meminfo
Impact: right-align /proc/meminfo consistent with other fields

When the split-LRU patches added Inactive(anon) and Inactive(file) lines
to /proc/meminfo, all counts were moved two columns rightwards to fit in.
Now move x86's DirectMap lines two columns rightwards to line up.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06 15:27:37 +01:00
Ingo Molnar
7a895f53cd Merge branches 'tracing/ftrace', 'tracing/markers', 'tracing/mmiotrace', 'tracing/nmisafe', 'tracing/tracepoints' and 'tracing/urgent' into tracing/core 2008-11-03 10:34:23 +01:00
Zhaolei
a376f30a95 x86: avoid duplicate running of pud_offset and pmd_offset in one_md_table_init()
Impact: simplify implementation, cleanup

If !(pgd_val(*pgd) & _PAGE_PRESENT) in PAE mode, we need not get value of
pmd_table again.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 11:03:17 +01:00
Keith Packard
fd94093435 x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM

This takes the code used for CONFIG_HIGHMEM memory mappings except that
it's designed for dynamic IO resource mapping.

These fixmaps are available even with CONFIG_HIGHMEM turned off.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 10:12:38 +01:00
Ravikiran G Thirumalai
9e41bff270 x86: fix /dev/mem mmap breakage when PAT is disabled
Impact: allow /dev/mem mmaps on non-PAT CPUs/platforms

Fix mmap to /dev/mem when CONFIG_X86_PAT is off and CONFIG_STRICT_DEVMEM is
off

mmap to /dev/mem on kernel memory has been failing since the
introduction of PAT (CONFIG_STRICT_DEVMEM=n case).   Seems like
the check to avoid cache aliasing with PAT is kicking in even
when PAT is disabled. The bug seems to have crept in 2.6.26.

This patch makes sure that the mmap to regular
kernel memory succeeds if CONFIG_STRICT_DEVMEM=n and
PAT is disabled, and the checks to avoid cache aliasing
still happens if PAT is enabled.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Tested-by: Tim Sirianni <tim@scalemp.com>
Cc: <stable@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-30 23:54:41 +01:00
Gary Hade
fe8b868ecc x86: remove debug code from arch_add_memory()
Impact: remove incorrect WARN_ON(1)

Gets rid of dmesg spam created during physical memory hot-add which
will very likely confuse users.  The change removes what appears to
be debugging code which I assume was unintentionally included in:

  x86: arch/x86/mm/init_64.c printk fixes
  commit 10f22dde55

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-29 09:29:22 +01:00
Harvey Harrison
1d6cf1feb8 x86: start annotating early ioremap pointers with __iomem
Impact: some new sparse warnings in e820.c etc, but no functional change.

As with regular ioremap, iounmap etc, annotate with __iomem.

Fixes the following sparse warnings, will produce some new ones
elsewhere in arch/x86 that will get worked out over time.

arch/x86/mm/ioremap.c:402:9: warning: cast removes address space of expression
arch/x86/mm/ioremap.c:406:10: warning: cast adds address space to expression (<asn:2>)
arch/x86/mm/ioremap.c:782:19: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-29 08:05:14 +01:00
Harvey Harrison
9352f5698d x86: two trivial sparse annotations
Impact: fewer sparse warnings, no functional changes

arch/x86/kernel/vsmp_64.c:87:14: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/vsmp_64.c:87:14:    expected void const volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:87:14:    got void *[assigned] address
arch/x86/kernel/vsmp_64.c:88:22: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/vsmp_64.c:88:22:    expected void const volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:88:22:    got void *
arch/x86/kernel/vsmp_64.c💯23: warning: incorrect type in argument 2 (different address spaces)
arch/x86/kernel/vsmp_64.c💯23:    expected void volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c💯23:    got void *
arch/x86/kernel/vsmp_64.c:101:23: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/vsmp_64.c:101:23:    expected void const volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:101:23:    got void *
arch/x86/mm/gup.c:235:6: warning: incorrect type in argument 1 (different base types)
arch/x86/mm/gup.c:235:6:    expected void const volatile [noderef] <asn:1>*<noident>
arch/x86/mm/gup.c:235:6:    got unsigned long [unsigned] [assigned] start

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-29 08:02:28 +01:00
Yinghai Lu
f96f57d91c x86: fix init_memory_mapping for [dc000000 - e0000000) - v2
Impact: change over-mapping to precise mapping, fix /proc/meminfo output

v2: fix less than 1G ram system handling

when gart aperture is 0xdc000000 - 0xe0000000
it return 0xc0000000 - 0xe0000000

that is not right.

this patch fix that will get exact mapping

on 256g sytem with that aperture after patch
LBSuse:~ # cat /proc/meminfo
MemTotal:       264742432 kB
MemFree:        263920628 kB
Buffers:            1416 kB
Cached:            24468 kB
...
DirectMap4k:      5760 kB
DirectMap2M:   3205120 kB
DirectMap1G:  265289728 kB

it is consistent to
LBSuse:~ # cat /sys/kernel/debug/kernel_page_tables
..
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000200000           2M     RW             GLB x  pte
0xffff880000200000-0xffff880040000000        1022M     RW         PSE GLB x  pmd
0xffff880040000000-0xffff8800c0000000           2G     RW         PSE GLB NX pud
0xffff8800c0000000-0xffff8800d7e00000         382M     RW         PSE GLB NX pmd
0xffff8800d7e00000-0xffff8800d7fa0000        1664K     RW             GLB NX pte
0xffff8800d7fa0000-0xffff8800d8000000         384K                           pte
0xffff8800d8000000-0xffff8800dc000000          64M                           pmd
0xffff8800dc000000-0xffff8800e0000000          64M     RW         PSE GLB NX pmd
0xffff8800e0000000-0xffff880100000000         512M                           pmd
0xffff880100000000-0xffff880800000000          28G     RW         PSE GLB NX pud
0xffff880800000000-0xffff880824600000         582M     RW         PSE GLB NX pmd
0xffff880824600000-0xffff8808247f0000        1984K     RW             GLB NX pte
0xffff8808247f0000-0xffff880824800000          64K     RW     PCD     GLB NX pte
0xffff880824800000-0xffff880840000000         440M     RW         PSE GLB NX pmd
0xffff880840000000-0xffff884000000000         223G     RW         PSE GLB NX pud
0xffff884000000000-0xffff884028000000         640M     RW         PSE GLB NX pmd
0xffff884028000000-0xffff884040000000         384M                           pmd
0xffff884040000000-0xffff888000000000         255G                           pud
0xffff888000000000-0xffffc20000000000       58880G                           pgd

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-28 20:54:47 +01:00
Yinghai Lu
11a6b0c933 x86: 64 bit print out absent pages num too
so users are not confused with memhole causing big total ram

we don't need to worry about 32 bit, because memhole is always
above max_low_pfn.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-28 16:50:49 +01:00
Shaohua Li
60817c9b31 x86, memory hotplug: remove wrong -1 in calling init_memory_mapping()
Impact: fix crash with memory hotplug

Shuahua Li found:

| I just did some experiments on a desktop for memory hotplug and this bug
| triggered a crash in my test.
|
| Yinghai's suggestion also fixed the bug.

We don't need to round it, just remove that extra -1

Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-28 09:33:17 +01:00
Yinghai Lu
3afa39493d x86: keep the /proc/meminfo page count correct
Impact: get correct page count in /proc/meminfo

found page count in /proc/meminfo is nor correct on 1G system in VirtualBox 2.0.4

# cat /proc/meminfo
MemTotal:        1017508 kB
MemFree:          822700 kB
Buffers:            1456 kB
Cached:            26632 kB
SwapCached:            0 kB
...
Hugepagesize:       2048 kB
DirectMap4k:      4032 kB
DirectMap2M:  18446744073709549568 kB

with this patch get:
...
DirectMap4k:      4032 kB
DirectMap2M:   1044480 kB

which is consistent to kernel_page_tables
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000001000           4K     RW     PCD     GLB x  pte
0xffff880000001000-0xffff88000009f000         632K     RW             GLB x  pte
0xffff88000009f000-0xffff8800000a0000           4K     RW     PCD     GLB x  pte
0xffff8800000a0000-0xffff880000200000        1408K     RW             GLB x  pte
0xffff880000200000-0xffff88003fe00000        1020M     RW         PSE GLB x  pmd
0xffff88003fe00000-0xffff88003fff0000        1984K     RW             GLB NX pte
0xffff88003fff0000-0xffff880040000000          64K                           pte
0xffff880040000000-0xffff888000000000         511G                           pud
0xffff888000000000-0xffffc20000000000       58880G                           pgd

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-27 18:55:26 +01:00
Arjan van de Ven
304e629bf4 x86: corruption check: run the corruption checks from a work queue
Impact: change the implementation of the debug feature

the periodic corruption checks are better off run from a work queue; there's
nothing time critical about them and this way the amount of
interrupt-context work is reduced.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-27 18:09:45 +01:00
Pekka Paalanen
fd3fdf11d3 trace: add the MMIO-tracer to the tracer menu, cleanup
Impact: cleanup

We can remove MMIOTRACE_HOOKS and replace it with just MMIOTRACE.
MMIOTRACE_HOOKS is a remnant from the time when I thought that
something else could also use the kmmio facilities.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-27 14:07:26 +01:00
Linus Torvalds
c3c9897c63 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix section mismatch warning - apic_x2apic_phys
  x86: fix section mismatch warning - apic_x2apic_cluster
  x86: fix section mismatch warning - apic_x2apic_uv_x
  x86: fix section mismatch warning - apic_physflat
  x86: fix section mismatch warning - apic_flat
  x86: memtest fix use of reserve_early()
  x86 syscall.h: fix argument order
  x86/tlb_uv: remove strange mc146818rtc include
  x86: remove redundant KERN_DEBUG on pr_debug
  x86: do_boot_cpu - check if we have ESR register
  x86: MAINTAINERS change for AMD microcode patch loader
  x86/proc: fix /proc/cpuinfo cpu offline bug
  x86: call dmi-quirks for HP Laptops after early-quirks are executed
  x86, kexec: fix hang on i386 when panic occurs while console_sem is held
  MCE: Don't run 32bit machine checks with interrupts on
  x86: SB600: skip IRQ0 override if it is not routed to INT2 of IOAPIC
  x86: make variables static
2008-10-23 12:38:39 -07:00
Alexey Dobriyan
e1759c215b proc: switch /proc/meminfo to seq_file
and move it to fs/proc/meminfo.c while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:52:40 +04:00
Daniele Calore
2cb0ebeeb6 x86: memtest fix use of reserve_early()
Hi all,

Wrong usage of 2nd parameter in reserve_early call.
66/75: reserve_early(start_bad, last_bad - start_bad, "BAD RAM");
                                ^^^^^^^^^^^^^^^^^^^^

The correct way is to use 'end' address and not 'size'.
As a bonus a fix to the printk format.

Signed-off-by: Daniele Calore <orkaan@orkaan.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-22 17:08:06 +02:00
Alexander van Heukelum
874d93d118 x86, dumpstack: let signr=0 signal no do_exit
Change oops_end such that signr=0 signals that do_exit
is not to be called.

Currently, each use of __die is soon followed by a call
to oops_end and 'regs' is set to NULL if oops_end is expected
not to call do_exit. Change all such pairs to set signr=0
instead. On x86_64 oops_end is used 'bare' in die_nmi; use
signr=0 instead of regs=NULL there, too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-22 14:00:23 +02:00
Linus Torvalds
92b29b86fe Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
  tracing/fastboot: improve help text
  tracing/stacktrace: improve help text
  tracing/fastboot: fix initcalls disposition in bootgraph.pl
  tracing/fastboot: fix bootgraph.pl initcall name regexp
  tracing/fastboot: fix issues and improve output of bootgraph.pl
  tracepoints: synchronize unregister static inline
  tracepoints: tracepoint_synchronize_unregister()
  ftrace: make ftrace_test_p6nop disassembler-friendly
  markers: fix synchronize marker unregister static inline
  tracing/fastboot: add better resolution to initcall debug/tracing
  trace: add build-time check to avoid overrunning hex buffer
  ftrace: fix hex output mode of ftrace
  tracing/fastboot: fix initcalls disposition in bootgraph.pl
  tracing/fastboot: fix printk format typo in boot tracer
  ftrace: return an error when setting a nonexistent tracer
  ftrace: make some tracers reentrant
  ring-buffer: make reentrant
  ring-buffer: move page indexes into page headers
  tracing/fastboot: only trace non-module initcalls
  ftrace: move pc counter in irqtrace
  ...

Manually fix conflicts:
 - init/main.c: initcall tracing
 - kernel/module.c: verbose level vs tracepoints
 - scripts/bootgraph.pl: fallout from cherry-picking commits.
2008-10-20 13:35:07 -07:00
Nick Piggin
db64fe0225 mm: rewrite vmap layer
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap.  Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache.  This is all done under a global lock.  As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
 This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock.  It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems.  The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated.  So the addresses aren't allocated again until
a subsequent TLB flush.  A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap.  Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages.  Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
in nanoseconds per map+touch+unmap.

threads           vanilla         vmap rewrite
1                 14700           2900
2                 33600           3000
4                 49500           2800
8                 70631           2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram...  along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system.  I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now.  vmap is pretty well blown off the
profiles.

Before:
1352059 total                                      0.1401
798784 _write_lock                              8320.6667 <- vmlist_lock
529313 default_idle                             1181.5022
 15242 smp_call_function                         15.8771  <- vmap tlb flushing
  2472 __get_vm_area_node                         1.9312  <- vmap
  1762 remove_vm_area                             4.5885  <- vunmap
   316 map_vm_area                                0.2297  <- vmap
   312 kfree                                      0.1950
   300 _spin_lock                                 3.1250
   252 sn_send_IPI_phys                           0.4375  <- tlb flushing
   238 vmap                                       0.8264  <- vmap
   216 find_lock_page                             0.5192
   196 find_next_bit                              0.3603
   136 sn2_send_IPI                               0.2024
   130 pio_phys_write_mmr                         2.0312
   118 unmap_kernel_range                         0.1229

After:
 78406 total                                      0.0081
 40053 default_idle                              89.4040
 33576 ia64_spinlock_contention                 349.7500
  1650 _spin_lock                                17.1875
   319 __reg_op                                   0.5538
   281 _atomic_dec_and_lock                       1.0977
   153 mutex_unlock                               1.5938
   123 iget_locked                                0.1671
   117 xfs_dir_lookup                             0.1662
   117 dput                                       0.1406
   114 xfs_iget_core                              0.0268
    92 xfs_da_hashname                            0.1917
    75 d_alloc                                    0.0670
    68 vmap_page_range                            0.0462 <- vmap
    58 kmem_cache_alloc                           0.0604
    57 memset                                     0.0540
    52 rb_next                                    0.1625
    50 __copy_user                                0.0208
    49 bitmap_find_free_region                    0.2188 <- vmap
    46 ia64_sn_udelay                             0.1106
    45 find_inode_fast                            0.1406
    42 memcmp                                     0.2188
    42 finish_task_switch                         0.1094
    42 __d_lookup                                 0.0410
    40 radix_tree_lookup_slot                     0.1250
    37 _spin_unlock_irqrestore                    0.3854
    36 xfs_bmapi                                  0.0050
    36 kmem_cache_free                            0.0256
    35 xfs_vn_getattr                             0.0322
    34 radix_tree_lookup                          0.1062
    33 __link_path_walk                           0.0035
    31 xfs_da_do_buf                              0.0091
    30 _xfs_buf_find                              0.0204
    28 find_get_page                              0.0875
    27 xfs_iread                                  0.0241
    27 __strncpy_from_user                        0.2812
    26 _xfs_buf_initialize                        0.0406
    24 _xfs_buf_lookup_pages                      0.0179
    24 vunmap_page_range                          0.0250 <- vunmap
    23 find_lock_page                             0.0799
    22 vm_map_ram                                 0.0087 <- vmap
    20 kfree                                      0.0125
    19 put_page                                   0.0330
    18 __kmalloc                                  0.0176
    17 xfs_da_node_lookup_int                     0.0086
    17 _read_lock                                 0.0885
    17 page_waitqueue                             0.0664

vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.

[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
Eric Anholt
d1d8c925b7 Export kmap_atomic_pfn for DRM-GEM.
The driver would like to map IO space directly for copying data in when
appropriate, to avoid CPU cache flushing for streaming writes.
kmap_atomic_pfn lets us avoid IPIs associated with ioremap for this process.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-10-18 07:10:12 +10:00
Linus Torvalds
e533b22705 Merge branch 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails
  softirq, warning fix: correct a format to avoid a warning
  softirqs, debug: preemption check
  x86, pci-hotplug, calgary / rio: fix EBDA ioremap()
  IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix
  IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes
  softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description
  dmi scan: warn about too early calls to dmi_check_system()
  generic: redefine resource_size_t as phys_addr_t
  generic: make PFN_PHYS explicitly return phys_addr_t
  generic: add phys_addr_t for holding physical addresses
  softirq: allocate less vectors
  IO resources: fix/remove printk
  printk: robustify printk, update comment
  printk: robustify printk, fix #2
  printk: robustify printk, fix
  printk: robustify printk

Fixed up conflicts in:
	arch/powerpc/include/asm/types.h
	arch/powerpc/platforms/Kconfig.cputype
manually.
2008-10-16 15:17:40 -07:00
Linus Torvalds
0999d978dc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix compat-vdso
  x86/mm: unify init task OOM handling
  x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
2008-10-16 15:08:45 -07:00
Ingo Molnar
b2aaf8f74c Merge branch 'linus' into stackprotector
Conflicts:
	arch/x86/kernel/Makefile
	include/asm-x86/pda.h
2008-10-15 13:46:29 +02:00
Ingo Molnar
6b2ada8210 Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus 2008-10-15 12:48:44 +02:00
Pekka Paalanen
4427414170 mmiotrace: remove left-over marker cruft
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:17 +02:00
Pekka Paalanen
9e57fb35d7 x86 mmiotrace: implement mmiotrace_printk()
Offer mmiotrace users a function to inject markers from inside the kernel.
This depends on the trace_vprintk() patch.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:11 +02:00
Pekka Paalanen
bbe5c7830c x86 mmiotrace: fix a rare memory leak
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:01 +02:00
Pekka Paalanen
611b159768 x86: fix mmiotrace 8-bit register decoding
When SIL, DIL, BPL or SPL registers were used in MMIO, the datum
was extracted from AH, BH, CH, or DH, which are incorrect.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Steven Rostedt" <srostedt@redhat.com>
Cc: proski@gnu.org
Cc: "Pekka Enberg"
	<penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:33:50 +02:00
Ingo Molnar
3a1dfe6eef x86/mm: unify init task OOM handling
Linus noticed that the "again:" versus "survive:" OOM logic for
the init task was arbitrarily different.

The 64-bit codepath is the better one, because it correctly re-lookups
the vma after having dropped the ->mmap_sem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 18:11:13 +02:00
Linus Torvalds
891cffbd6b x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
Arjan reported a spike in the following bug pattern in v2.6.27:

   http://www.kerneloops.org/searchweek.php?search=lock_page

which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.

The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:

  static unsigned long
  atomic(const char *name, unsigned long (*op)(unsigned long),
         unsigned long arg)
  {
    unsigned long v;
    __asm__ volatile ("cli");
    v = (*op)(arg);
    __asm__ volatile ("sti");
    return v;
  }

Then it pagefaults in that "atomic" section, triggering the warning.

There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.

Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.

So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.

Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 17:46:39 +02:00
Yinghai Lu
c1a2f4b108 x86: change early_ioremap to use slots instead of nesting
so we could remove the requirement that one needs to call
early_iounmap() in exactly reverse order of early_ioremap().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:34:23 +02:00
Vegard Nossum
af5c2bd16a x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2
virt_addr_valid() calls __pa(), which calls __phys_addr(). With
CONFIG_DEBUG_VIRTUAL=y, __phys_addr() will kill the kernel if the
address *isn't* valid. That's clearly wrong for virt_addr_valid().

We also incorporate the debugging checks into virt_addr_valid().

Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:15 +02:00
Alexander van Heukelum
69c89b5bf7 traps: x86: remove trace_hardirqs_fixup from pagefault handler
The last use of trace_hardirqs_fixup is unnecessary, because the
trap is taken with interrupt off on i386 as well as x86_64, and
the irq-tracer is notified of this from the assembly code.

trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
from include/asm-x86/irqflags.h as they are no longer used.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:04 +02:00
Jack Steiner
2e42060c19 x86, uv: add early detection of UV system types
Portions of the ACPI code needs to know if a system is a UV system prior
to genapic initialization. This patch adds a call early_acpi_boot_init()
so that the apic type is discovered earlier.

V2 of the patch adding fixes from Yinghai Lu.
Much cleaner and smaller.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:51 +02:00
Jan Beulich
606ee44dbb x86: make mm/gup.c more virtualization friendly
Since pte_flags() is much cheaper than pte_val() in some virtualized
environments (namely, Xen), use the former whereever possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Nick Piggin" <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:18 +02:00
Jan Beulich
5e72d9e485 x86-64: fix combining of regions in init_memory_mapping()
When nr_range gets decremented, the same slot must be considered for
coalescing with its new successor again.

The issue is apparently pretty benign to native code, but surfaces as a
boot time crash in our forward ported Xen tree (where the page table
setup overall works differently than in native).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:16 +02:00
Jeremy Fitzhardinge
a32ad46267 x86-64: don't check for map replacement
The check prevents flags on mappings from being changed, which is not
desireable.  There's no need to check for replacing a mapping, and
x86-32 does not do this check.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:05 +02:00
Jeremy Fitzhardinge
1494177942 x86: add early_memremap()
early_ioremap() is also used to map normal memory when constructing
the linear memory mapping.  However, since we sometimes need to be able
to distinguish between actual IO mappings and normal memory mappings,
add a early_memremap() call, which maps with PAGE_KERNEL (as opposed
to PAGE_KERNEL_IO for early_ioremap()), and use it when constructing
pagetables.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:01 +02:00
Jeremy Fitzhardinge
be43d72835 x86: add _PAGE_IOMAP pte flag for IO mappings
Use one of the software-defined PTE bits to indicate that a mapping is
intended for an IO address.  On native hardware this is irrelevent,
since a physical address is a physical address.  But in a virtual
environment, physical addresses are also virtualized, so there needs
to be some way to distinguish between pseudo-physical addresses and
actual hardware addresses; _PAGE_IOMAP indicates this intent.

By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't
even appear in the final pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:56 +02:00
Yinghai Lu
927604c759 x86: rename discontig_32.c to numa_32.c
name it in line with its purpose.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:59 +02:00
Ingo Molnar
8daf14cf56 Merge branches 'x86/xen', 'x86/build', 'x86/microcode', 'x86/mm-debug-v2', 'x86/memory-corruption-check', 'x86/early-printk', 'x86/xsave', 'x86/ptrace-v2', 'x86/quirks', 'x86/setup', 'x86/spinlocks' and 'x86/signal' into x86/core-v2 2008-10-12 15:50:02 +02:00
Ingo Molnar
46eaa67020 x86: memory corruption check - cleanup
Move the prototypes from the generic kernel.h header to the more
appropriate include/asm-x86/bios_ebda.h header file.

Also, remove the check from the power management code - this is a
pure x86 matter for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:09:23 +02:00
Ingo Molnar
a9b9e81c91 Merge branch 'linus' into x86/memory-corruption-check 2008-10-12 15:05:39 +02:00
Ingo Molnar
eceb138336 Merge branches 'core/signal' and 'x86/spinlocks' into x86/xen
Conflicts:
	include/asm-x86/spinlock.h
2008-10-12 13:20:25 +02:00
Ingo Molnar
365d46dc9b Merge branch 'linus' into x86/xen
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/process_64.c
	arch/x86/xen/enlighten.c
2008-10-12 12:37:32 +02:00
Alan Cox
c613ec1a7f x86, early_ioremap: fix fencepost error
The x86 implementation of early_ioremap has an off by one error. If we get
an object which ends on the first byte of a page we undermap by one page and
this causes a crash on boot with the ASUS P5QL whose DMI table happens to fit
this alignment.

The size computation is currently

	last_addr = phys_addr + size - 1;
	npages = (PAGE_ALIGN(last_addr) - phys_addr)

(Consider a request for 1 byte at alignment 0...)

Closes #11693

Debugging work by Ian Campbell/Felix Geyer

Signed-off-by: Alan Cox <alan@rehat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:19:04 +02:00
Ingo Molnar
0afe2db213 Merge branch 'x86/unify-cpu-detect' into x86-v28-for-linus-phase4-D
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/signal_64.c
	include/asm-x86/cpufeature.h
2008-10-11 20:23:20 +02:00
Ingo Molnar
3dd392a407 Merge branch 'linus' into x86/pat2
Conflicts:
	arch/x86/mm/init_64.c
2008-10-10 19:30:08 +02:00
Suresh Siddha
b27a43c1e9 x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix
Jeremy Fitzhardinge wrote:

> I'd noticed that current tip/master hasn't been booting under Xen, and I
> just got around to bisecting it down to this change.
>
> commit 065ae73c5462d42e9761afb76f2b52965ff45bd6
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
>
>    x86, cpa: make the kernel physical mapping initialization a two pass sequence
>
> This patch is causing Xen to fail various pagetable updates because it
> ends up remapping pagetables to RW, which Xen explicitly prohibits (as
> that would allow guests to make arbitrary changes to pagetables, rather
> than have them mediated by the hypervisor).

Instead of making init a two pass sequence, to satisfy the Intel's TLB
Application note (developer.intel.com/design/processor/applnots/317080.pdf
Section 6 page 26), we preserve the original page permissions
when fragmenting the large mappings and don't touch the existing memory
mapping (which satisfies Xen's requirements).

Only open issue is: on a native linux kernel, we will go back to mapping
the first 0-1GB kernel identity mapping as executable (because of the
static mapping setup in head_64.S). We can fix this in a different
patch if needed.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:21 +02:00
Ingo Molnar
ad2cde16a2 x86, pat: cleanups
clean up recently added code to be more consistent with other x86 code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:20 +02:00
Suresh Siddha
28dd033f43 x86: fix pagetable init 64-bit breakage
Fix _end alignment check - can trigger a crash if _end happens to be
on a page boundary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:20 +02:00
Suresh Siddha
9542ada803 x86: track memtype for RAM in page struct
Track the memtype for RAM pages in page struct instead of using the
memtype list. This avoids the explosion in the number of entries in
memtype list (of the order of 20,000 with AGP) and makes the PAT
tracking simpler.

We are using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:18 +02:00
Suresh Siddha
ad5ca55f6b x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
Do a global flush tlb after splitting the large page and before we do the
actual change page attribute in the PTE.

With out this, we violate the TLB application note, which says
    "The TLBs may contain both ordinary and large-page translations for
     a 4-KByte range of linear addresses. This may occur if software
     modifies the paging structures so that the page size used for the
     address range changes. If the two translations differ with respect
     to page frame or attributes (e.g., permissions), processor behavior
     is undefined and may be implementation-specific."

And also serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity
mappings) using cpa_lock. So that we don't allow any other cpu, with stale
large tlb entries change the page attribute in parallel to some other cpu
splitting a large page entry along with changing the attribute.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:17 +02:00
Suresh Siddha
8311eb84bf x86, cpa: remove cpa pool code
Interrupt context no longer splits large page in cpa(). So we can do away
with cpa memory pool code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:16 +02:00
Suresh Siddha
55121b4369 x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
No alias checking needed for setting present/not-present mapping. Otherwise,
we may need to break large pages for 64-bit kernel text mappings (this adds to
complexity if we want to do this from atomic context especially, for ex:
with CONFIG_DEBUG_PAGEALLOC). Let's keep it simple!

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:15 +02:00
Suresh Siddha
0b8fdcbcd2 x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC.
This will remove the need to split the large page for the
allocated kernel page in the interrupt context.

This will simplify cpa code(as we don't do the split any more from the
interrupt context). cpa code simplication in the subsequent patches.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:14 +02:00
Suresh Siddha
a2699e477b x86, cpa: make the kernel physical mapping initialization a two pass sequence
In the first pass, kernel physical mapping will be setup using large or
small pages but uses the same PTE attributes as that of the early
PTE attributes setup by early boot code in head_[32|64].S

After flushing TLB's, we go through the second pass, which setups the
direct mapped PTE's with the appropriate attributes (like NX, GLOBAL etc)
which are runtime detectable.

This two pass mechanism conforms to the TLB app note which says:

"Software should not write to a paging-structure entry in a way that would
 change, for any linear address, both the page size and either the page frame
 or attributes."

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:13 +02:00
Ingo Molnar
e496e3d645 Merge branches 'x86/alternatives', 'x86/cleanups', 'x86/commandline', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/exports', 'x86/fpu', 'x86/gart', 'x86/idle', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/oprofile', 'x86/paravirt', 'x86/reboot', 'x86/sparse-fixes', 'x86/tsc', 'x86/urgent' and 'x86/vmalloc' into x86-v28-for-linus-phase1 2008-10-06 18:17:07 +02:00
Ingo Molnar
0962f402af Merge branch 'x86/prototypes' into x86-v28-for-linus-phase1
Conflicts:
	arch/x86/kernel/process_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-06 18:06:53 +02:00
Bruce Allan
a03352d2c1 x86: export set_memory_ro and set_memory_rw
Export set_memory_ro() and set_memory_rw() calls for use by drivers that need
to have more debug information about who might be writing to memory space.
this was initially developed for use while debugging a memory corruption
problem with e1000e.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-30 09:06:41 +02:00
Suresh Siddha
379daf6290 IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes
Go through the iomem resource tree to check if any of the ioremap()
requests span more than any slot in the iomem resource tree and do
a WARN_ON() if we hit this check.

This will raise a red-flag, if some driver is mapping more than what
is needed. And hopefully identify possible corruptions much earlier.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-26 09:42:20 +02:00
Ingo Molnar
07bbc16a86 Merge branch 'timers/urgent' into x86/xen
Conflicts:
	arch/x86/kernel/process_32.c
	arch/x86/kernel/process_64.c

Manual merge:

	arch/x86/kernel/smpboot.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-23 23:26:42 +02:00
Alex Nixon
5132895f14 x86/paravirt: Remove duplicate paravirt_pagetable_setup_{start, done}()
They were already called once in arch/x86/kernel/setup.c - we don't need to call them again.

fixes:

  http://bugzilla.kernel.org/show_bug.cgi?id=11485

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-14 18:10:01 +02:00
Ingo Molnar
f81b691a3d Merge commit 'v2.6.27-rc6' into x86/pat 2008-09-14 17:26:53 +02:00
Hugh Dickins
bb577f980e x86: add periodic corruption check
Perodically check for corruption in low phusical memory.  Don't bother
checking at fault time, since it won't show anything useful.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-07 17:40:00 +02:00
Jeremy Fitzhardinge
5394f80f92 x86: check for and defend against BIOS memory corruption
Some BIOSes have been observed to corrupt memory in the low 64k.  This
change:
 - Reserves all memory which does not have to be in that area, to
   prevent it from being used as general memory by the kernel.  Things
   like the SMP trampoline are still in the memory, however.
 - Clears the reserved memory so we can observe changes to it.
 - Adds a function check_for_bios_corruption() which checks and reports on
   memory becoming unexpectedly non-zero.  Currently it's called in the
   x86 fault handler, and the powermanagement debug output.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-07 17:39:59 +02:00
Jan Beulich
cc643d4687 x86: adjust vmalloc_sync_all() for Xen (2nd try)
Since the fourth PDPT entry cannot be shared under Xen,
vmalloc_sync_all() must iterate over pmd-s rather than pgd-s here.
Luckily, the code isn't used for native PAE (SHARED_KERNEL_PMD is 1)
and the change is benign to non-PAE.

Also do a little more cleanup in that function.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
2008-09-06 19:48:01 +02:00
Jan Beulich
17b746278d x86: pgd_{c,d}tor() cleanup
Giving pgd_ctor() a properly typed parameter allows eliminating a local
variable. Adjust pgd_dtor() to match.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Jeremy Fitzhardinge" <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 19:47:09 +02:00
Alex Nixon
efd327a2d4 x86/paravirt: Remove duplicate paravirt_pagetable_setup_{start, done}()
They were already called once in arch/x86/kernel/setup.c - we don't need to call them again.

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 17:46:44 +02:00
Jeremy Fitzhardinge
110e0358e7 x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious
The CPA test code uses _PAGE_UNUSED1, so make sure its obvious.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 17:09:57 +02:00
Ingo Molnar
deed05b7c0 x86, init_64.c: cleanup
Clean up comments.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 10:23:47 +02:00
Yinghai Lu
bd220a24a9 x86: move nonx_setup etc from common.c to init_64.c
like 32 bit put it in init_32.c

Signed-off-by: Yinghai <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-05 10:23:47 +02:00
H. Peter Anvin
7203781c98 Merge branch 'x86/cpu' into x86/core
Conflicts:

	arch/x86/kernel/cpu/feature_names.c
	include/asm-x86/cpufeature.h
2008-09-04 08:08:42 -07:00
Ingo Molnar
ea1c9de45e Merge branch 'x86/urgent' into x86/cleanups 2008-08-25 11:10:42 +02:00
Venki Pallipadi
01de05af94 x86: have set_memory_array_{uc,wb} coalesce memtypes, fix
Fix the start addr for free_memtype calls in the error path.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-23 17:33:12 +02:00
Jan Beulich
9482ac6e34 x86: fix two modpost warnings in mm/init_64.c
early_io{re,un}map() are __init and hence can't be called from __meminit
functions.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 07:51:54 +02:00
Jan Beulich
8ae3a5a8df x86: fix 1:1 mapping init on 64-bit (memory hotplug case)
While I don't have a hotplug capable system at hand, I think two issues need
fixing:

- pud_phys (in kernel_physical_ampping_init()) would remain uninitialized in
  the after_bootmem case

- the locking done just around phys_pmd_{init,update}() would leave out pgd
  updates, and it was needlessly covering code portions that do allocations
  (perhaps using a more friendly gfp value in alloc_low_page() would then be
  possible)

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 07:51:53 +02:00
Rene Herman
c5e147cf5a x86: have set_memory_array_{uc,wb} coalesce memtypes.
Actually, might as well simply reconstruct the memtype list at free time
I guess. How is this for a coalescing version of the array functions?

Compiles, boots and provides me with:

  root@7ixe4:~# wc -l /debug/x86/pat_memtype_list
  53 /debug/x86/pat_memtype_list

otherwise (down from 16384+).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 06:07:32 +02:00
Rene Herman
9a79f4f491 x86: {reverve,free}_memtype() take a physical address
The new set_memory_array_{uc,wb}() pass virtual addresses to
{reserve,free}_memtype() it seems.

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 06:07:32 +02:00
Ingo Molnar
8b53b57576 Merge branch 'x86/urgent' into x86/pat
Conflicts:
	arch/x86/mm/pageattr.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 06:06:51 +02:00
Shaohua Li
ab7e792437 x86: fix pageattr-test
We changed the interface of some pageattr, this patch makes
pageattr-test compile.

Signed-off-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:47:47 +02:00
Shaohua Li
d75586ad01 x86, pageattr: introduce APIs to change pageattr of a page array
Add array interface APIs of pageattr. page based cache flush is quite
slow for a lot of pages. If pages are more than 1024 (4M), the patch
will use a wbinvd(). We have a simple test here (run a 3d game - open
arena), nearly all agp memory allocation are small (< 1M), so suppose
this will not impact runtime performance.

Signed-off-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:47:45 +02:00
Ingo Molnar
cacf890694 Revert "introduce two APIs for page attribute"
This reverts commit 1ac2f7d55b.
2008-08-21 13:46:33 +02:00
venkatesh.pallipadi@intel.com
28df82ebab devmem, x86: PAT Change /dev/mem mmap with O_SYNC to use UC_MINUS
All kernel mappings like ioremap(), etc uses UC_MINUS as the type. /dev/mem
mappings with /dev/mem being opened with O_SYNC however was using UC,
resulting in a conflict with /dev/mem mmap failing. This seems to be
affecting some apps (one being flashrom) which are using O_SYNC and which were
working before.

Switch /dev/mem with O_SYNC also to UC_MINUS.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:27:33 +02:00
venkatesh.pallipadi@intel.com
c15238df3b x86: PAT proper tracking of set_memory_uc and friends
Big thinko in pat memtype tracking code. reserve_memtype should be called
with physical address and not virtual address.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:27:32 +02:00
Marcin Slusarz
7701e8c59b x86, mmiotrace: silence section mismatch warning - leave_uniprocessor
WARNING: vmlinux.o(.text+0x180af): Section mismatch in reference from the function leave_uniprocessor() to the function .cpuinit.text:cpu_up()
The function leave_uniprocessor() references
the function __cpuinit cpu_up().
This is often because leave_uniprocessor lacks a __cpuinit
annotation or the annotation of cpu_up is wrong.

leave_uniprocessor calls cpu_up only when CONFIG_HOTPLUG_CPU is set,
so it can be safely annotated as __ref

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Paalanen <pq@iki.fi>
2008-08-21 12:35:16 +02:00
Dave Young
e621bd1895 i386: vmalloc size fix
Booting kernel with vmalloc=[any size<=16m] will oops on my pc (i386/1G memory).

BUG_ON in arch/x86/mm/init_32.c triggered:
BUG_ON((unsigned long)high_memory > VMALLOC_START);

It's due to the vm area hole.

In include/asm-x86/pgtable_32.h:
#define VMALLOC_OFFSET	(8 * 1024 * 1024)
#define VMALLOC_START	(((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) \
			 & ~(VMALLOC_OFFSET - 1))

There's several related point:
1. MAXMEM :
 (-__PAGE_OFFSET - __VMALLOC_RESERVE).
The space after VMALLOC_END is included as well, I set it to
(VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE)

2. VMALLOC_OFFSET is not considered in __VMALLOC_RESERVE
fixed by adding VMALLOC_OFFSET to it.

3. VMALLOC_START :
 (((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) & ~(VMALLOC_OFFSET - 1))
So it's not always 8M, bigger than 8M possible.
I set it to ((unsigned long)high_memory + VMALLOC_OFFSET)

4. the VMALLOC_RESERVE is an unused macro, so remove it here.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: akpm@linux-foundation.org
Cc: hidave.darkstar@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-08-21 10:13:21 +02:00
Arjan van de Ven
0c072bb452 x86: use WARN() in arch/x86/mm/ioremap.c
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 09:59:27 +02:00
Jeremy Fitzhardinge
27990eac52 x86: another user of PTE_FLAGS_MASK
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-20 12:38:41 +02:00
Venki Pallipadi
80c5e73d60 x86: fix Xorg startup/shutdown slowdown with PAT
Rene Herman reported significant Xorg startup/shutdown slowdown due
to PAT. It turns out that the memtype list has thousands of entries.

Add cached_entry to list add routine, in order to speed up the
lookup for sequential reserve_memtype calls.

Reported-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-20 12:08:37 +02:00
Ingo Molnar
7393423dd9 Merge branch 'linus' into x86/cleanups 2008-08-20 11:52:15 +02:00
Linus Torvalds
0473b79929 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
  x86: add MAP_STACK mmap flag
  x86: fix section mismatch warning - spp_getpage()
  x86: change init_gdt to update the gdt via write_gdt, rather than a direct write.
  x86-64: fix overlap of modules and fixmap areas
  x86, geode-mfgpt: check IRQ before using MFGPT as clocksource
  x86, acpi: cleanup, temp_stack is used only when CONFIG_SMP is set
  x86: fix spin_is_contended()
  x86, nmi: clean UP NMI watchdog failure message
  x86, NMI: fix watchdog failure message
  x86: fix /proc/meminfo DirectMap
  x86: fix readb() et al compile error with gcc-3.2.3
  arch/x86/Kconfig: clean up, experimental adjustement
  x86: invalidate caches before going into suspend
  x86, perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds
  x86, AMD IOMMU: initialize dma_ops after sysfs registration
  x86m AMD IOMMU: cleanup: replace LOW_U32 macro with generic lower_32_bits
  x86, AMD IOMMU: initialize device table properly
  x86, AMD IOMMU: use status bit instead of memory write-back for completion wait
  x86: silence mmconfig printk
  x86, msr: fix NULL pointer deref due to msr_open on nonexistent CPUs
  ...
2008-08-16 17:14:07 -07:00
Marcin Slusarz
8d6ea9674c x86: fix section mismatch warning - spp_getpage()
WARNING: vmlinux.o(.text+0x17a3e): Section mismatch in reference from the function set_pte_vaddr_pud() to the function .init.text:spp_getpage()
The function set_pte_vaddr_pud() references
the function __init spp_getpage().
This is often because set_pte_vaddr_pud lacks a __init
annotation or the annotation of spp_getpage is wrong.

spp_getpage is called from __init (__init_extra_mapping) and
non __init (set_pte_vaddr_pud) functions, so it can't be __init.
Unfortunately it calls alloc_bootmem_pages which is __init,
but does it only when bootmem allocator is available (after_bootmem == 0).

So annotate it accordingly.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
2008-08-15 19:16:06 +02:00
Andi Kleen
e213e87785 x86: Fix ioremap off by one BUG
Jean Delvare's machine triggered this BUG

acpi_os_map_memory phys ffff0000 size 65535
------------[ cut here ]------------
kernel BUG at arch/x86/mm/pat.c:233!

with ACPI in the backtrace.

Adding some debugging output showed that ACPI calls

acpi_os_map_memory phys ffff0000 size 65535

And ioremap/PAT does this check in 32bit, so addr+size wraps and the BUG
in reserve_memtype() triggers incorrectly.

        BUG_ON(start >= end); /* end is exclusive */

But reserve_memtype already uses u64:

int reserve_memtype(u64 start, u64 end,

so the 32bit truncation must happen in the caller. Presumably in ioremap
when it passes this information to reserve_memtype().

This patch does this computation in 64bit.

http://bugzilla.kernel.org/show_bug.cgi?id=11346

Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-08-15 18:18:38 +02:00
Nick Piggin
5843d9a4d0 x86, pat: avoid highmem cache attribute aliasing
Highmem code can leave ptes and tlb entries around for a given page even after
kunmap, and after it has been freed.

>From what I can gather, the PAT code may change the cache attributes of
arbitrary physical addresses (ie. including highmem pages), which would result
in aliases in the case that it operates on one of these lazy tlb highmem
pages.

Flushing kmaps should solve the problem.

I've also just added code for conditional flushing if we haven't got
any dangling highmem aliases -- this should help performance if we
change page attributes frequently or systems that aren't using much
highmem pages (eg. if < 4G RAM). Should be turned into 2 patches, but
just for RFC...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15 17:22:57 +02:00
Shaohua Li
1ac2f7d55b introduce two APIs for page attribute
Introduce two APIs for page attribute. flushing tlb/cache in every page
attribute is expensive. AGP gart usually will do a lot of operations to
change a page to uc, new APIs can reduce flush.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: airlied@linux.ie
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15 16:30:45 +02:00
Hugh Dickins
a06de63000 x86: fix /proc/meminfo DirectMap
Do we actually want these DirectMap lines in the x86 /proc/meminfo?
I can see they're interesting to CPA developers and TLB optimizers,
but they don't fit its usual "where has all my memory gone?" usage.
If they are to stay, here are some fixes.

1. On x86_32 without PAE, they're not 2M but 4M pages: no need to
   mess with the internal enum, but show the right name to users.

2. Many machines can never show anything but 0 for DirectMap1G,
   so suppress that line unless direct_gbpages are really enabled.

3. The unit in /proc/meminfo is kB not number of pages: HugePages
   messed that up, but they're an example to regret not to follow.

4. Once we use kB, it's easy to see that 1GB has gone missing (which
   explains why CONFIG_CPA_DEBUG=y soon wraps DirectMap2M negative):
   because head_64.S's level2_ident_pgt entries were not counted.
   My fix is not ideal, but works for more and for less than 1G,
   and avoids interfering with early bootup pagetable contortions.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15 15:27:55 +02:00
Ingo Molnar
8d7ccaa545 Merge commit 'v2.6.27-rc3' into x86/prototypes
Conflicts:

	include/asm-x86/dma-mapping.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 12:19:59 +02:00
Yinghai Lu
858f774733 x86: don't call e820_regiter_active_regions if out of range on node
so we don't get warning on 32bit system with 64g RAM or more

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 11:35:52 +02:00
Arjan van de Ven
875e40b975 x86: use WARN() in arch/x86/mm/pageattr.c
Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes
part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-13 19:05:39 +02:00
Linus Torvalds
1c89ac5501 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  fix spinlock recursion in hvc_console
  stop_machine: remove unused variable
  modules: extend initcall_debug functionality to the module loader
  export virtio_rng.h
  lguest: use get_user_pages_fast() instead of get_user_pages()
  mm: Make generic weak get_user_pages_fast and EXPORT_GPL it
  lguest: don't set MAC address for guest unless specified
2008-08-12 08:40:19 -07:00
Rusty Russell
912985dce4 mm: Make generic weak get_user_pages_fast and EXPORT_GPL it
Out of line get_user_pages_fast fallback implementation, make it a weak
symbol, get rid of CONFIG_HAVE_GET_USER_PAGES_FAST.

Export the symbol to modules so lguest can use it.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-08-12 17:52:53 +10:00
Andreas Herrmann
012f09e794 x86: compile pat debugfs interface only if CONFIG_X86_PAT is set
Recently I've run a kernel w/o PAT support and wondered why there was
a file "x86/pat_memtype_list" in debugfs. Of course it's empty if PAT
is disabled ... so just get rid of this interface if PAT is disabled.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 21:30:59 +02:00
Jeremy Fitzhardinge
cf3e505012 x86: work around gcc 3.4.x bug
Simon Horman reported that gcc-3.4.x crashes when compiling
pgd_prepopulate_pmd() when PREALLOCATED_PMDS == 0 and CONFIG_DEBUG_INFO
is enabled.

Adding an extra check for PREALLOCATED_PMDS == 0 [which is compiled out
by gcc] seems to avoid the problem.

Reported-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 18:44:02 +02:00
Ingo Molnar
6de9c70882 Merge branch 'linus' into x86/cleanups 2008-08-11 12:57:01 +02:00
Linus Torvalds
9b79022ca9 Fix 'get_user_pages_fast()' with non-page-aligned start address
Alexey Dobriyan reported trouble with LTP with the new fast-gup code,
and Johannes Weiner debugged it to non-page-aligned addresses, where the
new get_user_pages_fast() code would do all the wrong things, including
just traversing past the end of the requested area due to 'addr' never
matching 'end' exactly.

This is not a pretty fix, and we may actually want to move the alignment
into generic code, leaving just the core code per-arch, but Alexey
verified that the vmsplice01 LTP test doesn't crash with this.

Reported-and-tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Debugged-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 17:54:21 -07:00
Johannes Weiner
8dad322f54 x86: use generic show_mem()
Remove arch-specific show_mem() in favor of the generic version.

This also removes the following redundant information display:

	- pages in swapcache, printed by show_swap_cache_info()
	- dirty pages, writeback pages, mapped pages, slab pages,
	  pagetable pages, printed by show_free_areas()

where show_mem() calls show_free_areas(), which calls
show_swap_cache_info().

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:10 -07:00
Nick Piggin
652ea69536 x86: support 1GB hugepages with get_user_pages_lockless()
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Nick Piggin
8174c430e4 x86: lockless get_user_pages_fast()
Implement get_user_pages_fast without locking in the fastpath on x86.

Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks or even mmap_sem.  Page table existence is guaranteed by
turning interrupts off (combined with the fact that we're always looking
up the current mm, means we can do the lockless page table walk within the
constraints of the TLB shootdown design).  Basically we can do this
lockless pagetable walk in a similar manner to the way the CPU's pagetable
walker does not have to take any locks to find present ptes.

This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5

 "To test the effects of the patch, an OLTP workload was run on an IBM
  x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
  2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel.  Comparing
  runs with and without the patch resulted in an overall performance
  benefit of ~9.8%.  Correspondingly, oprofiles showed that samples from
  __up_read and __down_read routines that is seen during thread contention
  for system resources was reduced from 2.8% down to .05%.  Monitoring the
  /proc/vmstat output from the patched run showed that the counter for
  fast_gup contained a very high number while the fast_gup_slow value was
  zero."

(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).

The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO.  Direct-IO uses get_user_pages, and thus the threads
contend the mmap_sem cacheline, and can also contend on page table locks.

I would anticipate larger performance gains on larger systems, however I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In which case, we stuck with "only" a 10% gain.

The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages and so the get_user_pages_fast is a bit of extra work.
However this should not be the common case in most performance critical
code.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Joerg Roedel
17f3ab748e x86: convert discontig_32.c from round_up to roundup
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 15:39:22 +02:00
Joerg Roedel
be3e89ee6d x86: convert numa_64.c from round_up to roundup
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 15:39:21 +02:00
Joerg Roedel
d86bb0dac7 x86: convert init_64.c from round_up to roundup
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 15:39:21 +02:00
Joerg Roedel
15ae2d76ce x86: convert pageattr.c from round_up to roundup
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 15:39:20 +02:00
Andi Kleen
b4718e628d x86: add hugepagesz option on 64-bit
Add an hugepagesz=...  option similar to IA64, PPC etc.  to x86-64.

This finally allows to select GB pages for hugetlbfs in x86 now that all
the infrastructure is in place.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Andi Kleen
39c11e6c05 x86: support GB hugepages on 64-bit
Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen
ceb8687961 hugetlb: introduce pud_huge
Straight forward extensions for huge pages located in the PUD instead of
PMDs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen
a551643895 hugetlb: modular state for hugetlb page size
The goal of this patchset is to support multiple hugetlb page sizes.  This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg.  huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Rik van Riel
28b2ee20c7 access_process_vm device memory infrastructure
In order to be able to debug things like the X server and programs using
the PPC Cell SPUs, the debugger needs to be able to access device memory
through ptrace and /proc/pid/mem.

This patch:

Add the generic_access_phys access function and put the hooks in place
to allow access_process_vm to access device or PPC Cell SPU memory.

[riel@redhat.com: Add documentation for the vm_ops->access function]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Johannes Weiner
b61bfa3c46 mm: move bootmem descriptors definition to a single place
There are a lot of places that define either a single bootmem descriptor or an
array of them.  Use only one central array with MAX_NUMNODES items instead.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
Jaswinder Singh
4b6e9f27d0 x86: mm/ioremap.c declare early_ioremap_debug and early_ioremap_nested as static
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
2008-07-23 17:39:16 +05:30
Jaswinder Singh
70ef56414e x86: mm/fault.c declare do_page_fault before they get used
declared do_page_fault() in asm-x86/trap.h for both X86_32 and X86_64

removed do_invalid_op declaration from mm/fault.c as it is already declared in asm-x86/trap.h

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
2008-07-23 17:36:37 +05:30
Jaswinder Singh
a80495ec92 x86: mm/init_XX.c declare functions before they get used
included <asm/smp.h> in mm/init_32.c for zap_low_mappings()

declared free_initmem() in asm-x86/page_XX.h

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
2008-07-23 17:33:57 +05:30
Jeremy Fitzhardinge
77be1fabd0 x86: add PTE_FLAGS_MASK
PTE_PFN_MASK was getting lonely, so I made it a friend.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 10:43:45 +02:00
Jeremy Fitzhardinge
59438c9fc4 x86: rename PTE_MASK to PTE_PFN_MASK
Rusty, in his peevish way, complained that macros defining constants
should have a name which somewhat accurately reflects the actual
purpose of the constant.

Aside from the fact that PTE_MASK gives no clue as to what's actually
being masked, and is misleadingly similar to the functionally entirely
different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the
problem is.

But if this patch silences the incessent noise, then it will have
achieved its goal (TODO: write test-case).

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 10:43:44 +02:00
Thomas Gleixner
cfc1b9a6a6 x86: convert Dprintk to pr_debug
There are a couple of places where (P)Dprintk is used which is an old
compile time enabled printk wrapper. Convert it to the generic
pr_debug().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-21 21:35:38 +02:00
Ingo Molnar
acee709cab Merge branches 'x86/urgent', 'x86/amd-iommu', 'x86/apic', 'x86/cleanups', 'x86/core', 'x86/cpu', 'x86/fixmap', 'x86/gart', 'x86/kprobes', 'x86/memtest', 'x86/modules', 'x86/nmi', 'x86/pat', 'x86/reboot', 'x86/setup', 'x86/step', 'x86/unify-pci', 'x86/uv', 'x86/xen' and 'xen-64bit' into x86/for-linus 2008-07-21 16:37:17 +02:00
Ingo Molnar
d092633bff Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM
From: Arjan van de Ven <arjan@infradead.org>
Date: Sat, 19 Jul 2008 15:47:17 -0700

CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
so that architectures can opt-in into it.

( the polarity of the option is still the same as it was originally; it
  needs to be for now to not break architectures that don't have the
  infastructure yet to support this feature)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "V.Radhakrishnan" <rk@atr-labs.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
2008-07-20 08:35:55 +02:00
venkatesh.pallipadi@intel.com
fec0962e0b x86: Add a debugfs interface to dump PAT memtype
Add a debugfs interface to list out all the PAT memtype reservations.
Appears at debugfs x86/pat_memtype_list and output format is
type @ <start addr>-<end addr>

We do not hold the lock while printing the entire list. So, the list may not be
a consistent copy in case where regions are getting added or deleted
at the same time.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-18 17:22:05 -07:00
Yinghai Lu
caadbdce24 x86: enable memory tester support on 32-bit
only supports memory below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 14:11:58 +02:00
Yinghai Lu
1f067167a8 x86: seperate memtest from init_64.c
it's separate functionality that deserves its own file.

This also prepares 32-bit memtest support.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 14:10:27 +02:00
Ingo Molnar
cd569ef5d6 Merge branch 'linus' into x86/urgent 2008-07-18 12:20:23 +02:00
Ingo Molnar
64d206d896 x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM
Linus observed:

> The real bug is that we shouldn't have "double negatives", and
> certainly not negative config options. Making that "promiscuous
> /dev/mem" option a negated thing as a config option was bad.

right ... lets rename this option. There should never be a negation
in config options.

[ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that
  is for another commit ;-) ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 00:28:57 +02:00
Ingo Molnar
393d81aa02 Merge branch 'linus' into xen-64bit 2008-07-17 23:57:20 +02:00
Linus Torvalds
2b04be7e8a Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix asm/e820.h for userspace inclusion
  x86: fix numaq_tsc_disable
  x86: fix kernel_physical_mapping_init() for large x86 systems
2008-07-17 10:38:59 -07:00
Bob Moore
19d0cfe9dd ACPICA: Update DMAR and SRAT table definitions
Synchronized tables with current specifications.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:04 +02:00
Jack Steiner
e22146e610 x86: fix kernel_physical_mapping_init() for large x86 systems
Fix bug in kernel_physical_mapping_init() that causes kernel
page table to be built incorrectly for systems with greater
than 512GB of memory.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 18:27:36 +02:00
Eduardo Habkost
c1f2f09ef6 pvops-64: call paravirt_post_allocator_init() on setup_arch()
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:53:57 +02:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Ingo Molnar
5806b81ac1 Merge branch 'auto-ftrace-next' into tracing/for-linus
Conflicts:

	arch/x86/kernel/entry_32.S
	arch/x86/kernel/process_32.c
	arch/x86/kernel/process_64.c
	arch/x86/lib/Makefile
	include/asm-x86/irqflags.h
	kernel/Makefile
	kernel/sched.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 16:11:52 +02:00
Yinghai Lu
9958e810f8 x86: max_low_pfn_mapped fix, #3
optimization: try to merge the range with same page size in
init_memory_mapping, to get the best possible linear mappings set up.

thus when GBpages is not there, we could do 2M pages.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:16 +02:00
Yinghai Lu
965194c15d x86: max_low_pfn_mapped fix, #2
tighten the boundary checks around max_low_pfn_mapped - dont overmap
nor undermap into holes.

also print out tseg for AMD cpus, for diagnostic purposes.
(this is an SMM area, and we split up any big mappings around that area)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 08:19:16 +02:00
Ingo Molnar
ae94b8075a Merge branch 'linus' into x86/core
Conflicts:

	arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:29:02 +02:00
Yinghai Lu
f361a450bf x86: introduce max_low_pfn_mapped for 64-bit
when more than 4g memory is installed, don't map the big hole below 4g.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:24:04 +02:00
Yinghai Lu
f302a5bbe5 x86: reserve SLIT
save the SLIT, in case we are using fixmap to read it, and that fixmap
could be cleared by others.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:22:33 +02:00
Venkatesh Pallipadi
a361ee5cb8 x86: fix /dev/mem compatibility under PAT
Add ioremap_default(), which gives a sane mapping without worrying about
type conflicts.

Use it in /dev/mem read in place of ioremap(), as with ioremap(),
any mapping of the region (other than UC_MINUS) will cause a conflict
and failure of /dev/mem read.

Should address the vbetest failure reported at:

  http://bugzilla.kernel.org/show_bug.cgi?id=11057

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10 10:09:59 +02:00
Yinghai Lu
7b16eb8930 x86: overmapped fix when 4K pages on tail, 64-bit
fix phys_pmd_init to make sure not to return bigger value than end.

also print out range split:1G/2M/4K in init_memory_mapping().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10 08:46:40 +02:00
Yinghai Lu
c2e6d65bce x86: not overmap more than the end of RAM in init_memory_mapping - 64bit
handle head and tail that are not aligned to big pages (2MB/1GB boundary).

with this patch, on system that support gbpages, change:

  last_map_addr: 1080000000 end: 1078000000

to:

  last_map_addr: 1078000000 end: 1078000000

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 10:43:26 +02:00
Yinghai Lu
49c980df55 x86: fix vmemmap printout check
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "Nick Piggin" <npiggin@suse.de>
Cc: "Mark McLoughlin" <markmc@redhat.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Stephen Tweedie" <sct@redhat.com>
Cc: "Jeremy Fitzhardinge" <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 10:43:24 +02:00
Yinghai Lu
b50efd2a55 x86: introduce page_size_mask for 64bit
prepare for overmapped patch

also printout last_map_addr together with end

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 09:37:45 +02:00
Andrew Morton
5ed4273af8 arch/x86/mm/pgtable_32.c: remove unused variable `fixmaps'
arch/x86/mm/pgtable_32.c:144: warning: 'fixmaps' defined but not used

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 08:18:40 +02:00
Jack Steiner
3a9e189d69 x86: map UV chipset space - pagetable
Add boot-time function for creating additional 2MB page table entries for
mapping chipset specific cached/uncached ranges.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 07:43:23 +02:00
Yinghai Lu
6247943d8a x86: remove acpi_srat config v2
use ACPI_NUMA directly

and move srat_32.c to mm/

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 15:49:08 +02:00
Yinghai Lu
698839fe04 x86: remove have_arch_parse_srat -v2
we already have the same srat handling interface for 32bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 15:49:01 +02:00
Jeremy Fitzhardinge
5a654ba7a8 x86/cpa: use an undefined PTE bit for testing CPA
Rather than using _PAGE_GLOBAL - which not all CPUs support - to test
CPA, use one of the reserved-for-software-use PTE flags instead.  This
allows CPA testing to work on CPUs which don't support PGD.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:30 +02:00
Jeremy Fitzhardinge
ef5e94af16 x86_32: remove __PAGE_KERNEL(_EXEC)
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Older x86-32 processors do not support global mappings (PGD), so must
only use it if the processor supports it.

The _PAGE_KERNEL* flags always have _PAGE_KERNEL set, since logically
we always want it set.

This is OK even on processors which do not support PGD, since all
_PAGE flags are masked with __supported_pte_mask before being turned
into a real in-pagetable pte.  On 32-bit systems, __supported_pte_mask
is initialized to not contain _PAGE_GLOBAL, and it is then added if
the CPU is found to support it.

The x86-32 code used to use __PAGE_KERNEL/__PAGE_KERNEL_EXEC for this
purpose, but they're now redundant and can be removed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:29 +02:00
Jeremy Fitzhardinge
574977a2ed x86_64/setup: unconditionally populate the pgd
When allocating a new pud, unconditionally populate the pgd (why did
we bother to create a new pud if we weren't going to populate it?).

This will only happen if the pgd slot was empty, since any existing
pud will be reused.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:27 +02:00
Yinghai Lu
cb95a13a8a x86: merge zones_sizes_init for numa and non numa on 32-bit
move out e820_register_active_regions from non numa zones_sizes_init()
and remove numa version zones_sizes_init().

and let 32 bit call remove_all_active_ranges() in setup_arch() directly
like 64-bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:22 +02:00
Yinghai Lu
996cf4438f x86: don't reallocate pgt for node0
kva ram already mapped right after away, so don't need to get that for low ram.
avoid wasting one copy of pgdat.

also add node id in early_res name in case we get it from find_e820_area.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:15 +02:00
Yinghai Lu
a04ad82d0b x86: fix init_memory_mapping over boundary, v4
use PMD_SHIFT to calculate boundary also adjust size for pre-allocated
table size

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:07 +02:00
Yinghai Lu
7482b0e962 x86: fix init_memory_mapping over boundary v3
some ram-end boundary only has page alignment, instead of 2M alignment.

v2: make init_memory_mapping more solid: start could be any value other than 0
v3: fix NON PAE by handling left over in kernel_physical_mapping

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:06 +02:00
Jeremy Fitzhardinge
22b45144f6 x86/paravirt: groundwork for 64-bit Xen support, fix #2
Ingo Molnar wrote:
> that fixed the build but now we've got a boot crash with this config:
>
>  time.c: Detected 2010.304 MHz processor.
>  spurious 8259A interrupt: IRQ7.
>  BUG: unable to handle kernel NULL pointer dereference at  0000000000000000
>  IP: [<0000000000000000>]
>  PGD 0
>  Thread overran stack, or stack corrupted
>  Oops: 0010 [1] SMP
>  CPU 0
>

I don't know if this will fix this bug, but it's definitely a bugfix.
It was trashing random pages by overwriting them with pagetables...

Don't trash a large pmd's data when mapping physical memory.
This is a bugfix for "x86_64: adjust mapping of physical pagetables
to work with Xen".

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:02 +02:00
Yinghai Lu
e7b3789524 x86: move fix mapping page table range early
do that in init_memory_mapping

also remove one init_ohci1394_dma_on_all_controllers

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:01 +02:00
Jeremy Fitzhardinge
8207c2570a x86: fix pte allocation in "x86: introduce init_memory_mapping for 32bit"
The patch "x86: introduce init_memory_mapping for 32bit" does not allocate
enough space for PTEs if the CPU does not implement PSE.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:15:58 +02:00
Eduardo Habkost
0814e0bace x86, 64-bit: split set_pte_vaddr()
We will need to set a pte on l3_user_pgt. Extract set_pte_vaddr_pud()
from set_pte_vaddr(), that will accept the l3 page table as parameter.

This change should be a no-op for existing code.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:09 +02:00
Jeremy Fitzhardinge
7c934d3990 x86, 64-bit: create small vmemmap mappings if PSE not available
If PSE is not available, then fall back to 4k page mappings for the
vmemmap area.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:08 +02:00
Jeremy Fitzhardinge
4f9c11dd49 x86, 64-bit: adjust mapping of physical pagetables to work with Xen
This makes a few of changes to the construction of the initial
pagetables to work better with paravirt_ops/Xen.  The main areas
are:

 1. Support non-PSE mapping of memory, since Xen doesn't currently
    allow 2M pages to be mapped in guests.

 2. Make sure that the ioremap alias of all pages are dropped before
    attaching the new page to the pagetable.  This avoids having
    writable aliases of pagetable pages.

 3. Preserve existing pagetable entries, rather than overwriting.  Its
    possible that a fair amount of pagetable has already been constructed,
    so reuse what's already in place rather than ignoring and overwriting it.

The algorithm relies on the invariant that any page which is part of
the kernel pagetable is itself mapped in the linear memory area.  This
way, it can avoid using ioremap on a pagetable page.

The invariant holds because it maps memory from low to high addresses,
and also allocates memory from low to high.  Each allocated page can
map at least 2M of address space, so the mapped area will always
progress much faster than the allocated area.  It relies on the early
boot code mapping enough pages to get started.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:07 +02:00
Jeremy Fitzhardinge
d8d5900ef8 x86: preallocate and prepopulate separately
Jan Beulich points out that vmalloc_sync_all() assumes that the
kernel's pmd is always expected to be present in the pgd.  The current
pgd construction code will add the pgd to the pgd_list before its pmds
have been pre-populated, thereby making it visible to
vmalloc_sync_all().

However, because pgd_prepopulate_pmd also does the allocation, it may
block and cannot be done under spinlock.

The solution is to preallocate the pmds out of the spinlock, then
populate them while holding the pgd_list lock.

This patch also pulls the pmd preallocation and mop-up functions out
to be common, assuming that the compiler will generate no code for
them when PREALLOCTED_PMDS is 0.  Also, there's no need for pgd_ctor
to clear the pgd again, since it's allocated as a zeroed page.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Beulich <jbeulich@novell.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:02 +02:00
Jeremy Fitzhardinge
eba0045ff8 x86/paravirt: add a pgd_alloc/free hooks
Add hooks which are called at pgd_alloc/free time.  The pgd_alloc hook
may return an error code, which if non-zero, causes the pgd allocation
to be failed.  The hooks may be used to allocate/free auxillary
per-pgd information.

also fix:

> * Ingo Molnar <mingo@elte.hu> wrote:
>
>  include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
>  include/asm/pgalloc.h:14: error: parameter name omitted
>  arch/x86/kernel/entry_64.S: In file included from
>  arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
>  include/asm/pgalloc.h:14: error: parameter name omitted

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:01 +02:00
Jeremy Fitzhardinge
67350a5c45 x86: simplify vmalloc_sync_all
vmalloc_sync_all() is only called from register_die_notifier and
alloc_vm_area.  Neither is on any performance-critical paths, so
vmalloc_sync_all() itself is not on any hot paths.

Given that the optimisations in vmalloc_sync_all add a fair amount of
code and complexity, and are fairly hard to evaluate for correctness,
it's better to just remove them to simplify the code rather than worry
about its absolute performance.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:59 +02:00
Yinghai Lu
c987d12f84 x86: remove end_pfn in 64bit
and use max_pfn directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:38 +02:00
Yinghai Lu
d86623a0d5 x86: add table_top check for alloc_low_page in 64 bit
that range is from find_e820_area, so don't try to use end_pfn to see
if out of boundary...use table_top instead to avoid possible strange
result while cross the boundary...

also change early_printk to printk, because init_memory_mapping is after
early param parsing, and console=uart8250 already working at that time.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:36 +02:00
Yinghai Lu
1a0db38e5f x86: get max_pfn_mapped in init_memory_mapping
so don't shift that in the loop

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:35 +02:00
Yinghai Lu
3a58a2a6c8 x86: introduce init_memory_mapping for 32bit #3
move kva related early backto initmem_init for numa32

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:33 +02:00
Yinghai Lu
cfb0e53b05 x86: introduce init_memory_mapping for 32bit #2
moving relocate_initrd early

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:32 +02:00
Yinghai Lu
4e29684c40 x86: introduce init_memory_mapping for 32bit #1
... so can we use mem below max_low_pfn earlier.

this allows us to move several functions more early instead of waiting
to after paging_init.

That includes moving relocate_initrd() earlier in the bootup, and kva
related early setup done in initmem_init. (in followup patches)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:32 +02:00
Jeremy Fitzhardinge
4583ed514e x86, 64-bit: unify early_ioremap
The 32-bit early_ioremap will work equally well for 64-bit, so just use it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:28 +02:00
Jeremy Fitzhardinge
bb23e403e5 x86, 64-bit: use p??_populate() to attach pages to pagetable
Use the _populate() functions to attach new pages to a pagetable, to
make sure the right paravirt_ops calls get called.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:27 +02:00
Yinghai Lu
11cd0bc140 x86: move some func calling from setup_arch to paging_init
those function depend on paging setup pgtable, so they could access
the ram in bootmem region but just get mapped.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:24 +02:00
Yinghai Lu
c09434571d x86: numa32 pfn print out using hex instead
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:23 +02:00
Yinghai Lu
6a07a0edac x86: fix compile warning in init_64.c
len is long and ret is only for NUMA

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:23 +02:00
Yinghai Lu
346cafecde x86: clean up min_low_pfn
for 32bit
we already had early_res support, so don't need to track min_low_pfn.
keep it to 0 always.

also use init_bootmem_node instead of init_bootmem, so don't touch
min_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:21 +02:00
Yinghai Lu
2ec65f8b89 x86: clean up using max_low_pfn on 32-bit
so that max_low_pfn is not changed after it is set.
so we can move that early and out of initmem_init.

could call find_low_pfn_range just after max_pfn is set.

also could move reserve_initrd out of setup_bootmem_allocator

so 32bit is more like 64bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:20 +02:00
Yinghai Lu
bef1568d97 x86: move reservetop and vmalloc parsing to pgtable_32.c
also change reserve_top_address to __init attibute

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:19 +02:00
Yinghai Lu
90d967e0ef x86: move find_max_low_pfn to init_32.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:18 +02:00
Yinghai Lu
225c37d71b x86: introduce reserve_initrd
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:16 +02:00
Yinghai Lu
b2ac82a090 x86: introduce initmem_init for 32 bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:15 +02:00
Yinghai Lu
1f75d7e32e x86: introduce initmem_init for 64 bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:14 +02:00
Yinghai Lu
d52d53b8a5 RFC x86: try to remove arch_get_ram_range
want to remove arch_get_ram_range, and use early_node_map instead.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:27 +02:00
Jeremy Fitzhardinge
a7bf0bd5e6 build: add __page_aligned_data and __page_aligned_bss
Making a variable page-aligned by using
__attribute__((section(".data.page_aligned"))) is fragile because if
sizeof(variable) is not also a multiple of page size, it leaves
variables in the remainder of the section unaligned.

This patch introduces two new qualifiers, __page_aligned_data and
__page_aligned_bss to set the section *and* the alignment of
variables.  This makes page-aligned variables more robust because the
linker will make sure they're aligned properly.  Unfortunately it
requires *all* page-aligned data to use these macros...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:13 +02:00
Ingo Molnar
6236af82d8 Merge branch 'x86/fixmap' into x86/devel
Conflicts:

	arch/x86/mm/init_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:24:29 +02:00
Ingo Molnar
2b4fa851b2 Merge branch 'x86/numa' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/e820.c
	arch/x86/kernel/efi_64.c
	arch/x86/kernel/mpparse.c
	arch/x86/kernel/setup.c
	arch/x86/kernel/setup_32.c
	arch/x86/mm/init_64.c
	include/asm-x86/proto.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:59:23 +02:00
Bernhard Walle
3fd052b1b4 x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.

It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.

The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:49:49 +02:00
Thomas Gleixner
886533a3e3 x86: numa_64.c fix shadowed variable
sparse mutters:
arch/x86/mm/numa_64.c:195:27: warning: symbol 'end_pfn' shadows an earlier one

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:31:29 +02:00
Thomas Gleixner
864fc31ea5 x86: numa_64.c make local variables static
plat_node_bdata, cmdline, nodemap_addr, nodemap_size are local to
numa_64.c. Make them static

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:31:28 +02:00
Mike Travis
9f248bde9d x86: remove the static 256k node_to_cpumask_map
* Consolidate node_to_cpumask operations and remove the 256k
    byte node_to_cpumask_map.  This is done by allocating the
    node_to_cpumask_map array after the number of possible nodes
    (nr_node_ids) is known.

  * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
    been increased.  It now shows faults when calling node_to_cpumask()
    and node_to_cpumask_ptr().

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:24 +02:00
Mike Travis
23ca4bba3e x86: cleanup early per cpu variables/accesses v4
* Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
    used by some per_cpu variables that are initialized and accessed
    before there are per_cpu areas allocated.

    ["Early" in respect to per_cpu variables is "earlier than the per_cpu
    areas have been setup".]

    This patchset adds these new macros:

	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
	DECLARE_EARLY_PER_CPU(_type, _name)

	early_per_cpu_ptr(_name)
	early_per_cpu_map(_name, _idx)
	early_per_cpu(_name, _cpu)

    The DEFINE macro defines the per_cpu variable as well as the early
    map and pointer.  It also initializes the per_cpu variable and map
    elements to "_initvalue".  The early_* macros provide access to
    the initial map (usually setup during system init) and the early
    pointer.  This pointer is initialized to point to the early map
    but is then NULL'ed when the actual per_cpu areas are setup.  After
    that the per_cpu variable is the correct access to the variable.

    The early_per_cpu() macro is not very efficient but does show how to
    access the variable if you have a function that can be called both
    "early" and "late".  It tests the early ptr to be NULL, and if not
    then it's still valid.  Otherwise, the per_cpu variable is used
    instead:

	#define early_per_cpu(_name, _cpu) 			\
		(early_per_cpu_ptr(_name) ?			\
			early_per_cpu_ptr(_name)[_cpu] :	\
			per_cpu(_name, _cpu))

    A better method is to actually check the pointer manually.  In the
    case below, numa_set_node can be called both "early" and "late":

	void __cpuinit numa_set_node(int cpu, int node)
	{
	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);

	    if (cpu_to_node_map)
		    cpu_to_node_map[cpu] = node;
	    else
		    per_cpu(x86_cpu_to_node_map, cpu) = node;
	}

  * Add a flag "arch_provides_topology_pointers" that indicates pointers
    to topology cpumask_t maps are available.  Otherwise, use the function
    returning the cpumask_t value.  This is useful if cpumask_t set size
    is very large to avoid copying data on to/off of the stack.

  * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
    the non-debug case has been optimized a bit.

  * Remove an unreferenced compiler warning in drivers/base/topology.c

  * Clean up #ifdef in setup.c

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:20 +02:00
Ingo Molnar
3de352bbd8 Merge branch 'x86/mpparse' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/io_apic_32.c
	arch/x86/kernel/setup_64.c
	arch/x86/mm/init_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:14:58 +02:00
Yinghai Lu
a4caa18efe x86: fix compiling when CONFIG_X86_MPPARSE is not set
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:39:19 +02:00
Yinghai Lu
064d25f120 x86: merge setup_memory_map with e820
... and kill e820_32/64.c and e820_32/64.h

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:25 +02:00
Yinghai Lu
cc9f7a0ccf x86: kill bad_ppro
so don't punish all other cpus without that problem when init highmem

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:19 +02:00
Yinghai Lu
b5bc6c0e55 x86, mm: use add_highpages_with_active_regions() for high pages init v2
use early_node_map to init high pages, so we can remove page_is_ram() and
page_is_reserved_early() in the big loop with add_one_highpage

also remove page_is_reserved_early(), it is not needed anymore.

v2: fix the build of other platforms

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:25 +02:00
Yinghai Lu
cc1050bafe x86: replace shrink_active_range() with remove_active_range()
in case we have kva before ramdisk on a node, we still need to use
those ranges.

v2: reserve_early kva ram area, in case there are holes in highmem, to avoid
    those area could be treat as free high pages.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:29 +02:00
Yinghai Lu
d2dbf34332 x86: clean up reserve_bootmem_generic() and port it to 32-bit
1. add reserve_bootmem_generic for 32bit
2. change len to unsigned long
3. make early_res_to_bootmem to use it

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:17 +02:00
Bernhard Walle
8b2ef1d728 x86: add flags parameter to reserve_bootmem_generic()
This patch adds a 'flags' parameter to reserve_bootmem_generic() like it
already has been added in reserve_bootmem() with commit
72a7fe3967.

It also changes all users to use BOOTMEM_DEFAULT, which doesn't effectively
change the behaviour. Since the change is x86-specific, I don't think it's
necessary to add a new API for migration. There are only 4 users of that
function.

The change is necessary for the next patch, using reserve_bootmem_generic()
for crashkernel reservation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:34:54 +02:00
Ingo Molnar
896395c290 Merge branch 'linus' into tmp.x86.mpparse.new 2008-07-08 10:32:56 +02:00
Ingo Molnar
58cf35228f Merge branches 'x86/mmio', 'x86/delay', 'x86/idle', 'x86/oprofile', 'x86/debug', 'x86/ptrace' and 'x86/amd-iommu' into x86/devel 2008-07-08 09:46:15 +02:00
Ingo Molnar
6924d1ab8b Merge branches 'x86/numa-fixes', 'x86/apic', 'x86/apm', 'x86/bitops', 'x86/build', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel 2008-07-08 09:16:56 +02:00
Jiri Slaby
684eb0163a x86_64: use PAGE_OFFSET in dump_pagetables
Use PAGE_OFFSET macro instead of using 0xffff810000000000UL directly.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 08:12:06 +02:00
Thomas Gleixner
6e92a5a615 x86: add sparse annotations to ioremap
arch/x86/mm/ioremap.c:308:11: error: incompatible types in comparison expression (different address spaces)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 08:12:05 +02:00
Thomas Gleixner
65280e613f x86: janitor CPA statistics patch
1) Remove __meminit from update_pages_count. It is used inside
split_pages()

2) Make the code depend on PROC_FS. Doing statistics for nothing is
useless and not adding useless code is nice to the Linux tiny folks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 08:12:05 +02:00
Andi Kleen
ce0c0e50f9 x86, generic: CPA add statistics about state of direct mapping v4
Add information about the mapping state of the direct mapping to
/proc/meminfo. I chose /proc/meminfo because that is where all the other
memory statistics are too and it is a generally useful metric even
outside debugging situations. A lot of split kernel pages means the
kernel will run slower.

This way we can see how many large pages are really used for it and how
many are split.

Useful for general insight into the kernel.

v2: Add hotplug locking to 64bit to plug a very obscure theoretical race.
    32bit doesn't need it because it doesn't support hotadd for lowmem.
    Fix some typos
v3: Rename dpages_cnt
    Add CONFIG ifdef for count update as requested by tglx
    Expand description
v4: Fix stupid bugs added in v3
    Move update_page_count to pageattr.c

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 08:11:45 +02:00
Ingo Molnar
4d51c7587b Merge branch 'tracing/mmiotrace-mergefixups' into tracing/mmiotrace 2008-07-07 08:08:19 +02:00
Ingo Molnar
d763d5edf9 Merge branch 'linus' into tracing/mmiotrace 2008-07-07 08:07:35 +02:00
Gustavo Fernando Padovan
95c60b08c6 x86: remove unnecessary #ifdef CONFIG_X86_32...#else
Remove the #ifdef conditional because this comparison is already done in
user_mode_vm().

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Cc: akpm@osdl.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03 15:09:10 +02:00
Andrew Morton
27df66a406 arch/x86/mm/init_64.c: early_memtest(): fix types
fix this warning:

arch/x86/mm/init_64.c: In function 'early_memtest':
arch/x86/mm/init_64.c:524: warning: passing argument 2 of 'find_e820_area_size' from incompatible pointer type

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03 10:15:51 +02:00
Vegard Nossum
f294a8ce21 x86: small unifications of address printing
'man 3 printf' tells me that %p should be printed as if by %#x, but
this is not true for the kernel, which does not use the '0x' prefix
for the %p conversion specifier.

A small cast to (void *) is also prettier than #ifdef/#else/#endif.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-01 16:21:41 +02:00
Daniel J Blueman
0b1faeef5f x86: section/warning fixes
WARNING: arch/x86/mm/built-in.o(.text+0x3a1): Section mismatch in
reference from the function set_pte_phys() to the function
.init.text:spp_getpage()
The function set_pte_phys() references
the function __init spp_getpage().
This is often because set_pte_phys lacks a __init
annotation or the annotation of spp_getpage is wrong.

arch/x86/mm/init_64.c: In function 'early_memtest':
arch/x86/mm/init_64.c:520: warning: passing argument 2 of
'find_e820_area_size' from incompatible pointer type

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-26 15:33:09 +02:00
Jens Axboe
15c8b6c1aa on_each_cpu(): kill unused 'retry' parameter
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:38 +02:00
Ingo Molnar
5ce001b0e5 Merge branch 'linus' into stackprotector 2008-06-25 12:27:29 +02:00
Andreas Herrmann
33af9039cb x86: pat.c final cleanup of loop body in reserve_memtype
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:59 +02:00
Andreas Herrmann
64fe44c38b x86: pat.c introduce function to check for conflicts with existing memtypes
... to strip down loop body in reserve_memtype.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:57 +02:00
Andreas Herrmann
f6887264de x86: pat.c consolidate list_add handling in reserve_memtype
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:55 +02:00
Andreas Herrmann
3e9c83b309 x86: pat.c consolidate error/debug messages in reserve_memtype
... and move last debug message out of locked section.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:51 +02:00
Andreas Herrmann
69e26be9b1 x86: pat.c more trivial changes
- add BUG statement to catch invalid start and end parameters
 - No need to track the actual type in both req_type and actual_type --
   keep req_type unchanged.
 - removed (IMHO) superfluous comments

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:50 +02:00
Andreas Herrmann
ac97991ec9 x86: pat.c choose more crisp variable names
- parse/ml => entry (within list_for_each and friends)
 - new_entry => new
 - ret_type => new_type (to avoid confusion with req_type)

(... to make it more readable...)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:49 +02:00
Andreas Herrmann
bcc643dc28 x86: introduce macro to check whether an address range is in the ISA range
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:48 +02:00
Ingo Molnar
a1d5a8691f x86: unify __set_fixmap, fix
fix build failure:

 arch/x86/mm/pgtable.c:280: warning: ‘enum fixed_addresses’ declared inside parameter list
 arch/x86/mm/pgtable.c:280: warning: its scope is only this definition or declaration, which is probably not what you want
 arch/x86/mm/pgtable.c:280: error: parameter 1 (‘idx’) has incomplete type

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:36:57 +02:00
Jeremy Fitzhardinge
aeaaa59c7e x86/paravirt/xen: add set_fixmap pv_mmu_ops
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:56 +02:00
Jeremy Fitzhardinge
d494a96125 x86: implement set_pte_vaddr
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:54 +02:00
Jeremy Fitzhardinge
7c7e6e07e2 x86: unify __set_fixmap
In both cases, I went with the 32-bit behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:51 +02:00
Jiri Slaby
a1bf9631be x86, MM: virtual address debug, v2
I've removed the test from phys_to_nid and made a function from __phys_addr
only when the debugging is enabled (on x86_32).

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: tglx@linutronix.de
Cc: hpa@zytor.com
Cc: Mike Travis <travis@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <x86@kernel.org>
Cc: linux-mm@kvack.org
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 13:32:22 +02:00
Jiri Slaby
59ea746337 MM: virtual address debug
Add some (configurable) expensive sanity checking to catch wrong address
translations on x86.

- create linux/mmdebug.h file to be able include this file in
  asm headers to not get unsolvable loops in header files
- __phys_addr on x86_32 became a function in ioremap.c since
  PAGE_OFFSET, is_vmalloc_addr and VMALLOC_* non-constasts are undefined
  if declared in page_32.h
- add __phys_addr_const for initializing doublefault_tss.__cr3

Tested on 386, 386pae, x86_64 and x86_64 numa=fake=2.

Contains Andi's enable numa virtual address debug patch.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 13:31:42 +02:00
Andreas Herrmann
dd0c7c4903 x86: shrink pat_x_mtrr_type to its essentials
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 12:57:40 +02:00
Hugh Dickins
6cf514fce1 x86: PAT: make pat_x_mtrr_type() more readable
Clean up over-complications in pat_x_mtrr_type().
And if reserve_memtype() ignores stray req_type bits when
pat_enabled, it's better to mask them off when not also.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 11:51:08 +02:00
Ingo Molnar
faeca31d06 Merge branch 'linus' into x86/pat 2008-06-16 11:20:28 +02:00
Ingo Molnar
064a32d82c Merge branch 'linus' into x86/memtest 2008-06-16 11:19:53 +02:00
Ingo Molnar
1791a78c0b Merge branch 'linus' into x86/cleanups 2008-06-16 11:17:50 +02:00
Ingo Molnar
cb9aa97c21 Merge branch 'linus' into tracing/mmiotrace-mergefixups 2008-06-16 11:16:46 +02:00
Linus Torvalds
cbfa66b88d Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
  x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
  x86: fix an incompatible pointer type warning on 64-bit compilations
  x86: fix lockdep warning during suspend-to-ram
  x86: fix unused variable 'loops' warning in arch/x86/boot/a20.c
  Revert "x86: fix ioapic bug again"
  x86: fix asm warning in head_32.S
  x86: fix endless page faults in mount_block_root for Linux 2.6
  geode: fix modular build
2008-06-12 12:55:32 -07:00
Kevin Winchester
f8a45704f5 x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:36:23 +02:00
David Howells
eb53e9f3ea x86: fix an incompatible pointer type warning on 64-bit compilations
Fix an incompatible pointer type warning on x86_64 compilations.
early_memtest() is passing a u64* to find_e820_area_size() which is expecting
an unsigned long.  Change t_start and t_size to unsigned long as those are
also 64-bit types on x88_64.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:15 +02:00
Henry Nestler
b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
Andreas Herrmann
499f8f84b8 x86: rename pat_wc_enabled to pat_enabled
BTW, what does pat_wc_enabled stand for? Does it mean
"write-combining"?

Currently it is used to globally switch on or off PAT support.
Thus I renamed it to pat_enabled.
I think this increases readability (and hope that I didn't miss
something).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:27 +02:00
Andreas Herrmann
cd7a4e936d x86: PAT: fixed checkpatch errors (and whitespaces)
x86: PAT: fixed checkpatch errors (and whitespaces)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:24 +02:00
Andreas Herrmann
97cfab6ac4 x86: PAT: fix ambiguous paranoia check in pat_init()
Starting with commit 8d4a430085 (x86:
cleanup PAT cpu validation) the PAT CPU feature flag is not cleared
anymore. Now the error message

  "PAT enabled, but CPU feature cleared"

in pat_init() is misleading.

Furthermore the current code does not check for existence of the PAT
CPU feature flag if a CPU is whitelisted in validate_pat_support.

This patch clears pat_wc_enabled if boot CPU has no PAT feature flag
and adapts the paranoia check.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:22 +02:00
Venki Pallipadi
c26421d019 x86: fix Xorg crash with xf86MapVidMem error
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:03:55 +02:00
Fenghua Yu
39b8931b5c ACPI: handle invalid ACPI SLIT table
This is a SLIT sanity checking patch.  It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64.  It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64.  It also cleans up unused variable localities in
acpi_parse_slit() on x86.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-11 19:13:46 -04:00
Yinghai Lu
9043f00796 x86, numa, 32-bit: use find_e820_area() to find KVA RAM on node
don't assume we can use RAM near the end of every node.
Esp systems that have few memory and they could have
kva address and kva RAM all below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:52 +02:00
Yinghai Lu
cc1a9d86ce mm, x86: shrink_active_range() should check all
Now we are using register_e820_active_regions() instead of
add_active_range() directly. So end_pfn could be different between the
value in early_node_map to node_end_pfn.

So we need to make shrink_active_range() smarter.

shrink_active_range() is a generic MM function in mm/page_alloc.c but
it is only used on 32-bit x86. Should we move it back to some file in
arch/x86?

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:44 +02:00
Huang, Ying
d0ec2c6f2c x86: reserve highmem pages via reserve_early
This patch makes early reserved highmem pages become reserved
pages. This can be used for highmem pages allocated by bootloader such
as EFI memory map, linked list of setup_data, etc.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Andrew Morton
be524fb960 x86: section mismatch fix
Fix this:

 WARNING: vmlinux.o(.text+0x114bb): Section mismatch in reference from
 the function nopat() to the function .cpuinit.text:pat_disable()
 The function nopat() references
 the function __cpuinit pat_disable().
 This is often because nopat lacks a __cpuinit
 annotation or the annotation of pat_disable is wrong.

Reported-by: "Fabio Comolli" <fabio.comolli@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Venki Pallipadi
282c454cd3 x86: fix Xorg crash with xf86MapVidMem error
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Kevin Winchester
511631011d x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Hugh Dickins
2884f110d5 x86: fix bad pmd ffff810000207xxx(9090909090909090)
OGAWA Hirofumi and Fede have reported rare pmd_ERROR messages:
mm/memory.c:127: bad pmd ffff810000207xxx(9090909090909090).

Initialization's cleanup_highmap was leaving alignment filler
behind in the pmd for MODULES_VADDR: when vmalloc's guard page
would occupy a new page table, it's not allocated, and then
module unload's vfree hits the bad 9090 pmd entry left over.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Ingo Molnar
226e9a93a2 x86: ioremap fix failing nesting check
Mika Kukkonen noticed that the nesting check in early_iounmap() is not
actually done.

Reported-by: Mika Kukkonen <mikukkon@srv1-m700-lanp.koti>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: torvalds@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: mikukkon@iki.fi
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:47 +02:00
Yinghai Lu
7b2a0a6c48 x86: make 32-bit use e820_register_active_regions()
this way 32-bit is more similar to 64-bit, and smarter e820 and numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:58 +02:00
Yinghai Lu
84b56fa46b x86, numa, 32-bit: make sure get we kva space
when 1/3 user/kernel split is used, and less memory is installed, or if
we have a big hole below 4g, max_low_pfn is still using 3g-128m

try to go down from max_low_pfn until we get it. otherwise will panic.

need to make 32-bit code to use register_e820_active_regions ... later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:55 +02:00
Yinghai Lu
6af61a7614 x86: clean up max_pfn_mapped usage - 32-bit
on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.

We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.

XEN PV and lguest may need to assign max_pfn_mapped too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
287572cb38 x86, numa, 32-bit: avoid clash between ramdisk and kva
use find_e820_area to get address space...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
e8c27ac919 x86, numa, 32-bit: print out debug info on all kvas
also fix the print out of node_remap_end_vaddr

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:26 +02:00
Yinghai Lu
0596152388 x86, 32-bit: change propagate_e820_map() back to find_max_pfn()
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.

also for sparse it will call alloc_bootmem ... before we set up bootmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
b66cd72073 x86: set node_remap_size[0] in fallback path
... otherwise alloc_remap will not get node_mem_map from kva area, and
alloc_node_mem_map has to alloc_bootmem_node to get mem_map.
It will use two low address copies ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
ba924c81dd x86, numa, 32-bit: increase max_elements to 1024
so every element will represent 64M instead of 256M.

AMD opteron could have HW memory hole remapping, so could have
[0, 8g + 64M) on node0. Reduce element size to 64M to keep that on node 0

Later we need to use find_e820_area() to allocate memory_node_map like
on 64-bit. But need to move memory_present out of populate_mem_map...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:24 +02:00
Ingo Molnar
471b3c1b01 x86, numaq 32-bit: build fix
fix:

drivers/built-in.o: In function `acpi_numa_init':
: undefined reference to `acpi_numa_arch_fixup'

which can happen with ACPI && NUMAQ.
2008-06-03 12:49:06 +02:00
Ingo Molnar
cf3d0cb8a1 x86: 32-bit numa, build fix
on Summit it's possible to have:

 CONFIG_ACPI_SRAT=y
 CONFIG_HAVE_ARCH_PARSE_SRAT=y

in which case acpi.h defines the acpi_numa_slit_init() and
acpi_numa_processor_affinity_init() methods as a macro.
2008-06-03 11:45:09 +02:00
Ingo Molnar
b28852d670 x86: add dummy acpi_numa_processor_affinity_init() implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed.
2008-06-03 10:23:53 +02:00
Ingo Molnar
2772f54bf3 x86: add acpi_numa_slit_init() dummy implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed on 32-bit.
2008-06-03 10:23:48 +02:00
Yinghai Lu
a5481280b2 x86: extend e820 early_res support 32bit -fix #5
reserve early numa kva, so it will not clash with new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:53 +02:00
Yinghai Lu
163872950d x86: extend e820 early_res support 32bit -fix #4
reserve_early pgdata for 32bit numa

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:50 +02:00
Eric Sandeen
7c9f8861e6 stackprotector: use canary at end of stack to indicate overruns at oops time
(Updated with a common max-stack-used checker that knows about
the canary, as suggested by Joe Perches)

Use a canary at the end of the stack to clearly indicate
at oops time whether the stack has ever overflowed.

This is a very simple implementation with a couple of
drawbacks:

1) a thread may legitimately use exactly up to the last
   word on the stack

 -- but the chances of doing this and then oopsing later seem slim

2) it's possible that the stack usage isn't dense enough
   that the canary location could get skipped over

 -- but the worst that happens is that we don't flag the overrun
 -- though this happens fairly often in my testing :(

With the code in place, an intentionally-bloated stack oops might
do:

BUG: unable to handle kernel paging request at ffff8103f84cc680
IP: [<ffffffff810253df>] update_curr+0x9a/0xa8
PGD 8063 PUD 0
Thread overran stack or stack corrupted
Oops: 0000 [1] SMP
CPU 0
...

... unless the stack overrun is so bad that it corrupts some other
thread.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-26 16:15:33 +02:00
Paul Jackson
e9197bf011 x86 boot: remove some unused extern function declarations
Remove three extern declarations for routines
that don't exist.  Fix a typo in a comment.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Ingo Molnar
668a6c3654 - fix mmioftrace + rcu merge interaction
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 09:51:43 +02:00
Thomas Gleixner
48e2395722 x86: fixup the fallout of the bitops changes
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:36 +02:00
Jan Beulich
4e50e62ce5 x86: eliminate duplicate consistency checks in init_32.c
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:30 +02:00
Thomas Gleixner
6a1673ae22 x86: make memory_add_physaddr_to_nid depend on MEMORY_HOTPLUG
memory_add_physaddr_to_nid() is only used in the
CONFIG_MEMORY_HOTPLUG_SPARSE || CONFIG_ACPI_HOTPLUG_MEMORY case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:29 +02:00
Thomas Gleixner
11034d5597 x86: init64.c include initrd.h
free_initrd_mem needs a prototype, which is in linux/initrd.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:27 +02:00
Thomas Gleixner
55d4f22abc x86: k8topology cleanup variable declarations
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
d34c08958f x86: k8topology fix shadow variable
sparse mutters:
arch/x86/mm/k8topology_64.c:108:7: warning: symbol 'nodeid' shadows an earlier one

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
0eafe234a2 x86: k8topology add missing header
k8_scan_nodes is global and needs a prototype. Add the header file
which contains it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Andrew Morton
46dd98a5c0 arch/x86/mm/pat.c: use boot_cpu_has()
arch/x86/mm/pat.c: In function 'phys_mem_access_prot_allowed':
arch/x86/mm/pat.c:526: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:526: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:527: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:527: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:528: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:528: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:529: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:529: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type

Don't open-code test_bit() on a __u32

Cc: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:50:25 +02:00
Pekka Paalanen
790e2a290b x86 mmiotrace: page level is unsigned
Fixes some sparse warnings.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:47 +02:00
Pekka Paalanen
a50445d76c mmiotrace: rename kmmio_probe::user_data to :private.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:41 +02:00
Pekka Paalanen
dee310d0ad x86 mmiotrace: use resource_size_t for phys addresses
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:36 +02:00
Pekka Paalanen
87e547fe41 x86 mmiotrace: fix page-unaligned ioremaps
mmiotrace_ioremap() expects to receive the original unaligned map phys address
and size. Also fix {un,}register_kmmio_probe() to deal properly with
unaligned size.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:32 +02:00
Pekka Paalanen
970e6fa038 mmiotrace: code style cleanups
From c2da03771e29159627c5c7b9509ec70bce9f91ee Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Mon, 28 Apr 2008 21:25:22 +0300

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:27:28 +02:00
Pekka Paalanen
7423d1115f x86 mmiotrace: dynamically disable non-boot CPUs
From 8979ee55cb6a429c4edd72ebec2244b849f6a79a Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sat, 12 Apr 2008 00:18:57 +0300

Mmiotrace is not reliable with multiple CPUs and may
miss events. Drop to single CPU when mmiotrace is activated.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 11:26:52 +02:00
Randy Dunlap
0663bb6cd9 mmiotrace: fix printk format
Fix gcc printk format warnings:

next-20080415/arch/x86/mm/mmio-mod.c: In function 'print_pte':
next-20080415/arch/x86/mm/mmio-mod.c:154: warning: format '%lx' expects type 'long unsigned int', but argument 3 has type 'pteval_t'
next-20080415/arch/x86/mm/mmio-mod.c:154: warning: format '%lx' expects type 'long unsigned int', but argument 4 has type 'pteval_t'
next-20080415/arch/x86/mm/mmio-mod.c: At top level:
next-20080415/arch/x86/mm/mmio-mod.c:403: warning: 'downed_cpus' defined but not used

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:26:26 +02:00
Pekka Paalanen
e4b37ee686 x86 mmiotrace: remove ISA_trace parameter.
This had become a no-op.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:25:44 +02:00
Pekka Paalanen
ff3a3e9ba5 x86 mmiotrace: move files into arch/x86/mm/.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:25:37 +02:00
Pekka Paalanen
d61fc44853 x86: mmiotrace, preview 2
Kconfig.debug, Makefile and testmmiotrace.c style fixes.
Use real mutex instead of mutex.
Fix failure path in register probe func.
kmmio: RCU read-locked over single stepping.
Generate mapping id's.
Make mmio-mod.c built-in and rewrite its locking.
Add debugfs file to enable/disable mmiotracing.
kmmio: use irqsave spinlocks.
Lots of cleanups in mmio-mod.c
Marker file moved from /proc into debugfs.
Call mmiotrace entrypoints directly from ioremap.c.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:22:24 +02:00
Pekka Paalanen
0fd0e3da45 x86: mmiotrace full patch, preview 1
kmmio.c handles the list of mmio probes with callbacks, list of traced
pages, and attaching into the page fault handler and die notifier. It
arms, traps and disarms the given pages, this is the core of mmiotrace.

mmio-mod.c is a user interface, hooking into ioremap functions and
registering the mmio probes. It also decodes the required information
from trapped mmio accesses via the pre and post callbacks in each probe.
Currently, hooking into ioremap functions works by redefining the symbols
of the target (binary) kernel module, so that it calls the traced
versions of the functions.

The most notable changes done since the last discussion are:
- kmmio.c is a built-in, not part of the module
- direct call from fault.c to kmmio.c, removing all dynamic hooks
- prepare for unregistering probes at any time
- make kmmio re-initializable and accessible to more than one user
- rewrite kmmio locking to remove all spinlocks from page fault path

Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe()
or is there a better way?

The function called via call_rcu() itself calls call_rcu() again,
will this work or break? There I need a second grace period for RCU
after the first grace period for page faults.

Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack
that next. At some point I will start looking into how to make mmiotrace
a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should
make the user space part of mmiotracing as simple as
'cat /debug/trace/mmio > dump.txt'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:22:12 +02:00
Pekka Paalanen
10c43d2eb5 x86: explicit call to mmiotrace in do_page_fault()
The custom page fault handler list is replaced with a single function
pointer. All related functions and variables are renamed for
mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: pq@iki.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:21:55 +02:00
Pekka Paalanen
75bb88350e x86 mmiotrace: use lookup_address()
Use lookup_address() from pageattr.c instead of doing the same
manually. Also had to EXPORT_SYMBOL_GPL(lookup_address) to make this
work for modules. This also fixes "undefined symbol 'init_mm'"
compile error for x86_32.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 11:21:32 +02:00
Pekka Paalanen
72b59d67f8 x86_64: fix kernel rodata NX setting
Without CONFIG_DYNAMIC_FTRACE, mark_rodata_ro() would mark a wrong
number of pages as no-execute. The bug was introduced in the patch
"ftrace: dont write protect kernel text". The symptom was machine reboot
after a CPU hotplug.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:53:07 +02:00
Pekka Paalanen
86069782d6 x86: add a list for custom page fault handlers.
Provides kernel modules a way to register custom page fault handlers.
On every page fault this will call a list of registered functions. The
functions may handle the fault and force do_page_fault() to return
immediately.

This functionality is similar to the now removed page fault notifiers.
Custom page fault handlers are used by debugging and reverse engineering
tools. Mmiotrace is one such tool and a patch to add it into the tree
will follow.

The custom page fault handlers are called earlier in do_page_fault()
than the page fault notifiers were.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:16:38 +02:00
Steven Rostedt
8f0f996e80 ftrace: dont write protect kernel text
Dynamic ftrace cant work when the kernel has its text write protected.
This patch keeps the kernel from being write protected when
dynamic ftrace is in place.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 21:16:22 +02:00
Andy Whitcroft
84d6bd0e27 x86: cope with no remap space being allocated for a numa node
When allocating the pgdat's for numa nodes on x86_32 we attempt to place
them in the numa remap space for that node.  However should the node not
have any remap space allocated (such as due to having non-ram pages in
the remap location in the node) then we will incorrectly place the pgdat
at zero.  Check we have remap available, falling back to node 0 memory
where we do not.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:50 +02:00
Andy Whitcroft
b9ada4281c x86: reinstate numa remap for SPARSEMEM on x86 NUMA systems
Recent kernels have been panic'ing trying to allocate memory early in boot,
in __alloc_pages:

  BUG: unable to handle kernel paging request at 00001568
  IP: [<c10407b6>] __alloc_pages+0x33/0x2cc
  *pdpt = 00000000013a5001 *pde = 0000000000000000
  Oops: 0000 [#1] SMP
  Modules linked in:

  Pid: 1, comm: swapper Not tainted (2.6.25 #78)
  EIP: 0060:[<c10407b6>] EFLAGS: 00010246 CPU: 0
  EIP is at __alloc_pages+0x33/0x2cc
  EAX: 00001564 EBX: 000412d0 ECX: 00001564 EDX: 000005c3
  ESI: f78012a0 EDI: 00000001 EBP: 00001564 ESP: f7871e50
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f7870000 task=f786f670 task.ti=f7870000)
  Stack: 00000000 f786f670 00000010 00000000 0000b700 000412d0 f78012a0 00000001
         00000000 c105b64d 00000000 000412d0 f78012a0 f7803120 00000000 c105c1c5
         00000010 f7803144 000412d0 00000001 f7803130 f7803120 f78012a0 00000001
  Call Trace:
   [<c105b64d>] kmem_getpages+0x94/0x129
   [<c105c1c5>] cache_grow+0x8f/0x123
   [<c105c689>] ____cache_alloc_node+0xb9/0xe4
   [<c105c999>] kmem_cache_alloc_node+0x92/0xd2
   [<c1018929>] build_sched_domains+0x536/0x70d
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c10572d6>] interleave_nodes+0x23/0x5a
   [<c105c44f>] alternate_node_alloc+0x43/0x5b
   [<c1018b47>] arch_init_sched_domains+0x46/0x51
   [<c136e85e>] kernel_init+0x0/0x82
   [<c137ac19>] sched_init_smp+0x10/0xbb
   [<c136e8a1>] kernel_init+0x43/0x82
   [<c10035cf>] kernel_thread_helper+0x7/0x10

Debugging this showed that the NODE_DATA() for nodes other than node 0
were all NULL.  Tracing this back showed that the NODE_DATA() pointers
were being initialised to each nodes remap space.  However under
SPARSEMEM remap is disabled which leads to the pgdat's being placed
incorrectly at kernel virtual address 0.  Leading to the panic when
attempting to allocate memory from these nodes.

Numa remap was disabled in the commit below.  This occured while fixing
problems triggered when attempting to boot x86_32 NUMA SPARSEMEM kernels
on non-numa hardware.

	x86: make NUMA work on 32-bit
	commit 1b000a5dbe

The real problem is believed to be related to other alignment issues in
the regions blocked out from the bootmem allocator for small memory
systems, and has been fixed separately.  Therefore re-enable remap for
SPARSMEM, which fixes pgdat allocation issues.  Testing confirms that
SPARSMEM NUMA kernels will boot correctly with this part of the change
reverted.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:48 +02:00
Ingo Molnar
a8ac1ae3a2 Merge branch 'linus' into x86/pat 2008-05-19 09:23:07 +02:00
Thomas Gleixner
b4ef290d7c Merge branch 'linus' into x86/pat 2008-05-18 11:54:43 +02:00
Avi Kivity
31f4d870b0 x86: fix crash on cpu hotplug on pat-incapable machines
pat_disable() is __init, which means it goes away after booting is complete.
Unfortunately it is used by the hotplug code if the machine is not
pat-capable, causing a crash.

Fix by marking pat_disable() as __cpuinit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 22:57:20 +02:00
Pranith Kumar
afc8534380 x86: arch/x86/mm/pat.c - fix warning
fix this warning:

 arch/x86/mm/pat.c: In function `phys_mem_access_prot_allowed':
 arch/x86/mm/pat.c:558: warning: long long unsigned int format, long
 unsigned int arg (arg 6)
 arch/x86/mm/pat.c: In function `map_devmem':
 arch/x86/mm/pat.c:580: warning: long long unsigned int format, long
 unsigned int arg (arg 6)

Signed-off-by: D Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:39:30 +02:00
Hugh Dickins
61165d7a03 x86: fix app crashes after SMP resume
After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.

Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3 x86: do not zap_low_mappings
in __smp_prepare_cpus.  Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.

Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32.  This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do.  No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.

(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)

That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock.  The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.

So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.

So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:36:12 +02:00
Yinghai Lu
0327318445 x86_64: simplify the memtest parameter setting
use CONFIG_MEMTEST only. if it is set, will have memtest=0 (disabled)

need to have memtest=4 in command line to test more patterns.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:15 +02:00
Venki Pallipadi
77b52b4c5c x86: add "debugpat" boot option
enable debug messages by a boot option "debugpat".

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:07 +02:00
Thomas Gleixner
8d4a430085 x86: cleanup PAT cpu validation
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.

Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.

Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.

The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:51 +02:00
Hugh Dickins
aeed5fce37 x86: fix PAE pmd_bad bootup warning
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

That came from 9fc34113f6 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.

And revert that cded932b75 x86: fix pmd_bad
and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.

Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.

Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.

However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06 13:08:58 -07:00
Thomas Gleixner
48b83d2425 x86: undo visws/numaq build changes
arch/x86/pci/Makefile_32 has a nasty detail. VISWS and NUMAQ build
override the generic pci-y rules. This needs a proper cleanup, but
that needs more thoughts. Undo

commit 895d30935e
    x86: numaq fix
    do not override the existing pci-y rule when adding visws or
    numaq rules.

There is also a stupid init function ordering problem vs. acpi.o

Add comments to the Makefile to avoid tripping over this again.

Remove the srat stub code in discontig_32.c to allow a proper NUMAQ
build.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Andres Salomon
cb8ab687c3 x86: ioremap ram check fix
bdd3cee2e4 (x86: ioremap(), extend check
to all RAM pages) breaks OLPC's ioremap call.  The ioremap that OLPC uses is:

        romsig = ioremap(0xffffffc0, 16);

The commit that breaks it is basically:

-       for (pfn = phys_addr >> PAGE_SHIFT; pfn < max_pfn_mapped &&
-            (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+       for (pfn = phys_addr >> PAGE_SHIFT;
+                               (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+

Previously, the 'pfn < max_pfn_mapped' check would've caused us to not
enter the loop.  Removing that check means we loop infinitely.  The
reason for that is because pfn is 0xfffff, and last_addr is 0xffffffcf.
The remaining check that is used to exit the loop is not sufficient;
when pfn<<PAGE_SHIFT is 0xfffff000, that is less than 0xffffffcf; when
we increment pfn and it overflows (pfn == 0x100000), pfn<<PAGE_SHIFT
ends up being 0.  That, of course, is less than last_addr.  In effect,
pfn<<PAGE_SHIFT is never lower than last_addr.

The simple fix for this is to limit the last_addr check to the PAGE_MASK;
a patch is below.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Suresh Siddha
de33c442ed x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()
Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC.
Once all the X drivers move to ioremap_wc(), we can go back to strong
UC semantics for ioremap() and ioremap_nocache().

To avoid attribute aliasing issues, pci_mmap_page_range() will also
use UC_MINUS for default non write-combining mapping request.

Next steps:
	a) change all the video drivers using ioremap() or ioremap_nocache()
	   and adding WC MTTR using mttr_add() to ioremap_wc()

	b) for strict usage, we can go back to strong uc semantics
	   for ioremap() and ioremap_nocache() after some grace period for
	   completing step-a.

	c) user level X server needs to use the appropriate method for setting
	   up WC mapping (like using resourceX_wc sysfs file instead of
	   adding MTRR for WC and using /dev/mem or resourceX under /sys)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar
2544a873ab revert: "x86: ioremap(), extend check to all RAM pages"
Vegard Nossum reported a large (150 seconds) boot delay during bootup,
and bisected it to "x86: ioremap(), extend check to all RAM pages"
(commit bdd3cee2e4). Revert this commit for now.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Adrian Bunk
b9e017e04b x86: unexport kmap_atomic_to_page
This patch removes the no longer used export of kmap_atomic_to_page.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Linus Torvalds
5f78e4d339 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
  x86: add pci=check_enable_amd_mmconf and dmi check
  x86: work around io allocation overlap of HT links
  acpi: get boot_cpu_id as early for k8_scan_nodes
  x86_64: don't need set default res if only have one root bus
  x86: double check the multi root bus with fam10h mmconf
  x86: multi pci root bus with different io resource range, on 64-bit
  x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
  x86: get mp_bus_to_node early
  x86 pci: remove checking type for mmconfig probe
  x86: remove unneeded check in mmconf reject
  driver core: try parent numa_node at first before using default
  x86: seperate mmconf for fam10h out from setup_64.c
  x86: if acpi=off, force setting the mmconf for fam10h
  x86_64: check MSR to get MMCONFIG for AMD Family 10h
  x86_64: check and enable MMCONFIG for AMD Family 10h
  x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
  x86: mmconf enable mcfg early
  x86: clear pci_mmcfg_virt when mmcfg get rejected
  x86: validate against acpi motherboard resources

Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
2008-04-29 08:26:51 -07:00
Christoph Lameter
2301696932 vmallocinfo: add caller information
Add caller information so that /proc/vmallocinfo shows where the allocation
request for a slice of vmalloc memory originated.

Results in output like this:

0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c34000-0xffffc20000c36000    8192 init_vdso_vars+0xde/0x1f1
0xffffc20000c36000-0xffffc20000c38000    8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
0xffffc20000c38000-0xffffc20000c3a000    8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
0xffffc20000c3a000-0xffffc20000c3e000   16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
0xffffc20000c40000-0xffffc20000c61000  135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
0xffffc20000c61000-0xffffc20000c6a000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c6a000-0xffffc20000c73000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c73000-0xffffc20000c7c000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c7c000-0xffffc20000c7f000   12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
0xffffc20002204000-0xffffc2000220d000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000220d000-0xffffc20002216000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002216000-0xffffc2000221f000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000221f000-0xffffc20002228000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002228000-0xffffc20002231000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002231000-0xffffc20002234000   12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
0xffffc20002240000-0xffffc20002261000  135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
0xffffffffa0000000-0xffffffffa0022000  139264 module_alloc+0x4f/0x55 pages=33 vmalloc
0xffffffffa0022000-0xffffffffa0029000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
0xffffffffa002b000-0xffffffffa0034000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa0034000-0xffffffffa003d000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa003d000-0xffffffffa0049000   49152 module_alloc+0x4f/0x55 pages=11 vmalloc
0xffffffffa0049000-0xffffffffa0050000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Jeremy Fitzhardinge
180c06efce hotplug-memory: make online_page() common
All architectures use an effectively identical definition of online_page(), so
just make it common code.  x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone.  We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately.  This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Ingo Molnar
f022bfd582 x86: PAT fix
Adrian Bunk noticed the following Coverity report:

> Commit e7f260a276
> (x86: PAT use reserve free memtype in mmap of /dev/mem)
> added the following gem to arch/x86/mm/pat.c:
>
> <--  snip  -->
>
> ...
> int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
>                                 unsigned long size, pgprot_t *vma_prot)
> {
>         u64 offset = ((u64) pfn) << PAGE_SHIFT;
>         unsigned long flags = _PAGE_CACHE_UC_MINUS;
>         unsigned long ret_flags;
> ...
> ...  (nothing that touches ret_flags)
> ...
>         if (flags != _PAGE_CACHE_UC_MINUS) {
>                 retval = reserve_memtype(offset, offset + size, flags, NULL);
>         } else {
>                 retval = reserve_memtype(offset, offset + size, -1, &ret_flags);
>         }
>
>         if (retval < 0)
>                 return 0;
>
>         flags = ret_flags;
>
>         if (pfn <= max_pfn_mapped &&
>             ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
>                 free_memtype(offset, offset + size);
>                 printk(KERN_INFO
>                 "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
>                         current->comm, current->pid,
>                         cattr_name(flags),
>                         offset, offset + size);
>                 return 0;
>         }
>
>         *vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) |
>                              flags);
>         return 1;
> }
>
> <--  snip  -->
>
> If (flags != _PAGE_CACHE_UC_MINUS) we pass garbage from the stack to
> ioremap_change_attr() and/or __pgprot().
>
> Spotted by the Coverity checker.

the fix simplifies the code as we get rid of the 'ret_flags'
complication.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:15:16 -07:00
Linus Torvalds
86cf02f8ea x86 PAT: tone down debugging messages some more
Ingo already fixed one of these at my request (in "x86 PAT: tone down
debugging messages", commit 1ebcc654f0),
but there was another one he missed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-27 11:59:30 -07:00
Yinghai Lu
cbf9bd603a acpi: get boot_cpu_id as early for k8_scan_nodes
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu
c2b91e2eec x86_64/mm: check and print vmemmap allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.

on 256G 8 sockets system will get
 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
 [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
 [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
 [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
 [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
 [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
 [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
 [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7

instead of a very long print out...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 22:51:09 +02:00
Yinghai Lu
1a27fc0a42 x86_64: fix setup_node_bootmem to support big mem excluding with memmap
typical case: four sockets system, every node has 4g ram, and we are using:

	memmap=10g$4g

to mask out memory on node1 and node2

when numa is enabled, early_node_mem is used to get node_data and node_bootmap.

if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.

so check it and print out some info about it.

need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.

depends on "mm: make reserve_bootmem can crossed the nodes".

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
8b3cd09ed2 x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.

reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Venki Pallipadi
0124cecfc8 x86, PAT: disable /dev/mem mmap RAM with PAT
disable /dev/mem mmap of RAM with PAT. It makes things safer and
eliminates aliasing. A future improvement would be to avoid the
range_is_allowed duplication.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 21:28:29 +02:00
Linus Torvalds
4a27214d7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86 PAT: decouple from nonpromisc devmem
  x86 PAT: tone down debugging messages
2008-04-26 09:50:58 -07:00
Dmitri Vorobiev
f7f17a67c5 x86: remove NexGen support
It is claimed that NexGen CPUs were never shipped:

   http://lkml.org/lkml/2008/4/20/179

Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.

As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.

The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Ingo Molnar
1ebcc654f0 x86 PAT: tone down debugging messages
Linus reported these excessive debug printouts:

>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0380000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000

turn that into a pr_debug().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 16:01:26 +02:00
Linus Torvalds
bf16ae2509 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat:
  generic: add ioremap_wc() interface wrapper
  /dev/mem: make promisc the default
  pat: cleanups
  x86: PAT use reserve free memtype in mmap of /dev/mem
  x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
  x86: PAT avoid aliasing in /dev/mem read/write
  devmem: add range_is_allowed() check to mmap of /dev/mem
  x86: introduce /dev/mem restrictions with a config option
2008-04-25 12:48:08 -07:00
Linus Torvalds
4b7227ca32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
  xen: add balloon driver
  xen: allow compilation with non-flat memory
  xen: fold xen_sysexit into xen_iret
  xen: allow set_pte_at on init_mm to be lockless
  xen: disable preemption during tlb flush
  xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
  xen: Add compatibility aliases for frontend drivers
  xen: Module autoprobing support for frontend drivers
  xen blkfront: Delay wait for block devices until after the disk is added
  xen/blkfront: use bdget_disk
  xen: Make xen-blkfront write its protocol ABI to xenstore
  xen: import arch generic part of xencomm
  xen: make grant table arch portable
  xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
  xen: make include/xen/page.h portable moving those definitions under asm dir
  xen: add resend_irq_on_evtchn() definition into events.c
  Xen: make events.c portable for ia64/xen support
  xen: move events.c to drivers/xen for IA64/Xen support
  xen: move features.c from arch/x86/xen/features.c to drivers/xen
  xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
  ...
2008-04-25 12:32:10 -07:00
Ingo Molnar
70c9f590ff x86: remove set_fixmap() warning
set_fixmap()+clear_fixmap() is safe.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Ingo Molnar
82a355f5a2 x86: make __set_fixmap() non-init
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Jeremy Fitzhardinge
85958b465c x86: unify pgd ctor/dtor
All pagetables need fundamentally the same setup and destruction, so
just use the same code for everything.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
68db065c84 x86: unify KERNEL_PGD_PTRS
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.

There are a couple of follow-on changes from this:
 - KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS.  The
   definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
   since it can have two different user address-space configurations.
   I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
   for all of 32/32, 32/64 and 64/64 process configurations.

 - USER_PTRS_PER_PGD was also defined and was being used for similar
   purposes.  Converting its users to KERNEL_PGD_BOUNDARY left it
   completely unused, and so I removed it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
c20311e165 x86/pgtable.h: demacro ptep_clear_flush_young
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
f9fbf1a36a x86/pgtable.h: demacro ptep_test_and_clear_young
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
ee5aa8d3ba x86/pgtable.h: demacro ptep_set_access_flags
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
2761fa0920 x86: add pud_alloc for 4-level pagetables
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
6944a9c894 x86: rename paravirt_alloc_pt etc after the pagetable structure
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.

[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
394158559d x86: move all the pgd_list handling to one place
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
5a5f8f4224 x86: move pgalloc pud and pgd operations into common place
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
170fdff705 x86: move pmd functions into common asm/pgalloc.h
Common definitions for 3-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
397f687ab7 x86: move pte functions into common asm/pgalloc.h
Common definitions for 2-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
1d262d3a49 x86: put paravirt stubs into common asm/pgalloc.h
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Ingo Molnar
1ec1fe73df x86: xen unify x86 add common mm pgtable c fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
4f76cd3822 x86: add common mm/pgtable.c
Add a common arch/x86/mm/pgtable.c file for common pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
79bf6d66ab x86: convert pgalloc_64.h from macros to inlines
Convert asm-x86/pgalloc_64.h from macros into functions (#include hell
prevents __*_free_tlb from being inline, but they're probably a bit
big to inline anyway).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Ingo Molnar
28eb559b5b pat: cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
e7f260a276 x86: PAT use reserve free memtype in mmap of /dev/mem
Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
f0970c13b6 x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.

x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
e045fb2a98 x86: PAT avoid aliasing in /dev/mem read/write
Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Arjan van de Ven
ae531c26c5 x86: introduce /dev/mem restrictions with a config option
This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.

The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)

People who want to use /dev/mem for kernel debugging can enable the config
option.

The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:40:47 +02:00
Ingo Molnar
a4928cffe6 "make namespacecheck" fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:15:44 +02:00
Linus Torvalds
ec965350bb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
  sched: build fix
  sched: better rt-group documentation
  sched: features fix
  sched: /debug/sched_features
  sched: add SCHED_FEAT_DEADLINE
  sched: debug: show a weight tree
  sched: fair: weight calculations
  sched: fair-group: de-couple load-balancing from the rb-trees
  sched: fair-group scheduling vs latency
  sched: rt-group: optimize dequeue_rt_stack
  sched: debug: add some debug code to handle the full hierarchy
  sched: fair-group: SMP-nice for group scheduling
  sched, cpuset: customize sched domains, core
  sched, cpuset: customize sched domains, docs
  sched: prepatory code movement
  sched: rt: multi level group constraints
  sched: task_group hierarchy
  sched: fix the task_group hierarchy for UID grouping
  sched: allow the group scheduler to have multiple levels
  sched: mix tasks and groups
  ...
2008-04-21 15:40:24 -07:00
Mike Travis
f46bdf2db2 numa: move large array from stack to _initdata section
* Move large array "struct bootnode nodes" from stack to _initdata
    section to reduce amount of stack space required.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Glauber Costa
85c246ee16 x86: move definition to pci-dma.c
Move dma_ops structure definition to pci-dma.c, where it
belongs.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:57 +02:00
Suresh Siddha
6ec6e0d9f2 srat, x86: add support for nodes spanning other nodes
For example, If the physical address layout on a two node system with 8 GB
memory is something like:
node 0: 0-2GB, 4-6GB
node 1: 2-4GB, 6-8GB

Current kernels fail to boot/detect this NUMA topology.

ACPI SRAT tables can expose such a topology which needs to be supported.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:55 +02:00
Ingo Molnar
fa5c463941 x86: rename find_max_pfn() to propagate_e820_map()
this function doesnt just 'find' the max_pfn - it also has
other side-effects such as registering sparse memory maps.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:55 +02:00
Randy Dunlap
4c8337ac42 x86: fix arch/x86/mm/ioremap.c warning
Fix printk formats in x86/mm/ioremap.c:

next-20080410/arch/x86/mm/ioremap.c:137: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
next-20080410/arch/x86/mm/ioremap.c:188: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
next-20080410/arch/x86/mm/ioremap.c:188: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'long unsigned int'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:54 +02:00
WANG Cong
cf9b111c17 x86: remove pointless comments
Remove old comments that include the old arch/i386 directory.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:54 +02:00
Ingo Molnar
d1a4be630f x86 PAT: fix mmap() of holes
do not return a -EINVAL when mmap()-ing PCI holes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
2008-04-18 23:40:49 +02:00
Mike Travis
b447a468fc x86: clean up non-smp usage of cpu maps
Cleanup references to the early cpu maps for the non-SMP configuration
and remove some functions called for SMP configurations only.

Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:34 +02:00
Jack Steiner
a65d1d644c x86: increase size of APICID
Increase the number of bits in an apicid from 8 to 32.

By default, MP_processor_info() gets the APICID from the
mpc_config_processor structure. However, this structure limits
the size of APICID to 8 bits. This patch allows the caller of
MP_processor_info() to optionally pass a larger APICID that will
be used instead of the one in the mpc_config_processor struct.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:33 +02:00
gorcunov@gmail.com
6b6891f9c5 x86: cleanup - rename VM_MASK to X86_VM_MASK
This patch renames VM_MASK to X86_VM_MASK (which
in turn defined as alias to X86_EFLAGS_VM) to better
distinguish from virtual memory flags. We can't just
use X86_EFLAGS_VM instead because it is also used
for conditional compilation

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:33 +02:00
Ingo Molnar
756a6c6855 x86: ioremap of 64-bit resource on 32-bit kernel fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
Andi Kleen
f5c24a7fd0 x86: don't use large pages to map the first 2/4MB of memory
Intel recommends to not use large pages for the first 1MB
of the physical memory because there are fixed size MTRRs there
which cause splitups in the TLBs.

On AMD doing so is also a good idea.

The implementation is a little different between 32bit and 64bit.
On 32bit I just taught the initial page table set up about this
because it was very simple to do. This also has the advantage
that the risk of a prefetch ever seeing the page even
if it only exists for a short time is minimized.

On 64bit that is not quite possible, so use set_memory_4k() a little
later (in check_bugs) instead.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
Andi Kleen
c9caa02c52 x86: add set_memory_4k to pageattr.c
Add a new function to force split large pages into 4k pages.
This is needed for some followup optimizations.

I had to add a new field to cpa_data to pass down the information
that try_preserve_large_page should not run.

Right now no set_page_4k() because I didn't need it and all the
specialized users I have in mind would be more comfortable with
pure addresses. I also didn't export it because it's unlikely
external code needs it.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
Andi Kleen
cc61503219 x86: account overlapped mappings in max_pfn_mapped
When end_pfn is not aligned to 2MB (or 1GB) then the kernel might
map more memory than end_pfn. Account this in max_pfn_mapped.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
Thomas Gleixner
67794292c8 x86: replace the now useless max_pfn_mapped define
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
Andi Kleen
7d1116a92d x86: implement true end_pfn_mapped for 32bit
Even on 32bit 2MB pages can map more memory than is in the true
max_low_pfn if end_pfn is not highmem and not aligned to 2MB.
Add a end_pfn_map similar to x86-64 that accounts for this
fact. This is important for code that really needs to know about
all mapping aliases.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
Yinghai Lu
dcfe946520 x86: fix memtest print out
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:21 +02:00
Yinghai Lu
c64df70793 x86: memtest bootparam
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:21 +02:00
Venki Pallipadi
b450e5e816 x86: PAT bug fix for attribute type check after reserve_memtype, debug
Make the PAT related printks in ioremap pr_debug.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
Venki Pallipadi
dee7cbb210 x86: PAT bug fix for attribute type check after reserve_memtype
Bug fixes for reserve_memtype() call in __ioremap and pci_mmap_page_range().
If reserve_memtype returns non-zero, then it is an error and subsequent free is
not required. Requested and returned prot value check should be done when
reserve_memtype returns success.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
Yinghai Lu
9307cacad0 x86: pat cpu feature bit setting for known cpus
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
Yinghai Lu
35605a1027 x86: enable PAT for amd k8 and fam10h
make known_pat_cpu to think amd k8 and fam10h is ok too.

also make tom2 below to be WRBACK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
venkatesh.pallipadi@intel.com
6997ab4982 x86: add PAT related debug prints
Adds debug prints at critical code. Adds enough info in dmesg to allow us to
do effective first round of analysis of any issues that may result due to PAT
patch series.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
venkatesh.pallipadi@intel.com
b310f381d2 x86: PAT add ioremap_wc() interface
Introduce ioremap_wc for wc remap.

(generic wrapper is in a later patch)

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
venkatesh.pallipadi@intel.com
ef354af462 x86: PAT add set_memory_wc() interface
Add a set_memory_wc interface(), similar to set_memory_uc interface.
Callers has to call set_memory_uc, set_memory_wb and
set_memory_wc, set_memory_wb as pairs.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:20 +02:00
venkatesh.pallipadi@intel.com
1219333dfd x86: PAT use reserve free memtype in set_memory_uc
Use reserve_memtype and free_memtype interfaces in set_memory_uc/set_memory_wb
interfaces to avoid aliasing.
Usage model of set_memory_uc and set_memory_wb is for RAM memory and users
will first call set_memory_uc and call set_memory_wb after use to reset the
attribute.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:19 +02:00
venkatesh.pallipadi@intel.com
d7677d4034 x86: PAT use reserve free memtype in ioremap and iounmap
Use reserve_memtype and free_memtype interfaces in ioremap/iounmap to avoid
aliasing.

If there is an existing alias for the region, inherit the memory type from
the alias. If there are conflicting aliases for the entire region, then fail
ioremap.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:19 +02:00
venkatesh.pallipadi@intel.com
3a96ce8cac x86: PAT make ioremap_change_attr non-static
Make ioremap_change_attr() non-static and use prot_val in place of ioremap_mode.
This interface is used in subsequent PAT patches.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:19 +02:00
Ingo Molnar
55c626820a x86: revert ucminus change
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:19 +02:00
venkatesh.pallipadi@intel.com
2e5d9c857d x86: PAT infrastructure patch
Sets up pat_init() infrastructure.

PAT MSR has following setting.
	PAT
	|PCD
	||PWT
	|||
	000 WB		_PAGE_CACHE_WB
	001 WC		_PAGE_CACHE_WC
	010 UC-		_PAGE_CACHE_UC_MINUS
	011 UC		_PAGE_CACHE_UC

We are effectively changing WT from boot time setting to WC.
UC_MINUS is used to provide backward compatibility to existing /dev/mem
users(X).

reserve_memtype and free_memtype are new interfaces for maintaining alias-free
mapping. It is currently implemented in a simple way with a linked list and
not optimized. reserve and free tracks the effective memory type, as a result
of PAT and MTRR setting rather than what is actually requested in PAT.

pat_init piggy backs on mtrr_init as the rules for setting both pat and mtrr
are same.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:19 +02:00
Yinghai Lu
272b9cad6e x86: early memtest to find bad ram
do simple memtest after init_memory_mapping

use find_e820_area_size to find all ram range that is not reserved.

and do some simple bits test to find some bad ram.

if find some bad ram, use reserve_early to exclude that range.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:19 +02:00
Alexey Starikovskiy
ce3fe6b2bf x86: use get_bios_ebda in mpparse_64.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:05 +02:00
Johannes Weiner
1415d160c7 x86: Remove redundant display of free swap space in show_mem()
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:58 +02:00
Yinghai Lu
9a79cf9c1a x86: sort address_markers for dump_pagetables
otherwise Vmemmap and High Kernel Mapping string is not showing up.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:58 +02:00
Mathieu Desnoyers
4e4eee0e01 x86: enhance DEBUG_RODATA support for hotplug and kprobes
Standardize DEBUG_RODATA, removing special cases for hotplug and kprobes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: pageexec@freemail.hu
Cc: akpm@linux-foundation.org
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:58 +02:00
Ingo Molnar
9fc34113f6 x86: debug pmd_bad()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:52 +02:00
Ingo Molnar
ba748d221e x86: warn about RAM pages in ioremap()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:52 +02:00
Ingo Molnar
bdd3cee2e4 x86: ioremap(), extend check to all RAM pages
Suggested by Jan Beulich.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jan Beulich <jbeulich@novell.com>
2008-04-17 17:40:51 +02:00
Thomas Gleixner
e3100c82ab x86: check physical address range in ioremap
Roland Dreier reported in http://lkml.org/lkml/2008/2/27/194

[ 8425.915139] BUG: unable to handle kernel paging request at ffffc20001a0a000
[ 8425.919087] IP: [<ffffffff8021dacc>] clflush_cache_range+0xc/0x25
[ 8425.919087] PGD 1bf80e067 PUD 1bf80f067 PMD 1bb497067 PTE 80000047000ee17b

This is on a Intel machine with 36bit physical address space. The PTE
entry references 47000ee000, which is outside of it.

Add a check for the physical address space and warn/printk about the
stupid caller.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:51 +02:00
Ian Campbell
c92a7a54d6 x86: reduce arch/x86/mm/ioremap.o size
> Don't we have a special section for page-aligned data so it doesn't
> waste most of two pages?

We have .bss.page_aligned and it seems appropriate to use it.

    text	   data	    bss	    dec	    hex	filename
    -   3388	   8236	      4	  11628	   2d6c	../build-32/arch/x86/mm/ioremap.o
    +   3388	     48	   4100	   7536	   1d70	../build-32/arch/x86/mm/ioremap.o

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:47 +02:00
Yinghai Lu
04adf11435 x86: remove never used nodenumer in pda
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:47 +02:00
Yinghai Lu
beafe91f1c x86: get apic_id later in acpi_numa_processor_affinity_init
we don't need get that so early.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:47 +02:00
Andi Kleen
ef9257668e x86: do kernel direct mapping at boot using GB pages
The AMD Fam10h CPUs support new Gigabyte page table entry for
mapping 1GB at a time. Use this for the kernel direct mapping.

Only done for 64bit because i386 does not support GB page tables.

This only applies to the data portion of the direct mapping; the
kernel text mapping stays with 2MB pages because the AMD Fam10h
microarchitecture does not support GB ITLBs and AMD recommends
against using GB mappings for code.

Can be disabled with disable_gbpages on the kernel command line

[ tglx@linutronix.de: simplify enable code ]
[ Yinghai Lu <yinghai.lu@sun.com>: boot fix on 256 GB RAM ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Ingo Molnar
00d1c5e057 x86: add gbpages switches
These new controls toggle experimental support for a new CPU feature,
the straightforward extension of largepages from the pmd level to the
pud level, which allows 1GB (kernel) TLBs instead of 2MB TLBs.

Turn it off by default, as this code has not been tested well enough yet.

Use the CONFIG_DIRECT_GBPAGES=y .config option or gbpages on the
boot line can be used to enable it. If enabled in the .config then
nogbpages boot option disables it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
H. Peter Anvin
fe770bf031 x86: clean up the page table dumper and add 32-bit support
Clean up the page table dumper (fix boundary conditions, table driven
address ranges, some formatting changes since it is no longer using
the kernel log but a separate virtual file), and generalize to 32
bits.

[ mingo@elte.hu: x86: fix the pagetable dumper ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Arjan van de Ven
926e5392ba x86: add code to dump the (kernel) page tables for visual inspection by kernel developers
This patch adds code to the kernel to have an (optional)
/proc/kernel_page_tables debug file that basically dumps the kernel
pagetables; this allows us kernel developers to verify that nothing fishy is
going on and that the various mappings are set up correctly. This was quite
useful in finding various change_page_attr() bugs, and is very likely to be
useful in the future as well.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: mingo@elte.hu
Cc: tglx@tglx.de
Cc: hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
H. Peter Anvin
2596e0fae0 x86: unify arch/x86/mm/Makefile
Unify arch/x86/mm/Makefile between 32 and 64 bits.

All configuration variables that are protected by Kconfig constraints
have been put in the common part of the Makefile; however, the NUMA
files are totally different between 32 and 64 bits and are handled via
an ifdef.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Thomas Gleixner
ee7ae7a198 x86: add debug info to DEBUG_PAGEALLOC
Add debug information for DEBUG_PAGEALLOC to get some statistics about
the pool usage and split status.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17 17:40:45 +02:00
Ingo Molnar
b4e0409a36 x86: check vmlinux limits, 64-bit
these build-time and link-time checks would have prevented the
vmlinux size regression.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:45 +02:00
Andrew Morton
9c312058b2 Avoid false positive warnings in kmap_atomic_prot() with DEBUG_HIGHMEM
I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
positive.  There's no way in which networking will be using highmem pages
here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
performing these checks.

Cc: Pawel Staszewski <pstaszewski@artcom.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 [ Really sad.  We lose almost all real-life coverage of the debug tests
   with this patch. Now it will only report problems for the cases where
   people actually end up using a HIGHMEM page, not when they just _might_
   use one.    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28 13:08:14 -07:00
Ingo Molnar
3085354de6 x86: prefetch fix #2
Linus noticed a second bug and an uncleanliness:

 - we'd return on any instruction fetch fault

 - we'd use both the value of 16 and the PF_INSTR symbol which are
   the same and make no sense

the cleanup nicely unifies this piece of logic.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 22:00:16 +01:00
Christoph Lameter
25e59881f1 x86: stricter check in follow_huge_addr()
The first page of the compound page is determined in follow_huge_addr()
but then PageCompound() only checks if the page is part of a compound page.
PageHead() allows checking if this is indeed the first page of the
compound.

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:45 +01:00
Ingo Molnar
bc713dcf35 x86: fix prefetch workaround
some early Athlon XP's and Opterons generate bogus faults on prefetch
instructions. The workaround for this regressed over .24 - reinstate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:44 +01:00
Suresh Siddha
d546b67a94 x86: fix performance drop for glx
fix the 3D performance drop reported at:

   http://bugzilla.kernel.org/show_bug.cgi?id=10328

fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
WC attribute. Recent changes in page attribute code made both
ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
breaks the graphics performance, as the effective memory type is UC instead
of expected WC.

The correct way to fix this is to add ioremap_wc() (which uses UC- in the
absence of PAT kernel support and WC with PAT) and change all the
fb drivers to use this new ioremap_wc() API.

We can take this correct and longer route for post 2.6.25. For now,
revert back to the UC- behavior for ioremap/ioremap_nocache.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26 22:23:41 +01:00
Yinghai Lu
76c324182b x86: fix trim mtrr not to setup_memory two times
we could call find_max_pfn() directly instead of setup_memory() to get
max_pfn needed for mtrr trimming.

otherwise setup_memory() is called two times... that is duplicated...

[ mingo@elte.hu: both Thomas and me simulated a double call to
  setup_bootmem_allocator() and can confirm that it is a real bug
  which can hang in certain configs. It's not been reported yet but
  that is probably due to the relatively scarce nature of
  MTRR-trimming systems. ]

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26 22:23:41 +01:00
Linus Torvalds
b9e76a0074 x86-32: Pass the full resource data to ioremap()
It appears that 64-bit PCI resources cannot possibly ever have worked on
x86-32 even when the RESOURCES_64BIT config option was set, because any
driver that tried to [pci_]ioremap() the resource would have been unable
to do so because the high 32 bits would have been silently dropped on
the floor by the ioremap() routines that only used "unsigned long".

Change them to use "resource_size_t" instead, which properly encodes the
whole 64-bit resource data if RESOURCES_64BIT is enabled.

Acked-by: H. Peter Anvin <hpa@kernel.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-24 11:22:39 -07:00
Yinghai Lu
37bff62e98 x86_64: free_bootmem should take phys
so use nodedata_phys directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21 17:06:15 +01:00
Thomas Gleixner
985a34bd75 x86: remove quicklists
quicklists cause a serious memory leak on 32-bit x86,
as documented at:

  http://bugzilla.kernel.org/show_bug.cgi?id=9991

the reason is that the quicklist pool is a special-purpose
cache that grows out of proportion. It is not accounted for
anywhere and users have no way to even realize that it's
the quicklists that are causing RAM usage spikes. It was
supposed to be a relatively small pool, but as demonstrated
by KOSAKI Motohiro, they can grow as large as:

  Quicklists:    1194304 kB

given how much trouble this code has caused historically,
and given that Andrew objected to its introduction on x86
(years ago), the best option at this point is to remove them.

[ any performance benefits of caching constructed pgds should
  be implemented in a more generic way (possibly within the page
  allocator), while still allowing constructed pages to be
  allocated by other workloads. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-11 17:11:55 +01:00
Ingo Molnar
9a46d7e5b6 x86: ioremap, remove WARN_ON()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-11 17:11:54 +01:00
Yinghai Lu
7c9e92b6cd x86: not set node to cpu_to_node if the node is not online
resolve boot problem reported by Mel Gorman:

   http://lkml.org/lkml/2008/2/13/404

init_cpu_to_node will use cpu->apic (from MADT or mptable) and
apic->node(from SRAT or AMD config space with k8_bus_64.c) to have
cpu->node mapping, and later identify_cpu will overwrite them
again...(with nearby_node...)

this patch checks if the node is online, otherwise it will not
update cpu_node map. so keep cpu_node map to online node before
identify_cpu..., to prevent possible error.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-04 17:10:12 +01:00
Rafael J. Wysocki
9b5cf48b06 x86: revert "x86: CPA: avoid split of alias mappings"
Revert:

  commit 8be8f54bae
  Author: Thomas Gleixner <tglx@linutronix.de>
  Date:   Sat Feb 23 20:43:21 2008 +0100

      x86: CPA: avoid split of alias mappings

because it clearly mishandles the case when __change_page_attr(), called
from __change_page_attr_set_clr(), changes cpa->processed to 1 and
cpa_process_alias(cpa) is executed right after that.

This crashes my x86-64 test box early in the boot process
(ref. http://bugzilla.kernel.org/show_bug.cgi?id=10140#c4).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-03 14:18:27 +01:00
Thomas Gleixner
8be8f54bae x86: CPA: avoid split of alias mappings
avoid over-eager large page splitup.

When the target area needs to be split or is split already (ioremap)
then the current code enforces the split of large mappings in the alias
regions even if we could avoid it.

Use a separate variable processed in the cpa_data structure to carry
the number of pages which have been processed instead of reusing the
numpages variable. This keeps numpages intact and gives the alias code
a chance to keep large mappings intact.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Ingo Molnar
b16bf712f4 x86: fix leak un ioremap_page_range() failure
Jan Beulich noticed it during code review that if a driver's ioremap()
fails (say due to -ENOMEM) then we might leak the struct vm_area.

Free it properly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Ingo Molnar
88f3aec7af x86: fix spontaneous reboot with allyesconfig bzImage
recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.

after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:

      #define KERNEL_TEXT_SIZE  (40*1024*1024)

which limit my vmlinux just happened to pass:

       text           data       bss        dec       hex   filename
   29703744        4222751   8646224   42572719   2899baf   vmlinux

40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/

So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)

So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)

We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26 12:55:56 +01:00
Yinghai Lu
3b57bc461f x86: remove double-checking empty zero pages debug
so far no one complained about that.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26 12:55:55 +01:00
Ingo Molnar
92cb54a37a x86: make DEBUG_PAGEALLOC and CPA more robust
Use PF_MEMALLOC to prevent recursive calls in the DBEUG_PAGEALLOC
case. This makes the code simpler and more robust against allocation
failures.

This fixes the following fallback to non-mmconfig:

   http://lkml.org/lkml/2008/2/20/551
   http://bugzilla.kernel.org/show_bug.cgi?id=10083

Also, for DEBUG_PAGEALLOC=n reduce the pool size to one page.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-26 12:55:50 +01:00
Rafael J. Wysocki
8a235efad5 Hibernation: Handle DEBUG_PAGEALLOC on x86
Make hibernation work with CONFIG_DEBUG_PAGEALLOC set on x86, by
checking if the pages to be copied are marked as present in the
kernel mapping and temporarily marking them as present if that's not
the case.  No functional modifications are introduced if
CONFIG_DEBUG_PAGEALLOC is unset.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-21 02:15:28 -05:00
Linus Torvalds
5d9c4a7de6 Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp: fix missing casts that produced a warning.
  agp: add support for 662/671 to agp driver
  fix historic ioremap() abuse in AGP
  agp/sis: Suspend support for SiS AGP
  agp/sis: Clear bit 2 from aperture size byte as well
2008-02-19 18:29:57 -08:00
Arjan van de Ven
156fbc3fbe x86: fix page_is_ram() thinko
page_is_ram() has a special case for the 640k-1M bios area, however
due to a thinko the special case checks the e820 table entry and
not the memory the user has asked for. This patch fixes the bug.

[ mingo@elte.hu: this too is better solved in the e820 space, but those
  fixes are too intrusive for v2.6.25. ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:34 +01:00
Arjan van de Ven
d8a9e6a51e x86: fix WARN_ON() message: teach page_is_ram() about the special 4Kb bios data page
This patch teaches page_is_ram() about the fact that the first
4Kb of memory are special on x86, even though the E820 table
normally doesn't exclude it.

This fixes the WARN_ON() reported by Laurent Riffard who was also
very helpful in diagnosing the issue.

[ mingo@elte.hu: we are working on doing this properly in the e820
  space, but for 2.6.25 this is the better fix. ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:34 +01:00
Sam Ravnborg
d01b9ad56e x86: fix section mismatch in srat_64.c:reserve_hotadd
reserve_hotadd() are only used by __init acpi_numa_memory_affinity_init().
Annotate reserve_hotadd() with __init is the trivial fix.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:31 +01:00
Andi Kleen
8e31c2ac11 x86: CPA: remove BUG_ON for LRU/Compound pages
New implementation does not use lru for anything so there is no need
to reject pages that are in the LRU. Similar for compound pages (which
were checked because they also use page->lru)

[ tglx@linutronix.de: removed unused variable ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:29 +01:00
Arjan van dev Ven
fcea424d31 fix historic ioremap() abuse in AGP
Several AGP drivers right now use ioremap_nocache() on kernel ram in order
to turn a page of regular memory uncached.

There are two problems with this:

    1) This is a total nightmare for the ioremap() implementation to keep
       various mappings of the same page coherent.

    2) It's a total nightmare for the AGP code since it adds a ton of
       complexity in terms of keeping track of 2 different pointers to
       the same thing, in terms of error handling etc etc.

This patch fixes this by making the AGP drivers use the new
set_memory_XX APIs instead.

Note: amd-k7-agp.c is built on Alpha too, and generic.c is built
on ia64 as well, which do not yet have the set_memory_*() APIs,
so for them some we have a few ugly #ifdefs - hopefully they'll
be fixed soon.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2008-02-19 14:46:39 +10:00
Yinghai Lu
b7ad149d62 x86: reenable support for system without on node0
One system doesn't have RAM for node0 installed.

SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 1 PXM 1 0-a0000
SRAT: Node 1 PXM 1 0-dd000000
SRAT: Node 1 PXM 1 0-123000000
ACPI: SLIT: nodes = 2
 10 13
 13 10
mapped APIC to ffffffffff5fb000 (        fee00000)
Bootmem setup node 1 0000000000000000-0000000123000000
  NODE_DATA [000000000000e000 - 0000000000014fff]
  bootmap [0000000000015000 -  00000000000395ff] pages 25
Could not find start_pfn for node 0
Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #14

Call Trace:
 [<ffffffff80bab498>] free_area_init_node+0x22/0x381
 [<ffffffff8045ffc5>] generic_swap+0x0/0x17
 [<ffffffff80bab0cc>] find_zone_movable_pfns_for_nodes+0x54/0x271
 [<ffffffff80baba5f>] free_area_init_nodes+0x239/0x287
 [<ffffffff80ba6311>] paging_init+0x46/0x4c
 [<ffffffff80b9dda5>] setup_arch+0x3c3/0x44e
 [<ffffffff80b978be>] start_kernel+0x6f/0x2c7
 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3

This happens because node 0 is not online, but the node state in
mm/page_alloc.c has node 0 set.

        nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
                [N_POSSIBLE] = NODE_MASK_ALL,
                [N_ONLINE] = { { [0] = 1UL } },

So we need to clear node_online_map before initializing the memory.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
f34b439f34 x86: CPA: avoid double checking of alias ranges
When the CPA code is called with an virtual address in the range of
the direct mapping or the high alias then we do not need to run
through the alias check for this range.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
af96e4438a x86: CPA no alias checking for _NX
NX settings are not required to be consistent across alias mappings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
31eedd823c x86: zap invalid and unused pmds in early boot
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
from __START_KERNEL_map. The kernel itself only needs _text to _end
mapped in the high alias. On relocatible kernels the ASM setup code
adjusts the compile time created high mappings to the relocation. This
creates invalid pmd entries for negative offsets:

0xffffffff80000000 -> pmd entry: ffffffffff2001e3
It points outside of the physical address space and is marked present.

This starts at the virtual address __START_KERNEL_map and goes up to
the point where the first valid physical address (0x0) is mapped.

Zap the mappings before _text and after _end right away in early
boot. This removes also the invalid entries.

Furthermore it simplifies the range check for high aliases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00