Commit Graph

1941 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
b5aef682e0 Merge 3.10-rc7 into driver-core-next
We want the firmware merge fixes, and other bits, in here now.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-24 15:14:43 -07:00
Aneesh Kumar K.V
1a5272866f powerpc: Optimize hugepage invalidate
Hugepage invalidate involves invalidating multiple hpte entries.
Optimize the operation using H_BULK_REMOVE on lpar platforms.
On native, reduce the number of tlb flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:58 +10:00
Aneesh Kumar K.V
437d496457 powerpc/THP: Enable THP on PPC64
We enable only if the we support 16MB page size.

Reviewed-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:58 +10:00
Aneesh Kumar K.V
d8e355a20f powerpc: split hugepage when using subpage protection
We find all the overlapping vma and mark them such that we don't allocate
hugepage in that range. Also we split existing huge page so that the
normal page hash can be invalidated and new page faulted in with new
protection bits.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:57 +10:00
Aneesh Kumar K.V
a00e7bea0d powerpc: disable assert_pte_locked for collapse_huge_page
With THP we set pmd to none, before we do pte_clear. Hence we can't
walk page table to get the pte lock ptr and verify whether it is locked.
THP do take pte lock before calling pte_clear. So we don't change the locking
rules here. It is that we can't use page table walking to check whether
pte locks are held with THP.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:57 +10:00
Aneesh Kumar K.V
7888b4ddb4 powerpc: Prevent gcc to re-read the pagetables
GCC is very likely to read the pagetables just once and cache them in
the local stack or in a register, but it is can also decide to re-read
the pagetables. The problem is that the pagetable in those places can
change from under gcc.

With THP/hugetlbfs the pmd (and pgd for hugetlbfs giga pages) can
change under gup_fast. The pages won't be freed untill we finish
gup fast because we have irq disabled and we free these pages via
rcu callback.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:56 +10:00
Aneesh Kumar K.V
0ac52dd766 powerpc: Make linux pagetable walk safe with THP enabled
We need to have irqs disabled to handle all the possible parallel update for
linux page table without holding locks.

Events that we are intersted in while walking page tables are
1) Page fault
2) umap
3) THP split
4) THP collapse

A) local_irq_disabled:
------------------------
1) page fault:
A none to valid transition via page fault is not an issue because we
would either see a none or valid. If it is none, we would error out
the page table walk. We may need to use on stack values when checking for
type of page table elements, because if we do

if (!is_hugepd()) {
    if (!pmd_none() {
       if (pmd_bad() {

We could take that bad condition because the pmd got converted to a hugepd
after the !is_hugepd check via a hugetlb fault.

The right way would be to check for pmd_none higher up or use on stack value.

2) A valid to none conversion via unmap:
We can safely walk the upper level table, because we don't remove the the
page table entries until rcu grace period. So even if we followed a
wrong pointer we still have the pointer valid till the grace period.

A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and
 _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent
pte_clear we take _PAGE_BUSY.

3) THP split:
A valid transparent hugepage is converted to nomal page. Before we split we
do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING
So when walking page table we need to check for pmd_trans_splitting and
handle that. The pte returned should also need to be checked for
_PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save
the value of PTE on stack and check for the flag in the local pte value.
If we don't have the value set we can safely operate on the local pte value
and we atomicaly set _PAGE_BUSY.

4) THP collapse:
A normal page gets converted to hugepage. In the collapse path, we
mark the pmd none early (pmdp_clear_flush). With irq disabled, if we
are aleady walking page table we would see the pmd_none and won't continue.
If we see a valid PMD, we should still check for _PAGE_PRESENT before
setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:56 +10:00
Aneesh Kumar K.V
6d492ecc64 powerpc/THP: Add code to handle HPTE faults for hugepages
The deposted PTE page in the second half of the PMD table is used to
track the state on hash PTEs. After updating the HPTE, we mark the
coresponding slot in the deposted PTE page valid.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:56 +10:00
Aneesh Kumar K.V
c367714ce8 powerpc: Update gup_pmd_range to handle transparent hugepages
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:55 +10:00
Aneesh Kumar K.V
12bc9f6fc1 powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
document why we don't need to handle transparent hugepages at callsites.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V
ac52ae4721 powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages
Reviewed-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V
29409997f8 powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code
We will use this in the later patch for handling THP pages

Reviewed-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V
074c2eae3e powerpc/THP: Implement transparent hugepages for ppc64
We now have pmd entries covering 16MB range and the PMD table double its original size.
We use the second half of the PMD table to deposit the pgtable (PTE page).
The depoisted PTE page is further used to track the HPTE information. The information
include [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk
the PTE page and invalidate all valid HPTEs.

This patch implements necessary arch specific functions for THP support and also
hugepage invalidate logic. These PMD related functions are intentionally kept
similar to their PTE counter-part.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:53 +10:00
Aneesh Kumar K.V
f940f52898 powerpc/THP: Double the PMD table size for THP
THP code does PTE page allocation along with large page request and deposit them
for later use. This is to ensure that we won't have any failures when we split
hugepages to regular pages.

On powerpc we want to use the deposited PTE page for storing hash pte slot and
secondary bit information for the HPTEs. We use the second half
of the pmd table to save the deposted PTE page.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:53 +10:00
Aneesh Kumar K.V
db3d853490 powerpc/mm: handle hugepage size correctly when invalidating hpte entries
If a hash bucket gets full, we "evict" a more/less random entry from it.
When we do that we don't invalidate the TLB (hpte_remove) because we assume
the old translation is still technically "valid". This implies that when
we are invalidating or updating pte, even if HPTE entry is not valid
we should do a tlb invalidate. With hugepages, we need to pass the correct
actual page size value for tlb invalidation.

This change update the patch 0608d69246
"powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle
transparent hugepages correctly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:52 +10:00
Daniel Walker
d5d8ec895c powerpc/mm: Make mmap_64.c compile on 32bit powerpc
There appears to be no good reason to keep this as 64bit only. It works
on 32bit also, and has checks so that it can work correctly with 32bit
binaries on 64bit hardware which is why I think this works.

I tested this on qemu using the virtex-ml507 machine type.

Before,

/bin2 # ./test & cat /proc/${!}/maps
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
48000000-48020000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
48021000-48023000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfd03000-bfd24000 rw-p 00000000 00:00 0          [stack]
/bin2 # ./test & cat /proc/${!}/maps
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
0fe6e000-0ffd8000 r-xp 00000000 00:01 214        /lib/libc-2.11.3.so
0ffd8000-0ffe8000 ---p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffe8000-0ffed000 rw-p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffed000-0fff0000 rw-p 00000000 00:00 0
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
48000000-48020000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
48020000-48021000 rw-p 00000000 00:00 0
48021000-48023000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bf98a000-bf9ab000 rw-p 00000000 00:00 0          [stack]
/bin2 # ./test & cat /proc/${!}/maps
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
0fe6e000-0ffd8000 r-xp 00000000 00:01 214        /lib/libc-2.11.3.so
0ffd8000-0ffe8000 ---p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffe8000-0ffed000 rw-p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffed000-0fff0000 rw-p 00000000 00:00 0
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
48000000-48020000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
48020000-48021000 rw-p 00000000 00:00 0
48021000-48023000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfa54000-bfa75000 rw-p 00000000 00:00 0          [stack]

After,

bash-4.1# ./test & cat /proc/${!}/maps
[7] 803
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
b7eb0000-b7ed0000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
b7ed1000-b7ed3000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfbc0000-bfbe1000 rw-p 00000000 00:00 0          [stack]
bash-4.1# ./test & cat /proc/${!}/maps
[8] 805
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
b7b03000-b7b23000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
b7b24000-b7b26000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfc27000-bfc48000 rw-p 00000000 00:00 0          [stack]
bash-4.1# ./test & cat /proc/${!}/maps
[9] 807
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
b7f37000-b7f57000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
b7f58000-b7f5a000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bff96000-bffb7000 rw-p 00000000 00:00 0          [stack]

Signed-off-by: Daniel Walker <dwalker@fifo90.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:11 +10:00
Scott Wood
39a421ff0b powerpc/mm/nohash: Ignore NULL stale_map entries
This happens with threads that are offline due to CPU hotplug
(including threads that were never "plugged in" to begin with because
SMT is disabled).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:10 +10:00
Aneesh Kumar K.V
8bbd9f04b7 powerpc: Fix bad pmd error with book3E config
Book3E uses the hugepd at PMD level and don't encode pte directly
at the pmd level. So it will find the lower bits of pmd set
and the pmd_bad check throws error. Infact the current code
will never take the free_hugepd_range call at all because it will
clear the pmd if it find a hugepd pointer. Fix this by clearing
bad pmd only if it is not a hugepd pointer.

This is regression introduced by e2b3d202d1
"powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format"

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 15:25:21 +10:00
Greg Kroah-Hartman
bb07b00be7 Merge 3.10-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17 16:57:20 -07:00
Stephen Rothwell
40b313608a Finally eradicate CONFIG_HOTPLUG
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03 14:20:18 -07:00
Aneesh Kumar K.V
0608d69246 powerpc/mm: Always invalidate tlb on hpte invalidate and update
If a hash bucket gets full, we "evict" a more/less random entry from it.
When we do that we don't invalidate the TLB (hpte_remove) because we assume
the old translation is still technically "valid". This implies that when
we are invalidating or updating pte, even if HPTE entry is not valid
we should do a tlb invalidate.

This was a regression introduced by b1022fbd29

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Li Zhong
ba12eedee3 powerpc: Exception hooks for context tracking subsystem
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38

But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7
  context_tracking: Move exception handling to generic code
6c1e0256fa
  context_tracking: Restore correct previous context state on exception exit

it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
Aneesh Kumar K.V
83d5e64b7e powerpc: Fix build errors STRICT_MM_TYPECHECKS
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:20 +10:00
Michael Neuling
c2fd22df89 powerpc/tm: Fix null pointer deference in flush_hash_page
Make sure that current->thread.reg exists before we deference it in
flush_hash_page.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: John J Miller <millerjo@us.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:36 +10:00
Linus Torvalds
5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Aneesh Kumar K.V
1f6aaaccb1 powerpc: Update tlbie/tlbiel as per ISA doc
Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
multiple page size segment correctly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:29 +10:00
Aneesh Kumar K.V
3dc4feca4b powerpc: Print page size info during boot
This gives hint about different base and actual page size combination
supported by the platform.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:25 +10:00
Aneesh Kumar K.V
d8139ebf85 powerpc: print both base and actual page size on hash failure
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:22 +10:00
Aneesh Kumar K.V
7e74c3921a powerpc: Fix hpte_decode to use the correct decoding for page sizes
As per ISA doc, we encode base and actual page size in the LP bits of
PTE. The number of bit used to encode the page sizes depend on actual
page size.  ISA doc lists this as

   PTE LP     actual page size
rrrr rrrz 	>=8KB
rrrr rrzz	>=16KB
rrrr rzzz 	>=32KB
rrrr zzzz 	>=64KB
rrrz zzzz 	>=128KB
rrzz zzzz 	>=256KB
rzzz zzzz	>=512KB
zzzz zzzz 	>=1MB

ISA doc also says
"The values of the “z” bits used to specify each size, along with all possible
values of “r” bits in the LP field, must result in LP values distinct from
other LP values for other sizes."

based on the above update hpte_decode to use the correct decoding for LP bits.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:18 +10:00
Aneesh Kumar K.V
b1022fbd29 powerpc: Decode the pte-lp-encoding bits correctly.
We look at both the segment base page size and actual page size and store
the pte-lp-encodings in an array per base page size.

We also update all relevant functions to take actual page size argument
so that we can use the correct PTE LP encoding in HPTE. This should also
get the basic Multiple Page Size per Segment (MPSS) support. This is needed
to enable THP on ppc64.

[Fixed PR KVM build --BenH]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:14 +10:00
Aneesh Kumar K.V
74f227b228 powerpc: Use encode avpn where we need only avpn values
In all these cases we are doing something similar to

HPTE_V_COMPARE(hpte_v, want_v) which ignores the HPTE_V_LARGE bit

With MPSS support we would need actual page size to set HPTE_V_LARGE
bit and that won't be available in most of these cases. Since we are ignoring
HPTE_V_LARGE bit, use the  avpn value instead. There should not be any change
in behaviour after this patch.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:11 +10:00
Aneesh Kumar K.V
5c1f6ee9a3 powerpc: Reduce PTE table memory wastage
We allocate one page for the last level of linux page table. With THP and
large page size of 16MB, that would mean we are wasting large part
of that page. To map 16MB area, we only need a PTE space of 2K with 64K
page size. This patch reduce the space wastage by sharing the page
allocated for the last level of linux page table with multiple pmd
entries. We call these smaller chunks PTE page fragments and allocated
page, PTE page.

In order to support systems which doesn't have 64K HPTE support, we also
add another 2K to PTE page fragment. The second half of the PTE fragments
is used for storing slot and secondary bit information of an HPTE. With this
we now have a 4K PTE fragment.

We use a simple approach to share the PTE page. On allocation, we bump the
PTE page refcount to 16 and share the PTE page with the next 16 pte alloc
request. This should help in the node locality of the PTE page fragment,
assuming that the immediate pte alloc request will mostly come from the
same NUMA node. We don't try to reuse the freed PTE page fragment. Hence
we could be waisting some space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:07 +10:00
Aneesh Kumar K.V
e2b3d202d1 powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation.
With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level.
That means with 32 bit process we cannot allocate normal pages at
all, because we cover the entire address space with one pgd entry. Fix this
by switching to a new page table format for hugepages. With the new page table
format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead
we encode the PTE information directly at the directory level. This forces 16MB
hugepage at PMD level. This will also make the page take walk much simpler later
when we add the THP support.

With the new table format we have 4 cases for pgds and pmds:
(1) invalid (all zeroes)
(2) pointer to next table, as normal; bottom 6 bits == 0
(3) leaf pte for huge page, bottom two bits != 00
(4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:56 +10:00
Aneesh Kumar K.V
cf9427b85e powerpc: New hugepage directory format
Change the hugepage directory format so that we can have leaf ptes directly
at page directory avoiding the allocation of hugepage directory.

With the new table format we have 3 cases for pgds and pmds:
(1) invalid (all zeroes)
(2) pointer to next table, as normal; bottom 6 bits == 0
(4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table

Instead of storing shift value in hugepd pointer we use mmu_psize_def index
so that we can fit all the supported hugepage size in 4 bits

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:53 +10:00
Aneesh Kumar K.V
cc3665a60a powerpc: Don't hard code the size of pte page
USE PTRS_PER_PTE to indicate the size of pte page. To support THP,
later patches will be changing PTRS_PER_PTE value.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:46 +10:00
Nathan Fontenot
601abdc3b4 powerpc/pseries: Correct builds break when CONFIG_SMP not defined
Correct build failure for powerpc/pseries builds with CONFIG_SMP not defined.

The function cpu_sibling_mask has no meaning (or definition) when CONFIG_SMP
is not defined. Additionally, the updating of NUMA affinity for a CPU in a UP
system doesn't really make sense.

This patch ifdef's out the code making the affinity updates for PRRN events to
fix the following build break.

arch/powerpc/mm/numa.c: In function ‘stage_topology_update’:
arch/powerpc/mm/numa.c:1535: error: implicit declaration of function ‘cpu_sibling_mask’
arch/powerpc/mm/numa.c:1535: warning: passing argument 3 of ‘cpumask_or’ makes pointer from integer without a cast
make[1]: *** [arch/powerpc/mm/numa.o] Error 1

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:35 +10:00
Stephen Rothwell
9f3a90e89d powerpc: Fix build failure after merge of the cgroup tree
After merging the cgroup tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]

Caused by commit 30c05350c3 ("powerpc/pseries: Use stop machine to
update cpu maps") from the powerpc tree interacting with (probably)
commit ff794dea52 ("cpuset: remove include of cgroup.h from cpuset.h")
from the cgroup tree.  Removing includes from header files is fraught
with danger ...

The former should have added an include of linux/slab.h to
arch/powerpc/mm/numa.c.

I have added the following merge fix patch for today (but it should be
applied to the powerpc tree ASAP).

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 29 Apr 2013 14:01:44 +1000
Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including
 slab.h

fixes these build errors:

arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:24 +10:00
Linus Torvalds
191a712090 Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Fixes and a lot of cleanups.  Locking cleanup is finally complete.
   cgroup_mutex is no longer exposed to individual controlelrs which
   used to cause nasty deadlock issues.  Li fixed and cleaned up quite a
   bit including long standing ones like racy cgroup_path().

 - device cgroup now supports proper hierarchy thanks to Aristeu.

 - perf_event cgroup now supports proper hierarchy.

 - A new mount option "__DEVEL__sane_behavior" is added.  As indicated
   by the name, this option is to be used for development only at this
   point and generates a warning message when used.  Unfortunately,
   cgroup interface currently has too many brekages and inconsistencies
   to implement a consistent and unified hierarchy on top.  The new flag
   is used to collect the behavior changes which are necessary to
   implement consistent unified hierarchy.  It's likely that this flag
   won't be used verbatim when it becomes ready but will be enabled
   implicitly along with unified hierarchy.

   The option currently disables some of broken behaviors in cgroup core
   and also .use_hierarchy switch in memcg (will be routed through -mm),
   which can be used to make very unusual hierarchy where nesting is
   partially honored.  It will also be used to implement hierarchy
   support for blk-throttle which would be impossible otherwise without
   introducing a full separate set of control knobs.

   This is essentially versioning of interface which isn't very nice but
   at this point I can't see any other options which would allow keeping
   the interface the same while moving towards hierarchy behavior which
   is at least somewhat sane.  The planned unified hierarchy is likely
   to require some level of adaptation from userland anyway, so I think
   it'd be best to take the chance and update the interface such that
   it's supportable in the long term.

   Maintaining the existing interface does complicate cgroup core but
   shouldn't put too much strain on individual controllers and I think
   it'd be manageable for the foreseeable future.  Maybe we'll be able
   to drop it in a decade.

Fix up conflicts (including a semantic one adding a new #include to ppc
that was uncovered by header the file changes) as per Tejun.

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
  cpuset: fix compile warning when CONFIG_SMP=n
  cpuset: fix cpu hotplug vs rebuild_sched_domains() race
  cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
  cgroup: restore the call to eventfd->poll()
  cgroup: fix use-after-free when umounting cgroupfs
  cgroup: fix broken file xattrs
  devcg: remove parent_cgroup.
  memcg: force use_hierarchy if sane_behavior
  cgroup: remove cgrp->top_cgroup
  cgroup: introduce sane_behavior mount option
  move cgroupfs_root to include/linux/cgroup.h
  cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
  cgroup: make cgroup_path() not print double slashes
  Revert "cgroup: remove bind() method from cgroup_subsys."
  perf: make perf_event cgroup hierarchical
  cgroup: implement cgroup_is_descendant()
  cgroup: make sure parent won't be destroyed before its children
  cgroup: remove bind() method from cgroup_subsys.
  devcg: remove broken_hierarchy tag
  cgroup: remove cgroup_lock_is_held()
  ...
2013-04-29 19:14:20 -07:00
Benjamin Herrenschmidt
bc23100a0d Merge remote-tracking branch 'kumar/next' into next
From Kumar Gala:
<<
Add support for T4 and B4 SoC families from Freescale, e6500 altivec
support, some various board fixes and other minor cleanups.
>>
2013-04-30 11:10:09 +10:00
Michel Lespinasse
fba2369e6c mm: use vm_unmapped_area() on powerpc architecture
Update the powerpc slice_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 11:05:17 +10:00
Michel Lespinasse
34d07177b8 mm: remove free_area_cache use in powerpc architecture
As all other architectures have been converted to use vm_unmapped_area(),
we are about to retire the free_area_cache.

This change simply removes the use of that cache in
slice_get_unmapped_area(), which will most certainly have a
performance cost. Next one will convert that function to use the
vm_unmapped_area() infrastructure and regain the performance.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 11:05:10 +10:00
Cody P Schafer
f9d531b8dc powerpc/mm/numa: use setup_nr_node_ids() instead of opencoding.
[sfr@canb.auug.org.au: add missing semicolon]
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:36 -07:00
Johannes Weiner
0aad818b2d sparse-vmemmap: specify vmemmap population range in bytes
The sparse code, when asking the architecture to populate the vmemmap,
specifies the section range as a starting page and a number of pages.

This is an awkward interface, because none of the arch-specific code
actually thinks of the range in terms of 'struct page' units and always
translates it to bytes first.

In addition, later patches mix huge page and regular page backing for
the vmemmap.  For this, they need to call vmemmap_populate_basepages()
on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
are not necessarily multiples of the 'struct page' size and so this unit
is too coarse.

Just translate the section range into bytes once in the generic sparse
code, then pass byte ranges down the stack.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:35 -07:00
Jiang Liu
7db6f78cfe mm/PPC: use free_highmem_page() to free highmem pages into buddy system
Use helper function free_highmem_page() to free highmem pages into
the buddy system.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:32 -07:00
Jiang Liu
5d585e5c48 mm/ppc: use common help functions to free reserved pages
Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:30 -07:00
Nathan Fontenot
e04fa61214 powerpc/pseries: Add /proc interface to control topology updates
There are instances in which we do not want topology updates to occur.
In order to allow this a /proc interface (/proc/powerpc/topology_updates)
is introduced so that topology updates can be enabled and disabled.

This patch also adds a prrn_is_enabled() call so that PRRN events are
handled in the kernel only if topology updating is enabled.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:26 +10:00
Jesse Larrew
b7abef045f powerpc/pseries: RE-enable Virtual Processor Home Node updating
The new PRRN firmware feature provides a more convenient and event-driven
interface than VPHN for notifying Linux of changes to the NUMA affinity of
platform resources. However, for practical reasons, it may not be feasible
for some customers to update to the latest firmware. For these customers,
the VPHN feature supported on previous firmware versions may still be the
best option.

The VPHN feature was previously disabled due to races with the load
balancing code when accessing the NUMA cpu maps, but the new stop_machine()
approach protects the NUMA cpu maps from these concurrent accesses. It
should be safe to re-enable this feature now.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:25 +10:00
Jesse Larrew
176bbf1424 powerpc/pseries: Update NUMA VDSO information when updating CPU maps
The following patch adds vdso_getcpu_init(), which stores the NUMA node for
a cpu in SPRG3:

Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds
vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3.

This patch ensures that this information is also updated when the NUMA
affinity of a cpu changes.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:24 +10:00
Nathan Fontenot
30c05350c3 powerpc/pseries: Use stop machine to update cpu maps
The new PRRN firmware feature allows CPU and memory resources to be
transparently reassigned across NUMA boundaries. When this happens, the
kernel must update the node maps to reflect the new affinity information.

Although the NUMA maps can be protected by locking primitives during the
update itself, this is insufficient to prevent concurrent accesses to these
structures. Since cpumask_of_node() hands out a pointer to these
structures, they can still be modified outside of the lock. Furthermore,
tracking down each usage of these pointers and adding locks would be quite
invasive and difficult to maintain.

The approach used is to make a list of affected cpus and call stop_machine
to have the update routine run on each of the affected cpus allowing them
to update themselves. Each cpu finds itself in the list of cpus and makes
the appropriate updates. We need to have each cpu do this for themselves to
handle calls to vdso_getcpu_init() added in a subsequent patch.

Situations like these are best handled using stop_machine(). Since the NUMA
affinity updates are exceptionally rare events, this approach has the
benefit of not adding any overhead while accessing the NUMA maps during
normal operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:24 +10:00
Jesse Larrew
5d88aa85c0 powerpc/pseries: Update CPU maps when device tree is updated
Platform events such as partition migration or the new PRRN firmware
feature can cause the NUMA characteristics of a CPU to change, and these
changes will be reflected in the device tree nodes for the affected
CPUs.

This patch registers a handler for Open Firmware device tree updates
and reconfigures the CPU and node maps whenever the associativity
changes. Currently, this is accomplished by marking the affected CPUs in
the cpu_associativity_changes_mask and allowing
arch_update_cpu_topology() to retrieve the new associativity information
using hcall_vphn().

Protecting the NUMA cpu maps from concurrent access during an update
operation will be addressed in a subsequent patch in this series.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:23 +10:00
Nathan Fontenot
8002b0c54b powerpc/pseries: Update numa.c to use updated firmware_has_feature()
Update the numa code to use the updated firmware_has_feature() when checking
for type 1 affinity.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:22 +10:00
Li Zhong
016af59f0f powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
This patch fixes the following oops, which could be trigged by build the kernel
with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC.

hpte_insert() might return -1, indicating that the bucket (primary here)
is full. We are not necessarily reporting a BUG in this case. Instead, we could
try repeatedly (try secondary, remove and try again) until we find a slot.

[  543.075675] ------------[ cut here ]------------
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594
[  543.075771] REGS: c0000000a90832c0 TRAP: 0700   Not tainted  (3.8.0-next-20130222)
[  543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 22224482  XER: 00000000
[  543.075816] SOFTE: 0
[  543.075823] CFAR: c00000000004c200
[  543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1
GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff
GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594
GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854
GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001
GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000
GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001
GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950
GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230
[  543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4
[  543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0
[  543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 16:00:00 +10:00
Li Zhong
b170bd3de6 powerpc: Split the code trying to insert hpte repeatedly as an helper function
Move the logic trying to insert hpte in __hash_page_huge() to an helper
function, so it could also be used by others.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:59 +10:00
Li Zhong
2c3c0693d9 powerpc: Move the setting of rflags out of loop in __hash_page_huge
It seems that new_pte and rflags don't get changed in the repeating loop, so
move their assignment out of the loop.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:59 +10:00
Stephen Rothwell
55671f3cc2 powerpc: fix annotation of fake_numa_create_new_node()
This function has always been marked as __cpuinit, but is only called
from functions marked as __init and references an __initdata variable.
So change its annotation to __init.

Fixes this build warning:

WARNING: arch/powerpc/mm/built-in.o(.cpuinit.text+0x86): Section mismatch in reference from the function .fake_numa_create_new_node() to the variable .init.data:cmdline
The function __cpuinit .fake_numa_create_new_node() references
a variable __initdata cmdline.
If cmdline is only used by .fake_numa_create_new_node then
annotate cmdline with a matching annotation.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:56 +10:00
Vaidyanathan Srinivasan
7122beeee7 powerpc: fix numa distance for form0 device tree
The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.

commit 41eab6f88f
powerpc/numa: Use form 1 affinity to setup node distance

Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.

All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes.  This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same.  (value of 'level' in sched_init_numa() will remain 0)

Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:56 +10:00
Paul Bolle
aac3d0c875 powerpc: Fix typo "CONFIG_ICSWX_PID"
Untested. As this typo was introduced in v3.3, with commit
9d67028090 ("powerpc: Split ICSWX ACOP and
PID processing"), which actually added PPC_ICSWX_PID, this surely needs
testing.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:54 +10:00
Valentina Manea
8040bda31a powerpc: place EXPORT_SYMBOL macro right after declaration
This fixes the following checkpatch.pl warnings:

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(kmap_prot);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(kmap_pte);

Signed-off-by: Valentina Manea <valentina.manea.m@gmail.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:49 +10:00
Aneesh Kumar K.V
af81d7878c powerpc: Rename USER_ESID_BITS* to ESID_BITS*
Now we use ESID_BITS of kernel address to build proto vsid. So rename
USER_ESIT_BITS to ESID_BITS

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.8]
2013-03-17 12:45:44 +11:00
Aneesh Kumar K.V
c60ac5693c powerpc: Update kernel VSID range
This patch change the kernel VSID range so that we limit VSID_BITS to 37.
This enables us to support 64TB with 65 bit VA (37+28). Without this patch
we have boot hangs on platforms that only support 65 bit VA.

With this patch we now have proto vsid generated as below:

We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated
from mmu context id and effective segment id of the address.

For user processes max context id is limited to ((1ul << 19) - 5)
for kernel space, we use the top 4 context ids to map address as below
0x7fffc -  [ 0xc000000000000000 - 0xc0003fffffffffff ]
0x7fffd -  [ 0xd000000000000000 - 0xd0003fffffffffff ]
0x7fffe -  [ 0xe000000000000000 - 0xe0003fffffffffff ]
0x7ffff -  [ 0xf000000000000000 - 0xf0003fffffffffff ]

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.8]
2013-03-17 12:39:06 +11:00
Benjamin Herrenschmidt
13938117a5 powerpc: Fix STAB initialization
Commit f5339277eb accidentally removed
more than just iSeries bits and took out the call to stab_initialize()
thus breaking support for POWER3 processors.

Put it back. (Yes, nobody noticed until now ...)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.4+]
2013-03-13 10:06:22 +11:00
Kumar Gala
1b29187315 powerpc/fsl-booke: Support detection of page sizes on e6500
The e6500 core used on T4240 and B4860 SoCs from FSL implements MMUv2 of
the Power Book-E Architecture.  However there are some minor differences
between it and other Book-E implementations.

Add support to parse SPRN_TLB1PS for the variable page sizes supported.
In the future this should be expanded for more page sizes supported on
e6500 as well as other MMU features.

This patch is based on code from Scott Wood.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-03-05 17:10:27 -06:00
Linus Torvalds
5ce1a70e2f Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
2013-02-23 17:50:35 -08:00
Tang Chen
0197518cd3 memory-hotplug: remove memmap of sparse-vmemmap
Introduce a new API vmemmap_free() to free and remove vmemmap
pagetables.  Since pagetable implements are different, each architecture
has to provide its own version of vmemmap_free(), just like
vmemmap_populate().

Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.

[mhocko@suse.cz: fix implicit declaration of remove_pagetable]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu
46723bfa54 memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by
get_page_bootmem().  So the patch searches pages of virtual mapping and
registers the pages by get_page_bootmem().

NOTE: register_page_bootmem_memmap() is not implemented for ia64,
      ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
      and revert register_page_bootmem_info_node() when platform doesn't
      support it.

      It's implemented by adding a new Kconfig option named
      CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
      by memory-hotplug feature fully supported archs(currently only on
      x86_64).

      Since we have 2 config options called MEMORY_HOTPLUG and
      MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
      and codes in function register_page_bootmem_info_node() are only
      used for collecting infomation for hot-remove, so reside it under
      MEMORY_HOTREMOVE.

      Besides page_isolation.c selected by MEMORY_ISOLATION under
      MEMORY_HOTPLUG is also such case, move it too.

[mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
[linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
[mhocko@suse.cz: remove the arch specific functions without any implementation]
[linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
[rientjes@google.com: fix defined but not used warning]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wu Jianguo <wujianguo@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Wen Congyang
24d335ca36 memory-hotplug: introduce new arch_remove_memory() for removing page table
For removing memory, we need to remove page tables.  But it depends on
architecture.  So the patch introduce arch_remove_memory() for removing
page table.  Now it only calls __remove_pages().

Note: __remove_pages() for some archtecuture is not implemented
      (I don't know how to implement it for s390).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Linus Torvalds
9d3cae26ac Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
 "So from the depth of frozen Minnesota, here's the powerpc pull request
  for 3.9.  It has a few interesting highlights, in addition to the
  usual bunch of bug fixes, minor updates, embedded device tree updates
  and new boards:

   - Hand tuned asm implementation of SHA1 (by Paulus & Michael
     Ellerman)

   - Support for Doorbell interrupts on Power8 (kind of fast
     thread-thread IPIs) by Ian Munsie

   - Long overdue cleanup of the way we handle relocation of our open
     firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard

   - Support for saving/restoring & context switching the PPR (Processor
     Priority Register) on server processors that support it.  This
     allows the kernel to preserve thread priorities established by
     userspace.  By Haren Myneni.

   - DAWR (new watchpoint facility) support on Power8 by Michael Neuling

   - Ability to change the DSCR (Data Stream Control Register) which
     controls cache prefetching on a running process via ptrace by
     Alexey Kardashevskiy

   - Support for context switching the TAR register on Power8 (new
     branch target register meant to be used by some new specific
     userspace perf event interrupt facility which is yet to be enabled)
     by Ian Munsie.

   - Improve preservation of the CFAR register (which captures the
     origin of a branch) on various exception conditions by Paulus.

   - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where
     it belongs by Philippe De Muyter

   - Support for Transactional Memory on Power8 by Michael Neuling
     (based on original work by Matt Evans).  For those curious about
     the feature, the patch contains a pretty good description."

(See commit db8ff90702: "powerpc: Documentation for transactional
memory on powerpc" for the mentioned description added to the file
Documentation/powerpc/transactional_memory.txt)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits)
  powerpc/kexec: Disable hard IRQ before kexec
  powerpc/85xx: l2sram - Add compatible string for BSC9131 platform
  powerpc/85xx: bsc9131 - Correct typo in SDHC device node
  powerpc/e500/qemu-e500: enable coreint
  powerpc/mpic: allow coreint to be determined by MPIC version
  powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct
  powerpc/85xx: Board support for ppa8548
  powerpc/fsl: remove extraneous DIU platform functions
  arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test
  powerpc: Documentation for transactional memory on powerpc
  powerpc: Add transactional memory to pseries and ppc64 defconfigs
  powerpc: Add config option for transactional memory
  powerpc: Add transactional memory to POWER8 cpu features
  powerpc: Add new transactional memory state to the signal context
  powerpc: Hook in new transactional memory code
  powerpc: Routines for FP/VSX/VMX unavailable during a transaction
  powerpc: Add transactional memory unavaliable execption handler
  powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes
  powerpc: Add FP/VSX and VMX register load functions for transactional memory
  powerpc: Add helper functions for transactional memory context switching
  ...
2013-02-23 17:09:55 -08:00
Michael Neuling
bc2a9408fa powerpc: Hook in new transactional memory code
This hooks the new transactional memory code into context switching, FP/VMX/VMX
unavailable and exception return.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 17:02:23 +11:00
Aneesh Kumar K.V
eda8eebdd1 powerpc/mm: Fix hash computation function
The ASM version of hash computation function was truncating the upper bit.
Make the ASM version similar to hpt_hash function. Remove masking vsid bits.
Without this patch, we observed hang during bootup due to not satisfying page
fault request correctly. The fault handler used wrong hash values to update
the HPTE. Hence we kept looping with page fault.

hash_page(ea=000001003e260008, access=203, trap=300 ip=3fff91787134 dsisr 42000000
The computed value of hash 000000000f22f390
update: avpnv=4003e46054003e00, hash=000000000722f390, f=80000006, psize: 2 ...

BenH: The over-masking has been there for ever but only hurts with the
new 64T support introduced in 3.7

Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.7]
2013-02-04 15:15:08 +11:00
Cody P Schafer
4e8309baed powerpc/mm: Eliminate unneeded for_each_memblock
The only persistent change made by this loop is calling
memblock_set_node() once for each memblock, which is not useful (and has
no effect) as memblock_set_node() is not called with any
memblock-specific parameters.

Subsistute a single memblock_set_node().

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-29 11:34:25 +11:00
Michael Neuling
9422de3e95 powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers
This is a rewrite so that we don't assume we are using the DABR throughout the
code.  We now use the arch_hw_breakpoint to store the breakpoint in a generic
manner in the thread_struct, rather than storing the raw DABR value.

The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from
userspace.  We keep this functionality, so that future changes (like the POWER8
DAWR), will still fake the DABR to userspace.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:44 +11:00
Anton Blanchard
1fbe9cf259 powerpc: Build kernel with -mcmodel=medium
Finally remove the two level TOC and build with -mcmodel=medium.

Unfortunately we can't build modules with -mcmodel=medium due to
the tricks the kernel module loader plays with percpu data:

# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
#
# The kernel module loader relocates the percpu data section from the
# original location (starting with 0xd...) to somewhere in the base
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.

On older kernels we fall back to the two level TOC (-mminimal-toc)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:00:31 +11:00
Greg Kroah-Hartman
cad5cef62a POWERPC: drivers: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitdata,
__devinitconst, and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:04 -08:00
Linus Torvalds
16e024f30c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlight is probably some base POWER8 support.  There's more
  to come such as transactional memory support but that will wait for
  the next one.

  Overall it's pretty quiet, or rather I've been pretty poor at picking
  things up from patchwork and reviewing them this time around and Kumar
  no better on the FSL side it seems..."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits)
  powerpc+of: Rename and fix OF reconfig notifier error inject module
  powerpc: mpc5200: Add a3m071 board support
  powerpc/512x: don't compile any platform DIU code if the DIU is not enabled
  powerpc/mpc52xx: use module_platform_driver macro
  powerpc+of: Export of_reconfig_notifier_[register,unregister]
  powerpc/dma/raidengine: add raidengine device
  powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct
  powerpc/mpc85xx: Change spin table to cached memory
  powerpc/fsl-pci: Add PCI controller ATMU PM support
  powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI
  drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS]
  powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers
  powerpc: Disable relocation on exceptions when kexecing
  powerpc: Enable relocation on during exceptions at boot
  powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
  powerpc: Add wrappers to enable/disable relocation on exceptions
  powerpc: Add set_mode hcall
  powerpc: Setup relocation on exceptions for bare metal systems
  powerpc: Move initial mfspr LPCR out of __init_LPCR
  powerpc: Add relocation on exception vector handlers
  ...
2012-12-18 09:58:09 -08:00
Linus Torvalds
f6e858a00a Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc VM changes from Andrew Morton:
 "The rest of most-of-MM.  The other MM bits await a slab merge.

  This patch includes the addition of a huge zero_page.  Not a
  performance boost but it an save large amounts of physical memory in
  some situations.

  Also a bunch of Fujitsu engineers are working on memory hotplug.
  Which, as it turns out, was badly broken.  About half of their patches
  are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken.  We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text.  Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
  mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
  mm/memory.c: remove unused code from do_wp_page()
  asm-generic, mm: pgtable: consolidate zero page helpers
  mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
  hwpoison, hugetlbfs: fix RSS-counter warning
  hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
  mm: protect against concurrent vma expansion
  memcg: do not check for mm in __mem_cgroup_count_vm_event
  tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
  mm: provide more accurate estimation of pages occupied by memmap
  fs/buffer.c: remove redundant initialization in alloc_page_buffers()
  fs/buffer.c: do not inline exported function
  writeback: fix a typo in comment
  mm: introduce new field "managed_pages" to struct zone
  mm, oom: remove statically defined arch functions of same name
  mm, oom: remove redundant sleep in pagefault oom handler
  mm, oom: cleanup pagefault oom handler
  memory_hotplug: allow online/offline memory to result movable node
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  mm, memcg: avoid unnecessary function call when memcg is disabled
  ...
2012-12-13 13:11:15 -08:00
David Rientjes
c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Adam Buchbinder
48fc7f7e78 Fix misspellings of "whether" in comments.
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:31:35 +01:00
Benjamin Herrenschmidt
de1bb03af7 Merge branch 'dt' into next 2012-11-15 15:02:44 +11:00
Tony Breeds
1afc149def powerpc/47x: Use the new ppc-opcode infrastructure
Don't use 47x only #defines for TLBIVAX or ICBT, supply and use helpers
in ppc-opcode.h

This fixes a compile breakage.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:24 +11:00
Nathan Fontenot
f594972083 powerpc+of: Move of_drconf_cell struct definition to asm/prom.h
This patch moves the definition of the of_drconf_cell struct to asm/prom.h
to make it available for all powerpc/pseries code.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 09:43:55 +11:00
Shaohua Li
45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Linus Torvalds
5f3d2f2e1a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
 "Some highlights in addition to the usual batch of fixes:

   - 64TB address space support for 64-bit processes by Aneesh Kumar

   - Gavin Shan did a major cleanup & re-organization of our EEH support
     code (IBM fancy PCI error handling & recovery infrastructure) which
     paves the way for supporting different platform backends, along
     with some rework of the PCIe code for the PowerNV platform in order
     to remove home made resource allocations and instead use the
     generic code (which is possible after some small improvements to it
     done by Gavin).

   - Uprobes support by Ananth N Mavinakayanahalli

   - A pile of embedded updates from Freescale folks, including new SoC
     and board supports, more KVM stuff including preparing for 64-bit
     BookE KVM support, ePAPR 1.1 updates, etc..."

Fixup trivial conflicts in drivers/scsi/ipr.c

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
  powerpc/iommu: Fix multiple issues with IOMMU pools code
  powerpc: Fix VMX fix for memcpy case
  driver/mtd:IFC NAND:Initialise internal SRAM before any write
  powerpc/fsl-pci: use 'Header Type' to identify PCIE mode
  powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get
  powerpc: Remove tlb batching hack for nighthawk
  powerpc: Set paca->data_offset = 0 for boot cpu
  powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
  powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled
  powerpc/mpc85xx: Update interrupt handling for IFC controller
  powerpc/85xx: Enable USB support in p1023rds_defconfig
  powerpc/smp: Do not disable IPI interrupts during suspend
  powerpc/eeh: Fix crash on converting OF node to edev
  powerpc/eeh: Lock module while handling EEH event
  powerpc/kprobe: Don't emulate store when kprobe stwu r1
  powerpc/kprobe: Complete kprobe and migrate exception frame
  powerpc/kprobe: Introduce a new thread flag
  powerpc: Remove unused __get_user64() and __put_user64()
  powerpc/eeh: Global mutex to protect PE tree
  powerpc/eeh: Remove EEH PE for normal PCI hotplug
  ...
2012-10-06 03:16:12 +09:00
Linus Torvalds
437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Michael Ellerman
8e166991c0 powerpc: Remove tlb batching hack for nighthawk
In hpte_init_native() we call tlb_batching_enabled() to decide if we
should setup ppc_md.flush_hash_range.

tlb_batching_enabled() checks the _unflattened_ device tree, to see
if we are running on a nighthawk.

Since commit a223535 ("dont allow pSeries_probe to succeed without
initialising MMU", Dec 2006), hpte_init_native() has been called from
pSeries_probe() - at which point we have not yet unflattened the
device tree.

This means tlb_batching_enabled() will always return true, so the hack
has effectively been disabled since Dec 2006. Ergo, I think we can
drop it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-27 12:51:06 +10:00
Eric W. Biederman
9e184e0aa3 userns: On ppc convert current_uid from a kuid before printing.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-21 03:13:28 -07:00
Benjamin Herrenschmidt
caa1d631fc Merge remote-tracking branch 'kumar/next' into next 2012-09-18 16:04:33 +10:00
Aneesh Kumar K.V
78f1dbde9f powerpc/mm: Make some of the PGTABLE_RANGE dependency explicit
slice array size and slice mask size depend on PGTABLE_RANGE.

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:53 +10:00
Aneesh Kumar K.V
f033d659c3 powerpc/mm: Update VSID allocation documentation
This update the proto-VSID and VSID scramble related information
to be more generic by using names instead of current values.

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:53 +10:00
Aneesh Kumar K.V
048ee0993e powerpc/mm: Add 64TB support
Increase max addressable range to 64TB. This is not tested on
real hardware yet.

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:51 +10:00
Aneesh Kumar K.V
735cafc32b powerpc/mm: Use 32bit array for slb cache
With larger vsid we need to track more bits of ESID in slb cache
for slb invalidate.

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:51 +10:00
Aneesh Kumar K.V
ac8dc2823a powerpc/mm: Use the required number of VSID bits in slbmte
ASM_VSID_SCRAMBLE can leave non-zero bits in the high 28 bits of the result
for 256MB segment (40 bits for 1T segment). Properly mask them before using
the values in slbmte

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:50 +10:00
Aneesh Kumar K.V
7aa0727f33 powerpc/mm: Increase the slice range to 64TB
This patch makes the high psizes mask as an unsigned char array
so that we can have more than 16TB. Currently we support upto
64TB

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:50 +10:00
Aneesh Kumar K.V
5524a27d39 powerpc/mm: Convert virtual address to vpn
This patch convert different functions to take virtual page number
instead of virtual address. Virtual page number is virtual address
shifted right by VPN_SHIFT (12) bits. This enable us to have an
address range of upto 76 bits.

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:49 +10:00
Aneesh Kumar K.V
dcda287a9b powerpc/mm: Simplify hpte_decode
This patch simplify hpte_decode for easy switching of virtual address to
virtual page number in the later patch

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:49 +10:00
Aneesh Kumar K.V
f038392363 powerpc/mm: Replace open coded CONTEXT_BITS value
To clarify the meaning for future readers, replace the open coded
19 with CONTEXT_BITS

Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:48 +10:00
Jia Hongtao
688ba1dbee powerpc/swiotlb: Enable at early stage and disable if not necessary
Remove the dependency on PCI initialization for SWIOTLB initialization.
So that PCI can be initialized at proper time.

SWIOTLB is partly determined by PCI inbound/outbound map which is assigned
in PCI initialization. But swiotlb_init() should be done at the stage of
mem_init() which is much earlier than PCI initialization. So we reserve the
memory for SWIOTLB first and free it if not necessary.

All boards are converted to fit this change.

Signed-off-by: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Acked-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-09-12 14:57:09 -05:00
Joe MacDonald
6b5e7229bb powerpc/mm: Match variable types to API
sys_subpage_prot() takes an unsigned long for 'addr' then does some stuff
with it and the result is stored in a signed int, i, which is eventually
used as the size parameter in a copy_from_user call.  Update 'i' to be an
unsigned long as well and since 'nw' is used in a size_t context which,
depending on whether this is 32- or 64-bit may be unsigned int or unsigned
long, switch that to a size_t and always be right.

Finally, since we're in the neighbourhood, make the same changes to
subpage_prot_clear().

Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joe MacDonald <joe.macdonald@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-10 14:37:31 +10:00
Suzuki Poulose
a84fcd4687 powerpc: Change memory_limit from phys_addr_t to unsigned long long
There are some device-tree nodes, whose values are of type phys_addr_t.
The phys_addr_t is variable sized based on the CONFIG_PHSY_T_64BIT.

Change these to a fixed unsigned long long for consistency.

This patch does the change only for memory_limit.

The following is a list of such variables which need the change:

 1) kernel_end, crashk_size - in arch/powerpc/kernel/machine_kexec.c

 2) (struct resource *)crashk_res.start - We could export a local static
    variable from machine_kexec.c.

Changing the above values might break the kexec-tools. So, I will
fix kexec-tools first to handle the different sized values and then change
 the above.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-07 11:44:30 +10:00
Benjamin Herrenschmidt
fff34b3412 Merge branch 'merge' into next
Brings in various bug fixes from 3.6-rcX
2012-09-07 09:48:59 +10:00
Jesse Larrew
79c5fcebfe powerpc/vphn: Fix arch_update_cpu_topology() return value
arch_update_cpu_topology() should only return 1 when the topology has
actually changed, and should return 0 otherwise.

This patch fixes a potential bug where rebuild_sched_domains() would
reinitialize the sched domains even when the topology hasn't changed.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 16:05:19 +10:00
Mihai Caraman
8b64a9dfb0 powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int
Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
SPRG4-7 registers will be clobbered.
For bolted TLB miss exception handlers, which is the version currently
supported by KVM, use SPRN_SPRG_GEN_SCRATCH aka SPRG0 instead of
SPRN_SPRG_TLB_SCRATCH aka SPRG6. Keep using TLB PACA slots to fit in one
64-byte cache line.
For critical exception handlers use SPRG3 instead of SPRG7. Provide a routine
to store and restore user-visible SPRGs. This will be subsequently used
to restore VDSO information in SPRG3. Add EX_R13 to paca slots to free up
SPRG3 and change the critical exception epilog to use it.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:35:52 +10:00
Mihai Caraman
fecff0f724 powerpc/booke64: Add DO_KVM kernel hooks
Hook DO_KVM macro into 64-bit booke for KVM integration. Extend interrupt
handlers' parameter list with interrupt vector numbers to accomodate the macro.
Only the bolted version of tlb miss handers is addressed now.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:35:46 +10:00
Ananth N Mavinakayanahalli
41ab5266c3 powerpc: Add trap_nr to thread_struct
Add thread_struct.trap_nr and use it to store the last exception
the thread experienced. In this patch, we populate the field at
various places where we force_sig_info() to the process.

This is also used in uprobes to determine if the probed instruction
caused an exception.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:19:36 +10:00
Michael Ellerman
beacc6da86 powerpc: Remove all includes of <asm/abs_addr.h>
It's empty now, apart from other includes.

Fixup a few files that were getting things via this header.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:19:33 +10:00
Michael Ellerman
d88dc13a24 powerpc/mm: Remove uses of abs_to_virt() and virt_to_abs()
These days they are just __va() and __pa() respectively.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:19:31 +10:00
Michael Ellerman
70267a7f47 powerpc/mm: Replace abs_to_virt() with __va()
abs_to_virt() is just a wrapper around __va(), call __va() directly.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:18:43 +10:00
Alexander Graf
249ba1ee0f KVM: PPC: Add cache flush on page map
When we map a page that wasn't icache cleared before, do so when first
mapping it in KVM using the same information bits as the Linux mapping
logic. That way we are 100% sure that any page we map does not have stale
entries in the icache.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-08-16 14:14:53 +02:00
Stuart Yoder
9778b696a0 powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 14:18:22 +10:00
Michael Neuling
962cffbd8a powerpc: Enforce usage of RA 0-R31 where possible
Some macros use RA where when RA=R0 the values is 0, so make this
the enforced mnemonic in the macro.

Idea suggested by Andreas Schwab.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:35 +10:00
Michael Neuling
e55174e911 powerpc: Fixes for instructions not using correct register naming
These macros are using integers where they could be using logical
names since they take registers.

We are going to enforce this soon, so fix these up now.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:16 +10:00
Michael Neuling
44ce6a5ee7 powerpc: Merge STK_REG/PARAM/FRAMESIZE
Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different
places.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:03 +10:00
Michael Neuling
c75df6f96c powerpc: Fix usage of register macros getting ready for %r0 change
Anything that uses a constructed instruction (ie. from ppc-opcode.h),
need to use the new R0 macro, as %r0 is not going to work.

Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
	std	r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:17:55 +10:00
Benjamin Herrenschmidt
50bba07d6a Merge branch 'merge' into next
We want to bring in the latest IRQ fixes
2012-07-10 19:16:43 +10:00
Benjamin Herrenschmidt
aa709f3bc9 powerpc/numa: Avoid stupid uninitialized warning from gcc
Newer gcc are being a bit blind here (it's pretty obvious we don't
reach the code path using the array if we haven't initialized the
pointer) but none of that is performance critical so let's just
silence it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:16:23 +10:00
Gavin Shan
5b958a7eef powerpc/numa: Fix OF node refcounting bug
The form affinity for NUMA is set to 1 if the firmware supports
OPAL. Otherwise, we have to retrieve that from OF node "/chosen".
For the latter case, OF node "/chosen" reference count was never
decreased.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:44 +10:00
Michael Neuling
82b2521d25 powerpc: Fix uninitialised error in numa.c
chroma_defconfig currently gives me this with gcc 4.6:
  arch/powerpc/mm/numa.c:638:13: error: 'dm' may be used uninitialized in this function [-Werror=uninitialized]

It's a bogus warning/error since of_get_drconf_memory() only writes it
anyway.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: <stable@kernel.org> [v3.3+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-29 14:35:35 +10:00
Anton Vorontsov
73863ab028 powerpc: use clear_tasks_mm_cpumask()
Current CPU hotplug code has some task->mm handling issues:

1. Working with task->mm w/o getting mm or grabing the task lock is
   dangerous as ->mm might disappear (exit_mm() assigns NULL under
   task_lock(), so tasklist lock is not enough).

   We can't use get_task_mm()/mmput() pair as mmput() might sleep,
   so we must take the task lock while handle its mm.

2. Checking for process->mm is not enough because process' main
   thread may exit or detach its mm via use_mm(), but other threads
   may still have a valid mm.

   To fix this we would need to use find_lock_task_mm(), which would
   walk up all threads and returns an appropriate task (with task
   lock held).

clear_tasks_mm_cpumask() has all the issues fixed, so let's use it.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Paul Gortmaker
89528127fa powerpc: fix compile fail in hugetlb cmdline parsing
Commit 9fb48c744b

    "params: add 3rd arg to option handler callback signature"

added an extra arg to the function, but didn't catch all the use
cases needing it, causing this compile fail in mpc85xx_defconfig:

 arch/powerpc/mm/hugetlbpage.c:316:4: error: passing argument 7 of
 'parse_args' from incompatible pointer type [-Werror]

 include/linux/moduleparam.h:317:12: note: expected
	 'int (*)(char *, char *, const char *)' but argument is of type
	 'int (*)(char *, char *)'

This function has no need to printk out the "doing" value, so
just add the arg as an "unused".

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-07 16:51:19 -07:00
Linus Torvalds
0195c00244 Disintegrate and delete asm/system.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
 u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
 ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
 rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
 eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
 /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
 FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
 NypQthI85pc=
 =G9mT
 -----END PGP SIGNATURE-----

Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system

Pull "Disintegrate and delete asm/system.h" from David Howells:
 "Here are a bunch of patches to disintegrate asm/system.h into a set of
  separate bits to relieve the problem of circular inclusion
  dependencies.

  I've built all the working defconfigs from all the arches that I can
  and made sure that they don't break.

  The reason for these patches is that I recently encountered a circular
  dependency problem that came about when I produced some patches to
  optimise get_order() by rewriting it to use ilog2().

  This uses bitops - and on the SH arch asm/bitops.h drags in
  asm-generic/get_order.h by a circuituous route involving asm/system.h.

  The main difficulty seems to be asm/system.h.  It holds a number of
  low level bits with no/few dependencies that are commonly used (eg.
  memory barriers) and a number of bits with more dependencies that
  aren't used in many places (eg.  switch_to()).

  These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

        Move memory barriers here.  This already done for MIPS and Alpha.

    (2) asm/switch_to.h

        Move switch_to() and related stuff here.

    (3) asm/exec.h

        Move arch_align_stack() here.  Other process execution related bits
        could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

        Move xchg() and cmpxchg() here as they're full word atomic ops and
        frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

        Move die() and related bits.

    (6) asm/auxvec.h

        Move AT_VECTOR_SIZE_ARCH here.

  Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that.  We'll find out anything that got broken and fix it..

* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
2012-03-28 15:58:21 -07:00
Linus Torvalds
2e7580b0e7 Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Avi Kivity:
 "Changes include timekeeping improvements, support for assigning host
  PCI devices that share interrupt lines, s390 user-controlled guests, a
  large ppc update, and random fixes."

This is with the sign-off's fixed, hopefully next merge window we won't
have rebased commits.

* 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
  KVM: Convert intx_mask_lock to spin lock
  KVM: x86: fix kvm_write_tsc() TSC matching thinko
  x86: kvmclock: abstract save/restore sched_clock_state
  KVM: nVMX: Fix erroneous exception bitmap check
  KVM: Ignore the writes to MSR_K7_HWCR(3)
  KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask
  KVM: PMU: add proper support for fixed counter 2
  KVM: PMU: Fix raw event check
  KVM: PMU: warn when pin control is set in eventsel msr
  KVM: VMX: Fix delayed load of shared MSRs
  KVM: use correct tlbs dirty type in cmpxchg
  KVM: Allow host IRQ sharing for assigned PCI 2.3 devices
  KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
  KVM: x86 emulator: Allow PM/VM86 switch during task switch
  KVM: SVM: Fix CPL updates
  KVM: x86 emulator: VM86 segments must have DPL 3
  KVM: x86 emulator: Fix task switch privilege checks
  arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
  KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation
  KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
  ...
2012-03-28 14:35:31 -07:00
David Howells
ae3a197e3d Disintegrate asm/system.h for PowerPC
Disintegrate asm/system.h for PowerPC.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org
2012-03-28 18:30:02 +01:00
Pawel Moll
026cee0086 params: <level>_initcall-like kernel parameters
This patch adds a set of macros that can be used to declare
kernel parameters to be parsed _before_ initcalls at a chosen
level are executed.  We rename the now-unused "flags" field of
struct kernel_param as the level.  It's signed, for when we
use this for early params as well, in future.

Linker macro collating init calls had to be modified in order
to add additional symbols between levels that are later used
by the init code to split the calls into blocks.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-03-26 12:50:51 +10:30
Linus Torvalds
5375871d43 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc merge from Benjamin Herrenschmidt:
 "Here's the powerpc batch for this merge window.  It is going to be a
  bit more nasty than usual as in touching things outside of
  arch/powerpc mostly due to the big iSeriesectomy :-) We finally got
  rid of the bugger (legacy iSeries support) which was a PITA to
  maintain and that nobody really used anymore.

  Here are some of the highlights:

   - Legacy iSeries is gone.  Thanks Stephen ! There's still some bits
     and pieces remaining if you do a grep -ir series arch/powerpc but
     they are harmless and will be removed in the next few weeks
     hopefully.

   - The 'fadump' functionality (Firmware Assisted Dump) replaces the
     previous (equivalent) "pHyp assisted dump"...  it's a rewrite of a
     mechanism to get the hypervisor to do crash dumps on pSeries, the
     new implementation hopefully being much more reliable.  Thanks
     Mahesh Salgaonkar.

   - The "EEH" code (pSeries PCI error handling & recovery) got a big
     spring cleaning, motivated by the need to be able to implement a
     new backend for it on top of some new different type of firwmare.

     The work isn't complete yet, but a good chunk of the cleanups is
     there.  Note that this adds a field to struct device_node which is
     not very nice and which Grant objects to.  I will have a patch soon
     that moves that to a powerpc private data structure (hopefully
     before rc1) and we'll improve things further later on (hopefully
     getting rid of the need for that pointer completely).  Thanks Gavin
     Shan.

   - I dug into our exception & interrupt handling code to improve the
     way we do lazy interrupt handling (and make it work properly with
     "edge" triggered interrupt sources), and while at it found & fixed
     a wagon of issues in those areas, including adding support for page
     fault retry & fatal signals on page faults.

   - Your usual random batch of small fixes & updates, including a bunch
     of new embedded boards, both Freescale and APM based ones, etc..."

I fixed up some conflicts with the generalized irq-domain changes from
Grant Likely, hopefully correctly.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
  powerpc/ps3: Do not adjust the wrapper load address
  powerpc: Remove the rest of the legacy iSeries include files
  powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
  init: Remove CONFIG_PPC_ISERIES
  powerpc: Remove FW_FEATURE ISERIES from arch code
  tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
  powerpc/spufs: Fix double unlocks
  powerpc/5200: convert mpc5200 to use of_platform_populate()
  powerpc/mpc5200: add options to mpc5200_defconfig
  powerpc/mpc52xx: add a4m072 board support
  powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
  Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
  powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
  powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
  MAINTAINERS: Update PowerPC 4xx tree
  powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
  powerpc: document the FSL MPIC message register binding
  powerpc: add support for MPIC message register API
  powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
  powerpc/85xx: mpc8548cds - add 36-bit dts
  ...
2012-03-21 18:55:10 -07:00
Stephen Rothwell
f5339277eb powerpc: Remove FW_FEATURE ISERIES from arch code
This is no longer selectable, so just remove all the dependent code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-21 11:16:11 +11:00
Cong Wang
2480b20892 powerpc: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:14 +08:00
Kumar Gala
f0b8b3417d powerpc/fsl-booke: Fixup calc_cam_sz to support MMU v2
The registers that describe size supported by TLB are different on MMU
v2 as well as we support power of two page sizes.  For now we continue
to assume that FSL variable size array supports all page sizes up to the
maximum one reported in TLB1PS.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-03-15 12:12:19 -05:00
Benjamin Herrenschmidt
9be72573a8 powerpc: Add support for page fault retry and fatal signals
Other architectures such as x86 and ARM have been growing
new support for features like retrying page faults after
dropping the mm semaphore to break contention, or being
able to return from a stuck page fault when a SIGKILL is
pending.

This refactors our implementation of do_page_fault() to
move the error handling out of line in a way similar to
x86 and adds support for those two features.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 10:55:12 +11:00
Benjamin Herrenschmidt
a546498f3b powerpc: Call do_page_fault() with interrupts off
We currently turn interrupts back to their previous state before
calling do_page_fault(). This can be annoying when debugging as
a bad fault will potentially have lost some processor state before
getting into the debugger.

We also end up calling some generic code with interrupts enabled
such as notify_page_fault() with interrupts enabled, which could
be unexpected.

This changes our code to behave more like other architectures,
and make the assembly entry code call into do_page_faults() with
interrupts disabled. They are conditionally re-enabled from
within do_page_fault() in the same spot x86 does it.

While there, add the might_sleep() test in the case of a successful
trylock of the mmap semaphore, again like x86.

Also fix a bug in the existing assembly where r12 (_MSR) could get
clobbered by C calls (the DTL accounting in the exception common
macro and DISABLE_INTS) in some cases.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2. Add the r12 clobber fix
2012-03-09 10:55:08 +11:00
Benjamin Herrenschmidt
4f8cf36f48 powerpc: Remove legacy iSeries bits from assembly files
This removes the various bits of assembly in the kernel entry,
exception handling and SLB management code that were specific
to running under the legacy iSeries hypervisor which is no
longer supported.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 10:54:59 +11:00
Joe Perches
a2234b4bae powerpc: Use vsprintf extention %pf with builtin_return_address
Emit the function name not the address when possible.

builtin_return_address() gives an address.  When building
a kernel with CONFIG_KALLSYMS, emit the actual function
name not the address.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:09 +11:00
Jimi Xenidis
de801de139 powerpc/icswx: Fix race condition with IPI setting ACOP
There is a race where a thread causes a coprocessor type to be valid
in its own ACOP _and_ in the current context, but it does not
propagate to the ACOP register of other threads in time for them to
use it.  The original code tries to solve this by sending an IPI to
all threads on the system, which is heavy handed, but unfortunately
still provides a window where the icswx is issued by other threads and
the ACOP is not up to date.

This patch detects that the ACOP DSI fault was a "false positive" and
syncs the ACOP and causes the icswx to be replayed.

Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-07 17:06:09 +11:00
Paul Mackerras
342d3db763 KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7.  Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page.  Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.

Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970.  We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out.  If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Mahesh Salgaonkar
3ccc00a7e0 fadump: Register for firmware assisted dump.
On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote:
> On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote:
>
> If I have read the code correctly, we are going to get this printk on
> non-pSeries machines or on older pSeries machines, even if the user
> has not put the fadump=on option on the kernel command line.  The
> printk will be annoying since there is no actual error condition.  It
> seems to me that the condition for the printk should include
> fw_dump.fadump_enabled.  In other words you should probably add
>
> 	if (!fw_dump.fadump_enabled)
> 		return 0;
>
> at the beginning of the function.

Hi Paul,

Thanks for pointing it out. Please find the updated patch below.

The existing patches above this (4/10 through 10/10) cleanly applies
on this update.

Thanks,
-Mahesh.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Wanlong Gao
9512938b88 cpumask: update setup_node_to_cpumask_map() comments
node_to_cpumask() has been replaced by cpumask_of_node(), and wholly
removed since commit 29c337a0 ("cpumask: remove obsolete node_to_cpumask
now everyone uses cpumask_of_node").

So update the comments for setup_node_to_cpumask_map().

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Linus Torvalds
98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Linus Torvalds
7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Linus Torvalds
e4e88f31bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
  powerpc: fix compile error with 85xx/p1010rdb.c
  powerpc: fix compile error with 85xx/p1023_rds.c
  powerpc/fsl: add MSI support for the Freescale hypervisor
  arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
  powerpc/fsl: Add support for Integrated Flash Controller
  powerpc/fsl: update compatiable on fsl 16550 uart nodes
  powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
  powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
  powerpc/fsl: Update defconfigs to enable some standard FSL HW features
  powerpc: Add TBI PHY node to first MDIO bus
  sbc834x: put full compat string in board match check
  powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
  powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
  offb: Fix setting of the pseudo-palette for >8bpp
  offb: Add palette hack for qemu "standard vga" framebuffer
  offb: Fix bug in calculating requested vram size
  powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
  powerpc/44x: Fix build error on currituck platform
  powerpc/boot: Change the load address for the wrapper to fit the kernel
  powerpc/44x: Enable CRASH_DUMP for 440x
  ...

Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
the additional sparse-checking code for cputime_t.
2012-01-06 17:58:22 -08:00
Greg Kroah-Hartman
ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Kay Sievers
8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Suzuki Poulose
368ff8f14d powerpc: Define virtual-physical translations for RELOCATABLE
We find the runtime address of _stext and relocate ourselves based
on the following calculation.

	virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) +
			MODULO(_stext.run,KERNEL_TLB_PIN_SIZE)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page        |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|<- Effective
Addr(_stext)|           |      ^     |Virt. Base Addr
            |           |      |     |
            |           |      |     |
            |           |reloc_offset|
            |           |      |     |
            |           |      |     |
            |           |______v_____|<-(KERNELBASE)%TLB_SIZE
            |           |            |
            |           |            |
            |           |            |
Page        |-----------|------------|
Boundary    |           |            |

On BookE, we need __va() & __pa() early in the boot process to access
the device tree.

Currently this has been defined as :

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) -
						PHYSICAL_START + KERNELBASE)
where:
 PHYSICAL_START is kernstart_addr - a variable updated at runtime.
 KERNELBASE	is the compile time Virtual base address of kernel.

This won't work for us, as kernstart_addr is dynamic and will yield different
results for __va()/__pa() for same mapping.

e.g.,

Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as
PAGE_OFFSET).

In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M

Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000
		= 0xbc100000 , which is wrong.

it should be : 0xc0000000 + 0x100000 = 0xc0100000

On platforms which support AMP, like PPC_47x (based on 44x), the kernel
could be loaded at highmem. Hence we cannot always depend on the compile
time constants for mapping.

Here are the possible solutions:

1) Update kernstart_addr(PHSYICAL_START) to match the Physical address of
compile time KERNELBASE value, instead of the actual Physical_Address(_stext).

The disadvantage is that we may break other users of PHYSICAL_START. They
could be replaced with __pa(_stext).

2) Redefine __va() & __pa() with relocation offset

#ifdef	CONFIG_RELOCATABLE_PPC32
#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + (KERNELBASE + RELOC_OFFSET)))
#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - (KERNELBASE + RELOC_OFFSET))
#endif

where, RELOC_OFFSET could be

  a) A variable, say relocation_offset (like kernstart_addr), updated
     at boot time. This impacts performance, as we have to load an additional
     variable from memory.

		OR

  b) #define RELOC_OFFSET ((PHYSICAL_START & PPC_PIN_SIZE_OFFSET_MASK) - \
                      (KERNELBASE & PPC_PIN_SIZE_OFFSET_MASK))

   This introduces more calculations for doing the translation.

3) Redefine __va() & __pa() with a new variable

i.e,

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))

where VIRT_PHYS_OFFSET :

#ifdef CONFIG_RELOCATABLE_PPC32
#define VIRT_PHYS_OFFSET virt_phys_offset
#else
#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
#endif /* CONFIG_RELOCATABLE_PPC32 */

where virt_phy_offset is updated at runtime to :

	Effective KERNELBASE - kernstart_addr.

Taking our example, above:

virt_phys_offset = effective_kernelstart_vaddr - kernstart_addr
		 = 0xc0400000 - 0x400000
		 = 0xc0000000
	and

	__va(0x100000) = 0xc0000000 + 0x100000 = 0xc0100000
	 which is what we want.

I have implemented (3) in the following patch which has same cost of
operation as the existing one.

I have tested the patches on 440x platforms only. However this should
work fine for PPC_47x also, as we only depend on the runtime address
and the current TLB XLAT entry for the startup code, which is available
in r25. I don't have access to a 47x board yet. So, it would be great if
somebody could test this on 47x.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:21:34 -05:00
Suzuki Poulose
0f890c8d20 powerpc: Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookE
The current implementation of CONFIG_RELOCATABLE in BookE is based
on mapping the page aligned kernel load address to KERNELBASE. This
approach however is not enough for platforms, where the TLB page size
is large (e.g, 256M on 44x). So we are renaming the RELOCATABLE used
currently in BookE to DYNAMIC_MEMSTART to reflect the actual method.

The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the
dynamic relocations will be introduced in the later in the patch series.

This change would allow the use of the old method of RELOCATABLE for
platforms which can afford to enforce the page alignment (platforms with
smaller TLB size).

Changes since v3:

* Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is
  either a RELOCATABLE or DYNAMIC_MEMSTART(Suggested by: Josh Boyer)

Suggested-by: Scott Wood <scottwood@freescale.com>
Tested-by: Scott Wood <scottwood@freescale.com>

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20 10:20:19 -05:00
David Rientjes
2011b1d0d3 powerpc/mm: Fix section mismatch for read_n_cells
read_n_cells() cannot be marked as .devinit.text since it is referenced
from two functions that are not in that section: of_get_lmb_size() and
hot_add_drconf_scn_to_nid().

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:18 +11:00
David Rientjes
28e86bdbc9 powerpc/mm: Fix section mismatch for mark_reserved_regions_for_nid
mark_reserved_regions_for_nid() is only called from do_init_bootmem(),
which is in .init.text, so it must be in the same section to avoid a
section mismatch warning.

Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:41:16 +11:00
Benjamin Herrenschmidt
1e7342e778 Merge remote-tracking branch 'jwb/next' into next
Conflicts:
	arch/powerpc/platforms/40x/ppc40x_simple.c
2011-12-16 11:24:25 +11:00
Christoph Egger
ca899859f1 powerpc/44x: Removing dead CONFIG_PPC47x
CONFIG_PPC47x doesn't exist in Kconfig and no 476 processor calls this
function ppc44x_pin_tlb() as it has it's own ppc47x_pin_tlb().

This code is probably an artifact of the original 476 code that
shouldn't have made it upstream.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:48:56 -05:00
Tejun Heo
1d7cfe18ec powerpc: Use HAVE_MEMBLOCK_NODE_MAP
powerpc doesn't access early_node_map[] directly and enabling
HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls
with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is
enough.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
2011-12-08 10:22:08 -08:00
Tejun Heo
1aadc0560f memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
The only function of memblock_analyze() is now allowing resize of
memblock region arrays.  Rename it to memblock_allow_resize() and
update its users.

* The following users remain the same other than renaming.

  arm/mm/init.c::arm_memblock_init()
  microblaze/kernel/prom.c::early_init_devtree()
  powerpc/kernel/prom.c::early_init_devtree()
  openrisc/kernel/prom.c::early_init_devtree()
  sh/mm/init.c::paging_init()
  sparc/mm/init_64.c::paging_init()
  unicore32/mm/init.c::uc32_memblock_init()

* In the following users, analyze was used to update total size which
  is no longer necessary.

  powerpc/kernel/machine_kexec.c::reserve_crashkernel()
  powerpc/kernel/prom.c::early_init_devtree()
  powerpc/mm/init_32.c::MMU_init()
  powerpc/mm/tlb_nohash.c::__early_init_mmu()  
  powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
  powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
  sh/kernel/machine_kexec.c::reserve_crashkernel()

* x86/kernel/e820.c::memblock_x86_fill() was directly setting
  memblock_can_resize before populating memblock and calling analyze
  afterwards.  Call memblock_allow_resize() before start populating.

memblock_can_resize is now static inside memblock.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:08 -08:00
Tejun Heo
6fbef13c4f powerpc: Cleanup memblock usage
* early_init_devtree(): Total memory size is aligned to PAGE_SIZE;
  however, alignment isn't enforced if memory_limit is explicitly
  specified.  Simplify the logic and always apply PAGE_SIZE alignment.

* MMU_init(): memblock regions is truncated by directly modifying
  memblock.memory.cnt.  This is incomplete (reserved array is not
  truncated) and unnecessarily low level hindering further memblock
  improvments.  Use memblock_enforce_memory_limit() instead.

* wii_memory_fixups(): Unnecessarily low level direct manipulation of
  memblock regions.  The same result can be achieved using properly
  abstracted operations.  Reimplement using memblock API.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
2011-12-08 10:22:07 -08:00
sukadev@linux.vnet.ibm.com
8a3e3d31d1 powerpc: Punch a hole in /dev/mem for librtas
With CONFIG_STRICT_DEVMEM=y, user space cannot read any part of /dev/mem.
Since this breaks librtas, punch a hole in /dev/mem to allow access to the
rmo_buffer that librtas needs.

Anton Blanchard reported the problem and helped with the fix.

A quick test for this patch:

       # cat /proc/rtas/rmo_buffer
       000000000f190000 10000

       # python -c "print 0x000000000f190000 / 0x10000"
       3865

       # dd if=/dev/mem of=/tmp/foo count=1 bs=64k skip=3865
       1+0 records in
       1+0 records out
       65536 bytes (66 kB) copied, 0.000205235 s, 319 MB/s

       # dd if=/dev/mem of=/tmp/foo
       dd: reading `/dev/mem': Operation not permitted
       0+0 records in
       0+0 records out
       0 bytes (0 B) copied, 0.00022519 s, 0.0 kB/s

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 14:22:52 +11:00
Becky Bruce
d93e4d7d72 powerpc/book3e: Change hugetlb preload to take vma argument
This avoids an extra find_vma() and is less error-prone.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:24 +11:00
Becky Bruce
a6146888be powerpc: Add gpages reservation code for 64-bit FSL BOOKE
For 64-bit FSL_BOOKE implementations, gigantic pages need to be
reserved at boot time by the memblock code based on the command line.
This adds the call that handles the reservation, and fixes some code
comments.

It also removes the previous pr_err when reserve_hugetlb_gpages
is called on a system without hugetlb enabled - the way the code is
structured, the call is unconditional and the resulting error message
spurious and confusing.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:23 +11:00
Becky Bruce
d1b9b12811 powerpc: Add hugepage support to 64-bit tablewalk code for FSL_BOOK3E
Before hugetlb, at each level of the table, we test for
!0 to determine if we have a valid table entry.  With hugetlb, this
compare becomes:
        < 0 is a normal entry
        0 is an invalid entry
        > 0 is huge

This works because the hugepage code pulls the top bit off the entry
(which for non-huge entries always has the top bit set) as an
indicator that we have a hugepage.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:22 +11:00
Becky Bruce
27609a42ee powerpc: Whitespace/comment changes to tlb_low_64e.S
I happened to comment this code while I was digging through it;
we might as well commit that.  I also made some whitespace
changes - the existing code had a lot of unnecessary newlines
that I found annoying when I was working on my tiny laptop.

No functional changes.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:22 +11:00
Becky Bruce
881fde1db5 powerpc: hugetlb: modify include usage for FSL BookE code
The original 32-bit hugetlb implementation used PPC64 vs PPC32 to
determine which code path to take.  However, the final hugetlb
implementation for 64-bit FSL ended up shared with the FSL
32-bit code so the actual check needs to be FSL_BOOK3E vs
everything else.  This patch changes the include protections to
reflect this.

There are also a couple of related comment fixes.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:22 +11:00
Becky Bruce
a1cd541988 powerpc: Update hugetlb huge_pte_alloc and tablewalk code for FSL BOOKE
This updates the hugetlb page table code to handle 64-bit FSL_BOOKE.
The previous 32-bit work counted on the inner levels of the page table
collapsing.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:22 +11:00
Becky Bruce
8c1674de2b powerpc: Fix booke hugetlb preload code for PPC_MM_SLICES and 64-bit
This patch does 2 things: It corrects the code that determines the
size to write into MAS1 for the PPC_MM_SLICES case (this originally
came from David Gibson and I had incorrectly altered it), and it
changes the methodolody used to calculate the size for !PPC_MM_SLICES
to work for 64-bit as well as 32-bit.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:21 +11:00
Becky Bruce
7651295944 powerpc: Only define HAVE_ARCH_HUGETLB_UNMAPPED_AREA if PPC_MM_SLICES
If we don't have slices, we should be able to use the generic
hugetlb_get_unmapped_area() code

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 16:26:21 +11:00
Justin P. Mattock
42b2aa86c6 treewide: Fix typos in various parts of the kernel, and fix some comments.
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
Please let me know if I missed anything, and I will try to get it changed and resent.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-02 14:57:31 +01:00
Tejun Heo
d4bbf7e775 Merge branch 'master' into x86/memblock
Conflicts & resolutions:

* arch/x86/xen/setup.c

	dc91c728fd "xen: allow extra memory to be in multiple regions"
	24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

	conflicted on xen_add_extra_mem() updates.  The resolution is
	trivial as the latter just want to replace
	memblock_x86_reserve_range() with memblock_reserve().

* drivers/pci/intel-iommu.c

	166e9278a3 "x86/ia64: intel-iommu: move to drivers/iommu/"
	5dfe8660a3 "bootmem: Replace work_with_active_regions() with..."

	conflicted as the former moved the file under drivers/iommu/.
	Resolved by applying the chnages from the latter on the moved
	file.

* mm/Kconfig

	6661672053 "memblock: add NO_BOOTMEM config symbol"
	c378ddd53f "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

	conflicted trivially.  Both added config options.  Just
	letting both add their own options resolves the conflict.

* mm/memblock.c

	d1f0ece6cd "mm/memblock.c: small function definition fixes"
	ed7b56a799 "memblock: Remove memblock_memory_can_coalesce()"

	confliected.  The former updates function removed by the
	latter.  Resolution is trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-28 09:46:22 -08:00
Dan McGee
fa8cbaaf5a powerpc+sparc64/mm: Remove hack in mmap randomize layout
Since commit 8a0a9bd4db, this comment in mmap_rnd() does not
hold true as the value returned by get_random_int() will in fact be

different every single call. Remove the comment and simplify the code
back to its original desired form.

This reverts commit a5adc91a4b which is no longer necessary and
also fixes the sparc code that copied this same adjustment.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-28 11:42:09 +11:00
sukadev@linux.vnet.ibm.com
1d54cf2b97 powerpc: Implement CONFIG_STRICT_DEVMEM
As described in the help text in the patch, this token restricts general
access to /dev/mem as a way of increasing the security. Specifically, access
to exclusive IOMEM and kernel RAM is denied unless CONFIG_STRICT_DEVMEM is
set to 'n'.

Implement the 'devmem_is_allowed()' interface for Powerpc. It will be
called from range_is_allowed() when userpsace attempts to access /dev/mem.

This patch is based on an earlier patch from Steve Best and with input from
Paul Mackerras and Scott Wood.

[BenH] Fixed a typo or two and removed the generic change which should
       be submitted as a separate patch

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-28 11:42:08 +11:00
Jimi Xenidis
c3dcf53a3f powerpc/icswx: Simple ACOP fault handler
This patch adds a fault handler that responds to illegal Coprocessor
types.  Currently all CTs are treated and illegal.  There are two ways
to report the fault back to the application.  If the application used
the record form ("icswx.") then the architected "reject" is emulated.
If the application did not used the record form ("icswx") then it is
selectable by config whether the failure is silent (as architected) or
a SIGILL is generated.

In all cases pr_warn() is used to log the bad CT.

Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-25 14:11:28 +11:00
Jimi Xenidis
9d67028090 powerpc: Split ICSWX ACOP and PID processing
Some processors, like embedded, that already have a PID register that
is managed by the system.  This patch separates the ACOP and PID
processing into separate files so that the ACOP code can be shared.

Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-25 14:11:27 +11:00
Kumar Gala
13020be8be powerpc: Fix compiliation with hugetlbfs enabled
arch/powerpc/mm/hugetlbpage.c: In function 'reserve_hugetlb_gpages':
arch/powerpc/mm/hugetlbpage.c:312:2: error: implicit declaration of function 'parse_args'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-25 10:05:59 +11:00
Dipankar Sarma
1c8ee73395 powerpc/numa: NUMA topology support for PowerNV
This patch adds support for numa topology on powernv platforms running
OPAL formware. It checks for the type of platform at run time and
sets the affinity form correctly so that NUMA topology can be discovered
correctly.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-08 14:51:46 +11:00
Anton Blanchard
c40dd2f766 powerpc: Add System RAM to /proc/iomem
We've resisted adding System RAM to /proc/iomem because it is
the wrong place for it. Unfortunately we continue to find tools
that rely on this behaviour so give up and add it in.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-08 14:51:46 +11:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds
1197ab2942 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
  powerpc/p3060qds: Add support for P3060QDS board
  powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
  powerpc/85xx: Make kexec to interate over online cpus
  powerpc/fsl_booke: Fix comment in head_fsl_booke.S
  powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
  powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
  powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
  powerpc/86xx: Correct Gianfar support for GE boards
  powerpc/cpm: Clear muram before it is in use.
  drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
  powerpc/fsl_msi: add support for "msi-address-64" property
  powerpc/85xx: Setup secondary cores PIR with hard SMP id
  powerpc/fsl-booke: Fix settlbcam for 64-bit
  powerpc/85xx: Adding DCSR node to dtsi device trees
  powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
  powerpc/85xx: fix PHYS_64BIT selection for P1022DS
  powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
  powerpc: respect mem= setting for early memory limit setup
  powerpc: Update corenet64_smp_defconfig
  powerpc: Update mpc85xx/corenet 32-bit defconfigs
  ...

Fix up trivial conflicts in:
 - arch/powerpc/configs/40x/hcu4_defconfig
	removed stale file, edited elsewhere
 - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
	added opal and gelic drivers vs added ePAPR driver
 - drivers/tty/serial/8250.c
	moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
2011-11-06 17:12:03 -08:00
Andrea Arcangeli
b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
cf592bf768 powerpc: gup_huge_pmd() return 0 if pte changes
powerpc didn't return 0 in that case, if it's rolling back the *nr pointer
it should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
3526741f09 powerpc: gup_hugepte() support THP based tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
8596468487 powerpc: gup_hugepte() avoid freeing the head page too many times
We only taken "refs" pins on the head page not "*nr" pins.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
405e44f2e3 powerpc: get_hugepte() don't put_page() the wrong page
"page" may have changed to point to the next hugepage after the loop
completed, The references have been taken on the head page, so the
put_page must happen there too.

This is a longstanding issue pre-thp inclusion.

It's totally unclear how these page_cache_add_speculative and
pte_val(pte) != pte_val(*ptep) checks are necessary across all the
powerpc gup_fast code, when x86 doesn't need any of that: there's no way
the page can be freed with irq disabled so we're guaranteed the
atomic_inc will happen on a page with page_count > 0 (so not needing the
speculative check).

The pte check is also meaningless on x86: no need to rollback on x86 if
the pte changed, because the pte can still change a CPU tick after the
check succeeded and it won't be rolled back in that case.  The important
thing is we got a reference on a valid page that was mapped there a CPU
tick ago.  So not knowing the soft tlb refill code of ppc64 in great
detail I'm not removing the "speculative" page_count increase and the
pte checks across all the code, but unless there's a strong reason for
it they should be later cleaned up too.

If a pte can change from huge to non-huge (like it could happen with
THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat
the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs
only so I'm not altering that.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
2839bdc1bf powerpc: remove superfluous PageTail checks on the pte gup_fast
This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
those should be handled by gup_hugepd only, so these checks are
superfluous.

Plus if this wasn't a noop, it would have oopsed because, the insistence
of using the speculative refcounting would trigger a VM_BUG_ON if a tail
page was encountered in the page_cache_get_speculative().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
70b50f94f1 mm: thp: tail page refcounting fix
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Paul Gortmaker
4b16f8e2d6 powerpc: various straight conversions from module.h --> export.h
All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure.  We can shift them off to the
export.h header which is a way smaller footprint and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:44 -04:00
Paul Gortmaker
9308794884 powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE
Fix failures in powerpc associated with the previously allowed
implicit module.h presence that now lead to things like this:

arch/powerpc/mm/mmu_context_hash32.c:76:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/mm/tlb_hash32.c:48:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/kernel/pci_32.c:51:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/kernel/iomap.c:36:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/platforms/44x/canyonlands.c:126:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/kvm/44x.c:168:59: error: 'THIS_MODULE' undeclared (first use in this function)

[with several contibutions from Stephen Rothwell <sfr@canb.auug.org.au>]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:38 -04:00
Paul Gortmaker
66b15db69c powerpc: add export.h to files making use of EXPORT_SYMBOL
With module.h being implicitly everywhere via device.h, the absence
of explicitly including something for EXPORT_SYMBOL went unnoticed.
Since we are heading to fix things up and clean module.h from the
device.h file, we need to explicitly include these files now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:37 -04:00
Becky Bruce
4559424a0c powerpc/fsl-booke: Fix settlbcam for 64-bit
Currently, it does a cntlzd on the size and then subtracts it from
21.... this doesn't take into account the varying size of a "long".
Just use __ilog instead (and subtract the 10 we have to subtract
to get to the tsize encoding).

Also correct the comment about page sizes supported.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-10-12 23:39:10 -05:00
Kumar Gala
1dc91c3eb3 powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
On FSL Book-E devices we support multiple large TLB sizes and so we can
get into situations in which the initial 1G TLB size is too big and
we're asked for a size that is not mappable by a single entry (like
512M).  The single entry is important because when we bring up secondary
cores they need to ensure any data structure they need to access (eg
PACA or stack) is always mapped.

So we really need to determine what size will actually be mapped by the
first TLB entry to ensure we limit early memory references to that
region.  We refactor the map_mem_in_cams() code to provider a helper
function that we can utilize to determine the size of the first TLB
entry while taking into account size and alignment constraints.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-10-11 23:30:41 -05:00
Paul Mackerras
25c29f9e32 powerpc: Fix hugetlb with CONFIG_PPC_MM_SLICES=y
Commit 41151e77a4 ("powerpc: Hugetlb for BookE") added some
#ifdef CONFIG_MM_SLICES conditionals to hugetlb_get_unmapped_area()
and vma_mmu_pagesize().  Unfortunately this is not the correct config
symbol; it should be CONFIG_PPC_MM_SLICES.  The result is that
attempting to use hugetlbfs on 64-bit Power server processors results
in an infinite stack recursion between get_unmapped_area() and
hugetlb_get_unmapped_area().

This fixes it by changing the #ifdef to use CONFIG_PPC_MM_SLICES
in those functions and also in book3e_hugetlb_preload().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-23 10:21:33 +10:00
Anton Blanchard
8bdafa39a4 powerpc: Fix deadlock in icswx code
The icswx code introduced an A-B B-A deadlock:

     CPU0                    CPU1
     ----                    ----
lock(&anon_vma->mutex);
                             lock(&mm->mmap_sem);
                             lock(&anon_vma->mutex);
lock(&mm->mmap_sem);

Instead of using the mmap_sem to keep mm_users constant, take the
page table spinlock.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 15:53:23 +10:00
Anton Blanchard
a11940978b powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe
If we echo an address the hypervisor doesn't like to
/sys/devices/system/memory/probe we oops the box:

# echo 0x10000000000 > /sys/devices/system/memory/probe

kernel BUG at arch/powerpc/mm/hash_utils_64.c:541!

The backtrace is:

create_section_mapping
arch_add_memory
add_memory
memory_probe_store
sysdev_class_store
sysfs_write_file
vfs_write
SyS_write

In create_section_mapping we BUG if htab_bolt_mapping returned
an error. A better approach is to return an error which will
propagate back to userspace.

Rerunning the test with this patch applied:

# echo 0x10000000000 > /sys/devices/system/memory/probe
-bash: echo: write error: Invalid argument

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 15:53:23 +10:00
Anton Blanchard
dfbe93a222 powerpc: Coding style cleanups
While converting code to use for_each_node_by_type I noticed a
number of coding style issues.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 15:53:23 +10:00
Anton Blanchard
94db7c5e14 powerpc: Use for_each_node_by_type instead of open coding it
Use for_each_node_by_type instead of open coding it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 15:53:23 +10:00
Anton Blanchard
6083184269 powerpc/numa: Remove double of_node_put in hot_add_node_scn_to_nid
During memory hotplug testing, I got the following warning:

ERROR: Bad of_node_put() on /memory@0

of_node_release
kref_put
of_node_put
of_find_node_by_type
hot_add_node_scn_to_nid
hot_add_scn_to_nid
memory_add_physaddr_to_nid
...

of_find_node_by_type() loop does the of_node_put for us so we only
need the handle the case where we terminate the loop early.

As suggested by Stephen Rothwell we can do the of_node_put
unconditionally outside of the loop since of_node_put handles a
NULL argument fine.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 15:53:22 +10:00
Becky Bruce
41151e77a4 powerpc: Hugetlb for BookE
Enable hugepages on Freescale BookE processors.  This allows the kernel to
use huge TLB entries to map pages, which can greatly reduce the number of
TLB misses and the amount of TLB thrashing experienced by applications with
large memory footprints.  Care should be taken when using this on FSL
processors, as the number of large TLB entries supported by the core is low
(16-64) on current processors.

The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g.
Page sizes larger than the max zone size are called "gigantic" pages and
must be allocated on the command line (and cannot be deallocated).

This is currently only fully implemented for Freescale 32-bit BookE
processors, but there is some infrastructure in the code for
64-bit BooKE.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 09:19:40 +10:00
Linus Torvalds
184475029a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits)
  drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c
  powerpc/85xx: fix mpic configuration in CAMP mode
  powerpc: Copy back TIF flags on return from softirq stack
  powerpc/64: Make server perfmon only built on ppc64 server devices
  powerpc/pseries: Fix hvc_vio.c build due to recent changes
  powerpc: Exporting boot_cpuid_phys
  powerpc: Add CFAR to oops output
  hvc_console: Add kdb support
  powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon
  powerpc/irq: Quieten irq mapping printks
  powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs
  powerpc: Add mpt2sas driver to pseries and ppc64 defconfig
  powerpc: Disable IRQs off tracer in ppc64 defconfig
  powerpc: Sync pseries and ppc64 defconfigs
  powerpc/pseries/hvconsole: Fix dropped console output
  hvc_console: Improve tty/console put_chars handling
  powerpc/kdump: Fix timeout in crash_kexec_wait_realmode
  powerpc/mm: Fix output of total_ram.
  powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards
  powerpc: Correct annotations of pmu registration functions
  ...

Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and
drivers/cpufreq
2011-07-25 22:59:39 -07:00
Linus Torvalds
5fabc487c9 Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  KVM: IOMMU: Disable device assignment without interrupt remapping
  KVM: MMU: trace mmio page fault
  KVM: MMU: mmio page fault support
  KVM: MMU: reorganize struct kvm_shadow_walk_iterator
  KVM: MMU: lockless walking shadow page table
  KVM: MMU: do not need atomicly to set/clear spte
  KVM: MMU: introduce the rules to modify shadow page table
  KVM: MMU: abstract some functions to handle fault pfn
  KVM: MMU: filter out the mmio pfn from the fault pfn
  KVM: MMU: remove bypass_guest_pf
  KVM: MMU: split kvm_mmu_free_page
  KVM: MMU: count used shadow pages on prepareing path
  KVM: MMU: rename 'pt_write' to 'emulate'
  KVM: MMU: cleanup for FNAME(fetch)
  KVM: MMU: optimize to handle dirty bit
  KVM: MMU: cache mmio info on page fault path
  KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code
  KVM: MMU: do not update slot bitmap if spte is nonpresent
  KVM: MMU: fix walking shadow page table
  KVM guest: KVM Steal time registration
  ...
2011-07-24 09:07:03 -07:00
Linus Torvalds
4d4abdcb1d Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
  perf: Remove the nmi parameter from the oprofile_perf backend
  x86, perf: Make copy_from_user_nmi() a library function
  perf: Remove perf_event_attr::type check
  x86, perf: P4 PMU - Fix typos in comments and style cleanup
  perf tools: Make test use the preset debugfs path
  perf tools: Add automated tests for events parsing
  perf tools: De-opt the parse_events function
  perf script: Fix display of IP address for non-callchain path
  perf tools: Fix endian conversion reading event attr from file header
  perf tools: Add missing 'node' alias to the hw_cache[] array
  perf probe: Support adding probes on offline kernel modules
  perf probe: Add probed module in front of function
  perf probe: Introduce debuginfo to encapsulate dwarf information
  perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
  perf probe: Remove redundant dwarf functions
  perf probe: Move strtailcmp to string.c
  perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
  tracing/kprobe: Update symbol reference when loading module
  tracing/kprobes: Support module init function probing
  kprobes: Return -ENOENT if probe point doesn't exist
  ...
2011-07-22 16:44:39 -07:00
Benjamin Herrenschmidt
4b575f3e8a Merge remote-tracking branch 'jwb/next' into next 2011-07-22 13:16:41 +10:00
Tony Breeds
f7ba2991e9 powerpc/mm: Fix output of total_ram.
On 32bit platforms that support >= 4GB memory total_ram was truncated.
This creates a confusing printk:
	Top of RAM: 0x100000000, Total RAM: 0x0
Fix that:
	Top of RAM: 0x100000000, Total RAM: 0x100000000

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:13:04 +10:00
Tejun Heo
5dfe8660a3 bootmem: Replace work_with_active_regions() with for_each_mem_pfn_range()
Callback based iteration is cumbersome and much less useful than
for_each_*() iterator.  This patch implements for_each_mem_pfn_range()
which replaces work_with_active_regions().  All the current users of
work_with_active_regions() are converted.

This simplifies walking over early_node_map and will allow converting
internal logics in page_alloc to use iterator instead of walking
early_node_map directly, which in turn will enable moving node
information to memblock.

powerpc change is only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714074610.GD3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:29 -07:00
Dave Kleikamp
9661534d6a powerpc/47x: allow kernel to be loaded in higher physical memory
The 44x code (which is shared by 47x) assumes the available physical memory
begins at 0x00000000.  This is not necessarily the case in an AMP
environment.

Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be
loaded into a higher memory range.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2011-07-12 10:34:24 -04:00
Dave Kleikamp
91b191c71e powerpc/44x: don't use tlbivax on AMP systems
Since other OS's may be running on the other cores don't use tlbivax

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2011-07-12 09:21:55 -04:00
Paul Mackerras
9e368f2915 KVM: PPC: book3s_hv: Add support for PPC970-family processors
This adds support for running KVM guests in supervisor mode on those
PPC970 processors that have a usable hypervisor mode.  Unfortunately,
Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
1), but the YDL PowerStation does have a usable hypervisor mode.

There are several differences between the PPC970 and POWER7 in how
guests are managed.  These differences are accommodated using the
CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
bits.  Notably, on PPC970:

* The LPCR, LPID or RMOR registers don't exist, and the functions of
  those registers are provided by bits in HID4 and one bit in HID0.

* External interrupts can be directed to the hypervisor, but unlike
  POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
  SRR0/1 not HSRR0/1.

* There is no virtual RMA (VRMA) mode; the guest must use an RMO
  (real mode offset) area.

* The TLB entries are not tagged with the LPID, so it is necessary to
  flush the whole TLB on partition switch.  Furthermore, when switching
  partitions we have to ensure that no other CPU is executing the tlbie
  or tlbsync instructions in either the old or the new partition,
  otherwise undefined behaviour can occur.

* The PMU has 8 counters (PMC registers) rather than 6.

* The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.

* The SLB has 64 entries rather than 32.

* There is no mediated external interrupt facility, so if we switch to
  a guest that has a virtual external interrupt pending but the guest
  has MSR[EE] = 0, we have to arrange to have an interrupt pending for
  it so that we can get control back once it re-enables interrupts.  We
  do that by sending ourselves an IPI with smp_send_reschedule after
  hard-disabling interrupts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:59 +03:00
Paul Mackerras
969391c58a powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits
This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
indicate that we have a usable hypervisor mode, and another to indicate
that the processor conforms to PowerISA version 2.06.  We also add
another bit to indicate that the processor conforms to ISA version 2.01
and set that for PPC970 and derivatives.

Some PPC970 chips (specifically those in Apple machines) have a
hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
is not useful in the sense that there is no way to run any code in
supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
bits in HID4 are always 0, and we use that as a way of detecting that
hypervisor mode is not useful.

Where we have a feature section in assembly code around code that
only applies on POWER7 in hypervisor mode, we use a construct like

END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)

The definition of END_FTR_SECTION_IFSET is such that the code will
be enabled (not overwritten with nops) only if all bits in the
provided mask are set.

Note that the CPU feature check in __tlbie() only needs to check the
ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
if we are running bare-metal, i.e. in hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:58 +03:00
Becky Bruce
3160b09796 powerpc: Create next_tlbcam_idx percpu variable for FSL_BOOKE
This is used to round-robin TLBCAM entries.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-07-08 00:21:34 -05:00
Peter Zijlstra
a8b0ca17b8 perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

  - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
  - tracepoint: nmi=0; since tracepoint could be from NMI context.
  - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:35 +02:00
Dave Carroll
a9c0f41b3a powerpc: Add printk companion for ppc_md.progress
This patch adds a printk companion to replace the udbg progress function
when initmem is freed.

Suggested-by: Milton Miller <miltonm@bga.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-30 15:28:05 +10:00
Dave Carroll
2773fcc8c4 powerpc: Move free_initmem to common code
The free_initmem function is basically duplicated in mm/init_32,
and init_64, and is moved to the common 32/64-bit mm/mem.c.

All other sections except init were removed in v2.6.15 by
6c45ab992e (powerpc: Remove section
free() and linker script bits), and therefore the bulk of the executed
code is identical.

This patch also removes updating ppc_md.progress to NULL in the powermac
late_initcall.

Suggested-by: Milton Miller <miltonm@bga.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-30 15:28:05 +10:00
Benjamin Herrenschmidt
6da49a2925 Merge remote branch 'origin/master' into next 2011-06-30 15:23:59 +10:00
Becky Bruce
3d41e0f6d9 powerpc: mem_init should call memblock_is_reserved with phys_addr_t
This has been broken for a while but hasn't been an issue until
now because nobody was reserving regions at high addresses.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29 17:48:18 +10:00
Scott Wood
f67f4ef5fc powerpc/book3e-64: use a separate TLB handler when linear map is bolted
On MMUs such as FSL where we can guarantee the entire linear mapping is
bolted, we don't need to worry about linear TLB misses.  If on top of
that we do a full table walk, we get rid of all recursive TLB faults, and
can dispense with some state saving.  This gains a few percent on
TLB-miss-heavy workloads, and around 50% on a benchmark that had a high
rate of virtual page table faults under the normal handler.

While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and
EX_TLB_SRR1 as they're not used.

[BenH: Fixed build with 64K pages (wsp config)]

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29 17:47:48 +10:00
Christian Dietrich
76462232c2 arch/powerpc: use printk_ratelimited instead of printk_ratelimit
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29 15:31:01 +10:00
Kumar Gala
32d206eb56 powerpc/book3e: Clarify HW table walk enable/disable message
Before if we didn't support or enable HW table walk we'd get a messaage
like:

MMU: Book3E Page Tables Disabled

Which is a bit misleading.  Now it will say:

MMU: Book3E HW tablewalk not supported

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-17 16:19:51 +10:00
Benjamin Herrenschmidt
307cfe7153 powerpc: Force page alignment for initrd reserved memory
When using 64K pages with a separate cpio rootfs, U-Boot will align
the rootfs on a 4K page boundary. When the memory is reserved, and
subsequent early memblock_alloc is called, it will allocate memory
between the 64K page alignment and reserved memory. When the reserved
memory is subsequently freed, it is done so by pages, causing the
early memblock_alloc requests to be re-used, which in my case, caused
the device-tree to be clobbered.

This patch forces the reserved memory for initrd to be kernel page
aligned, and will move the device tree if it overlaps with the range
extension of initrd. This patch will also consolidate the identical
function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c,
and adds the same range extension when freeing initrd. free_initrd_mem()
is also moved to the __init section.

Many thanks to Milton Miller for his input on this patch.

[BenH: Fixed build without CONFIG_BLK_DEV_INITRD]

Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-09 16:52:38 +10:00
Peter Zijlstra
2672391169 mm, powerpc: move the RCU page-table freeing into generic code
In case other architectures require RCU freed page-tables to implement
gup_fast() and software filled hashes and similar things, provide the
means to do so by moving the logic into generic code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Requested-by: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:16 -07:00
Peter Zijlstra
d6bf29b44d powerpc: mmu_gather rework
Fix up powerpc to the new mmu_gather stuff.

PPC has an extra batching queue to RCU free the actual pagetable
allocations, use the ARCH extentions for that for now.

For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the
hardware hash-table, keep using per-cpu arrays but flush on context switch
and use a TLF bit to track the lazy_mmu state.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:13 -07:00
Anton Blanchard
40f1ce7fb7 powerpc: Remove ioremap_flags
We have a confusing number of ioremap functions. Make things just a
bit simpler by merging ioremap_flags and ioremap_prot.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19 14:30:43 +10:00
Anton Blanchard
be135f4089 powerpc: Add ioremap_wc
Add ioremap_wc so drivers can request write combining on kernel
mappings.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19 14:30:42 +10:00
Stephen Rothwell
79af2187fa powerpc: Fix compile with icwsx support
Due to a collision between NO_CONTEXT->MMU_NO_CONTEXT change and
Anton's patch.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:18:34 +10:00
KOSAKI Motohiro
104699c0ab powerpc: Convert old cpumask API into new one
Adapt new API.

Almost change is trivial. Most important change is the below line
because we plan to change task->cpus_allowed implementation.

-       ctx->cpus_allowed = current->cpus_allowed;

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:22:59 +10:00
Tseng-Hui (Frank) Lin
851d2e2fe8 powerpc: Add Initiate Coprocessor Store Word (icswx) support
Icswx is a PowerPC instruction to send data to a co-processor. On Book-S
processors the LPAR_ID and process ID (PID) of the owning process are
registered in the window context of the co-processor at initialization
time. When the icswx instruction is executed the L2 generates a cop-reg
transaction on PowerBus. The transaction has no address and the
processor does not perform an MMU access to authenticate the transaction.
The co-processor compares the LPAR_ID and the PID included in the
transaction and the LPAR_ID and PID held in the window context to
determine if the process is authorized to generate the transaction.

The OS needs to assign a 16-bit PID for the process. This cop-PID needs
to be updated during context switch. The cop-PID needs to be destroyed
when the context is destroyed.

Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>
Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:19:26 +10:00
Michael Neuling
a32e252f7c powerpc: Use new CPU feature bit to select 2.06 tlbie
This removes MMU_FTR_TLBIE_206 as we can now use CPU_FTR_HVMODE_206.  It
also changes the logic to select which tlbie to use to be based on this
new CPU feature bit.

This also duplicates the ASM_FTR_IF/SET/CLR defines for CPU features
(copied from MMU features).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:19:26 +10:00
Matt Evans
44ae3ab335 powerpc: Free up some CPU feature bits by moving out MMU-related features
Some of the 64bit PPC CPU features are MMU-related, so this patch moves
them to MMU_FTR_ bits.  All cpu_has_feature()-style tests are moved to
mmu_has_feature(), and seven feature bits are freed as a result.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27 14:18:52 +10:00
Michael Ellerman
e70606eb9b powerpc/numa: Look for ibm, associativity-reference-points at the root
If we don't find ibm,associativity-reference-points as a child of
/rtas, look for it at the root of the tree instead. We use this on
Book3E where we have no RTAS but still use the sPAPR conventions
for NUMA.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27 14:18:35 +10:00
Benjamin Herrenschmidt
bd49178109 powerpc: Add TLB size detection for TYPE_3E MMUs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27 13:02:10 +10:00
Anton Blanchard
b68a70c496 powerpc: Replace open coded instruction patching with patch_instruction/patch_branch
There are a few places we patch instructions without using
patch_instruction and patch_branch, probably because they
predated it. Fix it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20 17:01:18 +10:00
Michael Ellerman
f5be2dc0bd powerpc/nohash: Allocate stale_map[cpu] on CPU_UP_PREPARE not CPU_ONLINE
Currently we allocate the stale_map for a cpu when it comes online,
this leaves open a small window where a process can be scheduled
on the cpu before the stale_map is allocated. Instead allocate
the stale_map at CPU_UP_PREPARE time, that way it will be always
available before tasks start running.

It is possible the cpu fails to come up, in which case we should free
the stale_map, so add a CPU_UP_CANCELED case to do that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20 17:01:18 +10:00
Michael Ellerman
5e8e7b404a powerpc/mm: Standardise on MMU_NO_CONTEXT
Use MMU_NO_CONTEXT as the initialiser for mm_context.id on
nohash and hash64.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20 16:59:20 +10:00
Linus Torvalds
42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
Sylvestre Ledru
f65e51d740 Documentation: fix minor typos/spelling
Fix some minor typos:
 * informations => information
 * there own => their own
 * these => this

Signed-off-by: Sylvestre Ledru <sylvestre.ledru@scilab.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 17:51:47 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Benjamin Herrenschmidt
6090912c4a powerpc: Implement dma_mmap_coherent()
This is used by Alsa to mmap buffers allocated with dma_alloc_coherent()
into userspace. We need a special variant to handle machines with
non-coherent DMAs as those buffers have "special" virt addresses and
require non-cachable mappings

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-30 10:44:00 +11:00
Linus Torvalds
0a95d92c00 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (62 commits)
  powerpc/85xx: Fix signedness bug in cache-sram
  powerpc/fsl: 85xx: document cache sram bindings
  powerpc/fsl: define binding for fsl mpic interrupt controllers
  powerpc/fsl_msi: Handle msi-available-ranges better
  drivers/serial/ucc_uart.c: Add of_node_put to avoid memory leak
  powerpc/85xx: Fix SPE float to integer conversion failure
  powerpc/85xx: Update sata controller compatible for p1022ds board
  ATA: Add FSL sata v2 controller support
  powerpc/mpc8xxx_gpio: simplify searching for 'fsl, qoriq-gpio' compatiable
  powerpc/8xx: remove obsolete mgsuvd board
  powerpc/82xx: rename and update mgcoge board support
  powerpc/83xx: rename and update kmeter1
  powerpc/85xx: Workaroudn e500 CPU erratum A005
  powerpc/fsl_pci: Add support for FSL PCIe controllers v2.x
  powerpc/85xx: Fix writing to spin table 'cpu-release-addr' on ppc64e
  powerpc/pseries: Disable MSI using new interface if possible
  powerpc: Enable GENERIC_HARDIRQS_NO_DEPRECATED.
  powerpc: core irq_data conversion.
  powerpc: sysdev/xilinx_intc irq_data conversion.
  powerpc: sysdev/uic irq_data conversion.
  ...

Fix up conflicts in arch/powerpc/sysdev/fsl_msi.c (due to getting rid of
of_platform_driver in arch/powerpc)
2011-03-18 06:31:43 -07:00
Benjamin Herrenschmidt
831532035b Merge remote branch 'jwb/next' into next 2011-03-17 17:59:01 +11:00
Benjamin Herrenschmidt
36e8695ca5 powerpc/pseries: Disable VPNH feature
This feature triggers nasty races in the scheduler between the
rebuilding of the topology and the load balancing code, causing
the machine to hang.

Disable it for now until the races are fixed.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-10 10:06:41 +11:00
Scott Wood
6dd2270029 powerpc: Fix memory limits when starting at a non-zero address
memblock_enforce_memory_limit() takes the desired maximum quantity of memory
to end up with, not an address above which memory will not be used.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-02 16:56:15 +11:00
Peter Zijlstra
f342552b91 powerpc/mm: Make hpte_need_flush() safe for preemption
hpte_need_flush() might be called outside of a preempt section
when manipulating the kernel page tables, so we need to use the
appopriate variants of per-cpu variable accesses. There should
be no risk of being in the middle of a batch and a context
switch will flush any pending batch.

[Patch extracted from a larger patch in Peter's preemptible
 mmu_gather series]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-02 14:56:48 +11:00
Anton Blanchard
429f4d8d20 powerpc/numa: Fix bug in unmap_cpu_from_node
When converting to the new cpumask code I screwed up:

-       if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) {
-               cpu_clear(cpu, numa_cpumask_lookup_table[node]);
+       if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
+               cpumask_set_cpu(cpu, node_to_cpumask_map[node]);

This was introduced in commit 25863de07a (powerpc/cpumask: Convert NUMA code
to new cpumask API)

Fix it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 13:06:06 +11:00
Anton Blanchard
fe5cfd6355 powerpc/numa: Disable VPHN on dedicated processor partitions
There is no need to start up the timer and monitor topology changes on a
dedicated processor partition, so disable it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 13:06:04 +11:00
Anton Blanchard
c0e5e46f39 powerpc/numa: Add length when creating OF properties via VPHN
The rest of the NUMA code expects an OF associativity property with
the first cell containing the length. Without this fix all topology changes
cause us to misparse the property and put the cpu into node 0.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 13:06:03 +11:00
Anton Blanchard
d69043e806 powerpc/numa: Check for all VPHN changes
The hypervisor uses unsigned 1 byte counters to signal topology changes to
the OS. Since they can wrap we need to check for any difference, not just if
the hypervisor count is greater than the previous count.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 13:06:01 +11:00
Anton Blanchard
5de1669910 powerpc/numa: Only use active VPHN count fields
VPHN supports up to 8 distance fields but the number of entries in
ibm,associativity-reference-points signifies how many are in use.
Don't look at all the VPHN counts, only distance_ref_points_depth
worth.

Since we already cap our distance metrics at MAX_DISTANCE_REF_POINTS,
use that to size the VPHN arrays and add a BUILD_BUG_ON to avoid it growing
larger than the VPHN maximum of 8.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 13:05:59 +11:00
Jesse Larrew
cd9d6cc726 powerpc/pseries: Remove unnecessary variable initializations in numa.c
Remove unnecessary variable initializations in VPHN functions.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 13:05:36 +11:00
Jesse Larrew
7639adaafb powerpc/pseries: Fix brace placement in numa.c
Fix brace placement in VPHN code.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 12:58:23 +11:00
Jesse Larrew
bd03403ad5 powerpc/pseries: Fix typo in VPHN comments
Correct a spelling error in VPHN comments in numa.c.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07 12:58:21 +11:00
Dave Kleikamp
21a06b0459 powerpc/476: Workaround for PLB6 hang
The 476FP core may hang if an instruction fetch happens during an msync
following a tlbsync.  This workaround makes sure that enough instruction
cache lines are pre-fetched before executing the msync.  (sync and msync
are the same to the compiler.)

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2011-02-02 06:59:02 -05:00
Andrea Arcangeli
9180706344 thp: alter compound get_page/put_page
Alter compound get_page/put_page to keep references on subpages too, in
order to allow __split_huge_page_refcount to split an hugepage even while
subpages have been pinned by one of the get_user_pages() variants.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:39 -08:00
Benjamin Herrenschmidt
5d7d8072ed powerpc/pseries: Fix build of topology stuff without CONFIG_NUMA
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-12 10:56:29 +11:00
Jesse Larrew
39bf990ead powerpc/pseries: Fix VPHN build errors on non-SMP systems
The header asm/hvcall.h was previously included indirectly via
smp.h. On non-SMP systems, however, these declarations are excluded
and the build breaks. This is easily fixed by including asm/hvcall.h
directly.

The VPHN feature is only meaningful on NUMA systems that implement
the SPLPAR option, so exclude the VPHN code on systems without
SPLPAR enabled.

Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled,
even if CONFIG_HOTPLUG_CPU is disabled.

Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the
node masks after boot time, so remove the __cpuinit annotation to
fix a section mismatch.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
2011-01-11 16:06:16 +11:00
Jesper Juhl
ae9fd31a36 powerpc: Remove unnecessary casts of void ptr
Hi,

The [vk][cmz]alloc(_node) family of functions return void pointers which
it's completely unnecessary/pointless to cast to other pointer types since
that happens implicitly.

This patch removes such casts from arch/powerpc/

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09 15:36:30 +11:00
Jesse Larrew
9eff1a3840 powerpc/pseries: Poll VPA for topology changes and update NUMA maps
This patch sets a timer during boot that will periodically poll the
associativity change counters in the VPA. When a change in
associativity is detected, it retrieves the new associativity domain
information via the H_HOME_NODE_ASSOCIATIVITY hcall and updates the
NUMA node maps and sysfs entries accordingly. Note that since the
ibm,associativity device tree property does not exist on configurations
with both NUMA and SPLPAR enabled, no device tree updates are necessary.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09 15:36:29 +11:00
Michael Ellerman
7a9d12568e powerpc: Record vma->phys_addr in ioremap()
The vmalloc code can track the physical address of a vma, when the
vma is used for ioremap, if set it is displayed in /proc/vmallocinfo.

Because get_vm_area_caller() doesn't know it's being called for
ioremap() it's up to the arch code to set the phys_addr. A bunch
of other arch's do this, I'm not sure why powerpc doesn't?

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09 15:35:32 +11:00
Benjamin Herrenschmidt
f4b9841595 Merge branch 'nvram' into next 2010-12-09 14:36:38 +11:00
Peter Zijlstra
f2e785ed5f powerpc: Use call_rcu_sched() for pagetables
PowerPC relies on IRQ-disable to guard against RCU quiecent states,
use the appropriate RCU call version.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-30 10:42:20 +11:00
Michael Neuling
0b97fee0ef powerpc/mm: Avoid avoidable void* pointer
Change pgdir from a void to real type.  Having this as a void is
stupid and has already caused 1 bug.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-29 15:48:23 +11:00
Nishanth Aravamudan
cd34206e94 powerpc: Add memory_hotplug_max()
Add a function to get the maximum address that can be hotplug added.
This is needed to calculate the size of the tce table needed to cover
all memory in 1:1 mode.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-29 15:48:21 +11:00
Vaidyanathan Srinivasan
99d8670525 powerpc: Cleanup APIs for cpu/thread/core mappings
These APIs take logical cpu number as input
Change cpu_first_thread_in_core() to cpu_first_thread_sibling()
Change cpu_last_thread_in_core() to cpu_last_thread_sibling()

These APIs convert core number (index) to logical cpu/thread numbers
Add cpu_first_thread_of_core(int core)
Changed cpu_thread_to_core() to cpu_core_index_of_thread(int cpu)

The goal is to make 'threads_per_core' accessible to the
pseries_energy module.  Instead of making an API to read
threads_per_core, this is a higher level wrapper function to
convert from logical cpu number to core number.

The current APIs cpu_first_thread_in_core() and
cpu_last_thread_in_core() returns logical CPU number while
cpu_thread_to_core() returns core number or index which is
not a logical CPU number.  The new APIs are now clearly named to
distinguish 'core number' versus first and last 'logical cpu
number' in that core.

The new APIs cpu_{first,last}_thread_sibling() work on
logical cpu numbers.  While cpu_first_thread_of_core() and
cpu_core_index_of_thread() work on core index.

Example usage:  (4 threads per core system)

cpu_first_thread_sibling(5) = 4
cpu_last_thread_sibling(5) = 7
cpu_core_index_of_thread(5) = 1
cpu_first_thread_of_core(1) = 4

cpu_core_index_of_thread() is used in cpu_to_drc_index() in the
module and cpu_first_thread_of_core() is used in
drc_index_to_cpu() in the module.

Make API changes to few callers.  Export symbols for use in modules.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-29 15:48:19 +11:00
Kumar Gala
82ae5eaffa powerpc/mm: Fix module instruction tlb fault handling on Book-E 64
We were seeing oops like the following when we did an rmmod on a module:

Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x8000000000008010
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P5020 DS
last sysfs file: /sys/devices/qman-portals.2/qman-pool.9/uevent
Modules linked in: qman_tester(-)
NIP: 8000000000008010 LR: c000000000074858 CTR: 8000000000008010
REGS: c00000002e29bab0 TRAP: 0400   Not tainted
(2.6.34.6-00744-g2d21f14)
MSR: 0000000080029000 <EE,ME,CE>  CR: 24000448  XER: 00000000
TASK = c00000007a8be600[4987] 'rmmod' THREAD: c00000002e298000 CPU: 1
GPR00: 8000000000008010 c00000002e29bd30 8000000000012798 c00000000035fb28
GPR04: 0000000000000002 0000000000000002 0000000024022428 c000000000009108
GPR08: fffffffffffffffe 800000000000a618 c0000000003c13c8 0000000000000000
GPR12: 0000000022000444 c00000000fffed00 0000000000000000 0000000000000000
GPR16: 00000000100c0000 0000000000000000 00000000100dabc8 0000000010099688
GPR20: 0000000000000000 00000000100cfc28 0000000000000000 0000000010011a44
GPR24: 00000000100017b2 0000000000000000 0000000000000000 0000000000000880
GPR28: c00000000035fb28 800000000000a7b8 c000000000376d80 c0000000003cce50
NIP [8000000000008010] .test_exit+0x0/0x10 [qman_tester]
LR [c000000000074858] .SyS_delete_module+0x1f8/0x2f0
Call Trace:
[c00000002e29bd30] [c0000000000748b4] .SyS_delete_module+0x254/0x2f0 (unreliable)
[c00000002e29be30] [c000000000000580] syscall_exit+0x0/0x2c
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
38600000 4e800020 60000000 60000000 <4e800020> 60000000 60000000 60000000
---[ end trace 4f57124939a84dc8 ]---

This appears to be due to checking the wrong permission bits in the
instruction_tlb_miss handling if the address that faulted was in vmalloc
space.  We need to look at the supervisor execute (_PAGE_BAP_SX) bit and
not the user bit (_PAGE_BAP_UX/_PAGE_EXEC).

Also removed a branch level since it did not appear to be used.

Reported-by: Jeffrey Ladouceur <Jeffrey.Ladouceur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-18 14:54:23 +11:00
Michael Neuling
1c2c25c787 powerpc: Fix call to subpage_protection()
In:
  powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
  commit d28513bc7f
  Author: David Gibson <david@gibson.dropbear.id.au>

subpage_protection() was changed to to take an mm rather a pgdir but it
didn't change calling site in hashpage_preload().  The change wasn't
noticed at compile time since hashpage_preload() used a void* as the
parameter to subpage_protection().

This is obviously wrong and can trigger the following crash when
CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_PPC_64K_PAGES
CONFIG_PPC_SUBPAGE_PROT are enabled.

Freeing unused kernel memory: 704k freed
Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6c49b7
Faulting instruction address: 0xc0000000000410f4
cpu 0x2: Vector: 300 (Data Access) at [c00000004233f590]
    pc: c0000000000410f4: .hash_preload+0x258/0x338
    lr: c000000000041054: .hash_preload+0x1b8/0x338
    sp: c00000004233f810
   msr: 8000000000009032
   dar: 6b6b6b6b6b6c49b7
 dsisr: 40000000
  current = 0xc00000007e2c0070
  paca    = 0xc000000007fe0500
    pid   = 1, comm = init
enter ? for help
[c00000004233f810] c000000000041020 .hash_preload+0x184/0x338 (unreliable)
[c00000004233f8f0] c00000000003ed98 .update_mmu_cache+0xb0/0xd0
[c00000004233f990] c000000000157754 .__do_fault+0x48c/0x5dc
[c00000004233faa0] c000000000158fd0 .handle_mm_fault+0x508/0xa8c
[c00000004233fb90] c0000000006acdd4 .do_page_fault+0x428/0x6ac
[c00000004233fe30] c000000000005260 handle_page_fault+0x20/0x74

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-18 14:54:23 +11:00
Kumar Gala
4a89261b02 powerpc/mm: Fix build error in setup_initial_memory_limit
arch/powerpc/mm/tlb_nohash.c: In function 'setup_initial_memory_limit':
arch/powerpc/mm/tlb_nohash.c:588:29: error: 'ppc64_memblock_base' undeclared (first use in this function)
arch/powerpc/mm/tlb_nohash.c:588:29: note: each undeclared identifier is reported only once for each function it appears in

Due to a copy/paste typo with the following commit:

	commit cd3db0c4ca
	Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
	Date:   Tue Jul 6 15:39:02 2010 -0700

	    memblock: Remove rmo_size, burry it in arch/powerpc where it belongs

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-18 14:54:22 +11:00
Peter Zijlstra
20273941f2 mm: fix race in kunmap_atomic()
Christoph reported a nice splat which illustrated a race in the new stack
based kmap_atomic implementation.

The problem is that we pop our stack slot before we're completely done
resetting its state -- in particular clearing the PTE (sometimes that's
CONFIG_DEBUG_HIGHMEM).  If an interrupt happens before we actually clear
the PTE used for the last slot, that interrupt can reuse the slot in a
dirty state, which triggers a BUG in kmap_atomic().

Fix this by introducing kmap_atomic_idx() which reports the current slot
index without actually releasing it and use that to find the PTE and delay
the _pop() until after we're completely done.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:05 -07:00
Peter Zijlstra
3e4d3af501 mm: stack based kmap_atomic()
Keep the current interface but ignore the KM_type and use a stack based
approach.

The advantage is that we get rid of crappy code like:

	#define __KM_PTE			\
		(in_nmi() ? KM_NMI_PTE : 	\
		 in_irq() ? KM_IRQ_PTE :	\
		 KM_PTE0)

and in general can stop worrying about what context we're in and what kmap
slots might be appropriate for that.

The downside is that FRV kmap_atomic() gets more expensive.

For now we use a CPP trick suggested by Andrew:

  #define kmap_atomic(page, args...) __kmap_atomic(page)

to avoid having to touch all kmap_atomic() users in a single patch.

[ not compiled on:
  - mn10300: the arch doesn't actually build with highmem to begin with ]

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:08 -07:00
Linus Torvalds
d4429f608a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits)
  powerpc/44x: Update ppc44x_defconfig
  powerpc/watchdog: Make default timeout for Book-E watchdog a Kconfig option
  fsl_rio: Add comments for sRIO registers.
  powerpc/fsl-booke: Add e55xx (64-bit) smp defconfig
  powerpc/fsl-booke: Add p5020 DS board support
  powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips
  powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes
  powerpc/fsl-booke: Add support for FSL 64-bit e5500 core
  powerpc/85xx: add cache-sram support
  powerpc/85xx: add ngPIXIS FPGA device tree node to the P1022DS board
  powerpc: Fix compile error with paca code on ppc64e
  powerpc/fsl-booke: Add p3041 DS board support
  oprofile/fsl emb: Don't set MSR[PMM] until after clearing the interrupt.
  powerpc/fsl-booke: Add PCI device ids for P2040/P3041/P5010/P5020 QoirQ chips
  powerpc/mpc8xxx_gpio: Add support for 'qoriq-gpio' controllers
  powerpc/fsl_booke: Add support to boot from core other than 0
  powerpc/p1022: Add probing for individual DMA channels
  powerpc/fsl_soc: Search all global-utilities nodes for rstccr
  powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
  powerpc/mpc83xx: Support for MPC8308 P1M board
  ...

Fix up conflict with the generic irq_work changes in arch/powerpc/kernel/time.c
2010-10-21 21:19:54 -07:00
Kumar Gala
55fd766b5f powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips
On Freescale parts typically have TLB array for large mappings that we can
bolt the linear mapping into.  We utilize the code that already exists
on PPC32 on the 64-bit side to setup the linear mapping to be cover by
bolted TLB entries.  We utilize a quarter of the variable size TLB array
for this purpose.

Additionally, we limit the amount of memory to what we can cover via
bolted entries so we don't get secondary faults in the TLB miss
handlers.  We should fix this limitation in the future.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-10-14 00:55:14 -05:00
Kumar Gala
988cf86d4f powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes
Update setup_page_sizes() to support for a MMU v1.0 FSL style MMU
implementation.  In such a processor, we don't have TLB0PS or EPTCFG
registers (and access to these registers may cause exceptions).  We need
to parse the older format of TLBnCFG for page size support.  Additionaly,
assume since we are an FSL implementation that we have 2 TLB arrays and
the second array contains the variable size pages.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-10-14 00:55:09 -05:00
Paul Gortmaker
92437d4137 powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
There exists a four line chunk of code, which when configured for
64 bit address space, can incorrectly set certain page flags during
the TLB creation.  It turns out that this is code which isn't used,
but might still serve a purpose.  Since it isn't obvious why it exists
or why it causes problems, the below description covers both in detail.

For powerpc bootstrap, the physical memory (at most 768M), is mapped
into the kernel space via the following path:

MMU_init()
    |
    + adjust_total_lowmem()
            |
            + map_mem_in_cams()
                    |
                    + settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);

On settlbcam(), the kernel will create TLB entries according to the flag,
PAGE_KERNEL_X.

settlbcam()
{
        ...
        TLBCAM[index].MAS1 = MAS1_VALID
                        | MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
                                ^
			These entries cannot be invalidated by the
			kernel since MAS1_IPROT is set on TLB property.
        ...
        if (flags & _PAGE_USER) {
           TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
           TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
        }

For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine.
But on boards like the the Freescale P4080, we want to support 36-bit
physical address on it. So the following options may be set:

CONFIG_FSL_BOOKE=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y

As a result, boards like the P4080 will introduce PTE format as Book3E.
As per the file: arch/powerpc/include/asm/pgtable-ppc32.h

  * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
  * #include <asm/pte-book3e.h>

So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the
book3E version of _PAGE_KERNEL_RWX is defined with:

  (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER:

  #define _PAGE_USER        (_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */

So the possibility exists to wrongly assign the user MAS3_U<RWX> bits
to kernel (PAGE_KERNEL_X) address space via the following code fragment:

        if (flags & _PAGE_USER) {
           TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
           TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
        }

Here is a dump of the TLB info from Simics with the above code present:
------
L2 TLB1
                                            GT                   SSS UUU V I
 Row  Logical           Physical            SS TLPID  TID  WIMGE XWR XWR F P   V
----- ----------------- ------------------- -- ----- ----- ----- --- --- - -   -
  0   c0000000-cfffffff 000000000-00fffffff 00     0     0   M   XWR XWR 0 1   1
  1   d0000000-dfffffff 010000000-01fffffff 00     0     0   M   XWR XWR 0 1   1
  2   e0000000-efffffff 020000000-02fffffff 00     0     0   M   XWR XWR 0 1   1

Actually this conditional code was used for two legacy functions:

  1: support KGDB to set break point.
     KGDB already dropped this; now uses its core write to set break point.

  2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE)
     for device IO space.
     This use case is also removed from the latest PowerPC kernel.

However, there may still be a use case for it in the future, like
large user pages, so we can't remove it entirely.  As an alternative,
we match on all bits of _PAGE_USER instead of just any bits, so the
case where just _PAGE_BAP_SR is set can't sneak through.

With this done, the TLB appears without U having XWR as below:

-------
L2 TLB1
                                            GT                   SSS UUU V I
 Row  Logical           Physical            SS TLPID  TID  WIMGE XWR XWR F P   V
----- ----------------- ------------------- -- ----- ----- ----- --- --- - -   -
  0   c0000000-cfffffff 000000000-00fffffff 00     0     0   M   XWR     0 1   1
  1   d0000000-dfffffff 010000000-01fffffff 00     0     0   M   XWR     0 1   1
  2   e0000000-efffffff 020000000-02fffffff 00     0     0   M   XWR     0 1   1

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-10-14 00:52:55 -05:00
matt mooney
4108d9ba90 powerpc/Makefiles: Change to new flag variables
Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-10-13 16:19:22 +11:00
Yinghai Lu
c7fc2de0c8 memblock, bootmem: Round pfn properly for memory and reserved regions
We need to round memory regions correctly -- specifically, we need to
round reserved region in the more expansive direction (lower limit
down, upper limit up) whereas usable memory regions need to be rounded
in the more restrictive direction (lower limit up, upper limit down).

This introduces two set of inlines:

	memblock_region_memory_base_pfn()
	memblock_region_memory_end_pfn()
	memblock_region_reserved_base_pfn()
	memblock_region_reserved_end_pfn()

Although they are antisymmetric (and therefore are technically
duplicates) the use of the different inlines explicitly documents the
programmer's intention.

The lack of proper rounding caused a bug on ARM, which was then found
to also affect other architectures.

Reported-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB4CDFD.4020105@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-12 15:37:51 -07:00
Matthew McClintock
0d35e1620d powerpc/mm: Assume first cpu is boot_cpuid not 0
arch/powerpc/mm/mmu_context_nohash.c assumes the boot cpu
will always have smp_processor_id() == 0. This patch fixes
that assumption

Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02 14:07:34 +10:00
Anton Blanchard
28b549905b powerpc: Check end of stack canary at oops time
Add a check for the stack canary when we oops, similar to x86. This should make
it clear that we overran our stack:

Unable to handle kernel paging request for data at address 0x24652f63700ac689
Faulting instruction address: 0xc000000000063d24
Thread overran stack, or stack corrupted

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02 14:07:30 +10:00
Ingo Molnar
daab7fc734 Merge commit 'v2.6.36-rc3' into x86/memblock
Conflicts:
	arch/x86/kernel/trampoline.c
	mm/memblock.c

Merge reason: Resolve the conflicts, update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31 09:45:46 +02:00
Sonny Rao
79c3095fb3 powerpc: Export memstart_addr and kernstart_addr on ppc64
Some modules (like eHCA) want to map all of kernel memory, for this to
work with a relocated kernel, we need to export kernstart_addr so
modules can use PHYSICAL_START and memstart_addr so they could use
MEMORY_START.  Note that the 32bit code already exports these symbols.

Signed-off-By: Sonny Rao <sonnyrao@us.ibm.com>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24 15:26:26 +10:00
Benjamin Herrenschmidt
b1515af291 Merge remote branch 'jwb/merge' into merge 2010-08-24 14:36:45 +10:00
Dave Kleikamp
32412aa214 powerpc/47x: Add an isync before the tlbivax instruction
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2010-08-23 07:38:31 -04:00
Cesar Eduardo Barros
597781f3e5 kmap_atomic: make kunmap_atomic() harder to misuse
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
list[1] ("Follow common convention and you'll get it wrong"), except in
some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].

kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes
takes a pointer to within the page itself.  This seems to once in a while
trip people up (the convention they are following is the one from
kunmap()).

Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
("The compiler/linker won't let you get it wrong").  This is done by
refusing to build if the type of its first argument is a pointer to a
struct page.

The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
(which is what you would call in case for some strange reason calling it
with a pointer to a struct page is not incorrect in your code).

The previous version of this patch was compile tested on x86-64.

[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
[2] In these cases, it is at level 5, "Do it right or it will always
    break at runtime."
[3] At least mips and powerpc look very similar, and sparc also seems to
    share a common ancestor with both; there seems to be quite some
    degree of copy-and-paste coding here. The include/asm/highmem.h file
    for these three archs mention x86 CPUs at its top.
[4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
[5] As an aside, could someone tell me why mn10300 uses unsigned long as
    the first parameter of kunmap_atomic() instead of void *?

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
Cc: Helge Deller <deller@gmx.de> (arch/parisc)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:54 -07:00
Linus Torvalds
cdd854bc42 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (79 commits)
  powerpc/8xx: Add support for the MPC8xx based boards from TQC
  powerpc/85xx: Introduce support for the Freescale P1022DS reference board
  powerpc/85xx: Adding DTS for the STx GP3-SSA MPC8555 board
  powerpc/85xx: Change deprecated binding for 85xx-based boards
  powerpc/tqm85xx: add a quirk for ti1520 PCMCIA bridge
  powerpc/tqm85xx: update PCI interrupt-map attribute
  powerpc/mpc8308rdb: support for MPC8308RDB board from Freescale
  powerpc/fsl_pci: add quirk for mpc8308 pcie bridge
  powerpc/85xx: Cleanup QE initialization for MPC85xxMDS boards
  powerpc/85xx: Fix booting for P1021MDS boards
  powerpc/85xx: Fix SWIOTLB initalization for MPC85xxMDS boards
  powerpc/85xx: kexec for SMP 85xx BookE systems
  powerpc/5200/i2c: improve i2c bus error recovery
  of/xilinxfb: update tft compatible versions
  powerpc/fsl-diu-fb: Support setting display mode using EDID
  powerpc/5121: doc/dts-bindings: update doc of FSL DIU bindings
  powerpc/5121: shared DIU framebuffer support
  powerpc/5121: move fsl-diu-fb.h to include/linux
  powerpc/5121: fsl-diu-fb: fix issue with re-enabling DIU area descriptor
  powerpc/512x: add clock structure for Video-IN (VIU) unit
  ...
2010-08-05 09:03:46 -07:00
Benjamin Herrenschmidt
4734b594c6 memblock: Remove memblock_type.size and add memblock.memory_size instead
Right now, both the "memory" and "reserved" memblock_type structures have
a "size" member. It represents the calculated memory size in the former
case and is unused in the latter.

This moves it out to the main memblock structure instead

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05 12:56:11 +10:00
Benjamin Herrenschmidt
cd3db0c4ca memblock: Remove rmo_size, burry it in arch/powerpc where it belongs
The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact
server ppc64 though I hijack it on embedded ppc64 for similar purposes)
and represents the area of memory that can be accessed in real mode
(aka with MMU off), or on embedded, from the exception vectors (which
is bolted in the TLB) which pretty much boils down to the same thing.

We take that out of the generic MEMBLOCK data structure and move it into
arch/powerpc where it belongs, renaming it to "RMA" while at it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05 12:56:08 +10:00
Benjamin Herrenschmidt
e63075a3c9 memblock: Introduce default allocation limit and use it to replace explicit ones
This introduce memblock.current_limit which is used to limit allocations
from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE).

The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still
be used with memblock_alloc_base() to allocate really anywhere.

It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT which disappears.

Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I
strongly recommend that you ensure that you set an appropriate limit
during boot in order to guarantee that an memblock_alloc() at any time
results in something that is accessible with a simple __va().

The reason is that a subsequent patch will introduce the ability for
the array to resize itself by reallocating itself. The MEMBLOCK core will
honor the current limit when performing those allocations.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05 12:56:07 +10:00
Benjamin Herrenschmidt
27f574c223 memblock: Expose MEMBLOCK_ALLOC_ANYWHERE
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-05 12:56:06 +10:00
Benjamin Herrenschmidt
28be7072ce memblock/powerpc: Use new accessors
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04 14:39:01 +10:00
Benjamin Herrenschmidt
e3239ff92a memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04 14:21:49 +10:00
Benjamin Herrenschmidt
412a4ac5e9 Merge commit 'gcl/next' into next 2010-08-04 10:26:03 +10:00
Benjamin Herrenschmidt
3fdfd99051 powerpc: Fix erroneous lmb->memblock conversions
Oooops... we missed these. We incorrectly converted strings
used when parsing the device-tree on pseries, thus breaking
access to drconf memory and hotplug memory.

While at it, also revert some variable names that represent
something the FW calls "lmb" and thus don't need to be converted
to "memblock".

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:56:57 +10:00
Benjamin Herrenschmidt
4b8692c022 powerpc/mm: Add some debug output when hash insertion fails
This adds some debug output to our MMU hash code to print out some
useful debug data if the hypervisor refuses the insertion (which
should normally never happen).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:56:56 +10:00
Benjamin Herrenschmidt
171aa2caaa powerpc/mm: Fix bugs in huge page hashing
There's a couple of nasty bugs lurking in our huge page hashing code.

First, we don't check the access permission atomically with setting
the _PAGE_BUSY bit, which means that the PTE value we end up using
for the hashing might be different than the one we have checked
the access permissions for.

We've seen cases where that leads us to try to use an invalidated
PTE for hashing, causing all sort of "interesting" issues.

Then, we also failed to set _PAGE_DIRTY on a write access.

Finally, a minor tweak but we should return 0 when we find the
PTE busy, in order to just re-execute the access, rather than 1
which means going to do_page_fault().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:55:21 +10:00
Benjamin Herrenschmidt
ca91e6c09d powerpc/mm: Move around testing of _PAGE_PRESENT in hash code
Instead of adding _PAGE_PRESENT to the access permission mask
in each low level routine independently, we add it once from
hash_page().

We also move the preliminary access check (the racy one before
the PTE is locked) up so it applies to the huge page case. This
duplicates code in __hash_page_huge() which we'll remove in a
subsequent patch to fix a race in there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 08:53:23 +10:00
Anton Blanchard
b1623e7eb2 powerpc/mm: Handle hypervisor pte insert failure in __hash_page_huge
If the hypervisor gives us an error on a hugepage insert we panic. The
normal page code already handles this by returning an error instead and we end
calling low_hash_fault which will just kill the task if possible.

The patch below does a similar thing for the hugepage case.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 08:44:51 +10:00
Yinghai Lu
95f72d1ed4 lmb: rename to memblock
via following scripts

      FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

      sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

      for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
      done

and remove some wrong change like lmbench and dlmb etc.

also move memblock.c from lib/ to mm/

Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14 17:14:00 +10:00
Benjamin Herrenschmidt
f2b26c9235 powerpc/book3e: Adjust the page sizes list based on MMU config
Use the MMU config registers to scan for available direct and
indirect page sizes and print out the result. Will be needed
for future hugetlbfs implementation.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14 14:13:53 +10:00
Benjamin Herrenschmidt
ff82c319e6 powerpc/book3e: Fix single step when using HW page tables
We patch the TLB miss exception vectors to point to alternate
functions when using HW page table on BookE.

However, we were patching in a new branch in the first instruction
of the exception handler instead of the second one, thus overriding
the nop that is in the first instruction.

This cause problems when single stepping as we rely on that nop for
the single step to stop properly within the exception vector range
rather than on the target of the branch.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14 14:13:51 +10:00
Christoph Egger
cccd234283 powerpc: Removing dead CONFIG_SMP_750
CONFIG_SMP_750 doesn't exist in Kconfig, therefore removing all
references for it from the source code.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:28:38 +10:00
Anton Blanchard
41eab6f88f powerpc/numa: Use form 1 affinity to setup node distance
Form 1 affinity allows multiple entries in ibm,associativity-reference-points
which represent affinity domains in decreasing order of importance. The
Linux concept of a node is always the first entry, but using the other
values as an input to node_distance() allows the memory allocator to make
better decisions on which node to go first when local memory has been
exhausted.

We keep things simple and create an array indexed by NUMA node, capped at
4 entries. Each time we lookup an associativity property we initialise
the array which is overkill, but since we should only hit this path during
boot it didn't seem worth adding a per node valid bit.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:28:35 +10:00
Paul E. McKenney
a591f6b56d powerpc: Remove all rcu head initializations
Remove all rcu head inits. We don't care about the RCU head state before
passing it to call_rcu() anyway. Only leave the "on_stack" variants so
debugobjects can keep track of objects on stack.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:28:34 +10:00
Becky Bruce
d10ac3734d powerpc/fsl-booke: Fix comments in mmu code that mention BATS
There are no BATS on BookE - we have the TLBCAM instead.  Also correct
the page size information to included extended sizes.  We don't actually allow
a 4G page size to be used, so comment on that as well.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:28:26 +10:00
Christoph Egger
8054a3428f powerpc: Remove dead CONFIG_HIGHPTE
CONFIG_HIGHPTE doesn't exist in Kconfig, therefore removing all
references for it from the source code.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-06-15 15:02:32 +10:00
Linus Torvalds
98edb6ca41 Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
  KVM: x86: Add missing locking to arch specific vcpu ioctls
  KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
  KVM: MMU: Segregate shadow pages with different cr0.wp
  KVM: x86: Check LMA bit before set_efer
  KVM: Don't allow lmsw to clear cr0.pe
  KVM: Add cpuid.txt file
  KVM: x86: Tell the guest we'll warn it about tsc stability
  x86, paravirt: don't compute pvclock adjustments if we trust the tsc
  x86: KVM guest: Try using new kvm clock msrs
  KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  KVM: x86: add new KVMCLOCK cpuid feature
  KVM: x86: change msr numbers for kvmclock
  x86, paravirt: Add a global synchronization point for pvclock
  x86, paravirt: Enable pvclock flags in vcpu_time_info structure
  KVM: x86: Inject #GP with the right rip on efer writes
  KVM: SVM: Don't allow nested guest to VMMCALL into host
  KVM: x86: Fix exception reinjection forced to true
  KVM: Fix wallclock version writing race
  KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
  KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
  ...
2010-05-21 17:16:21 -07:00
Anton Blanchard
bc8449cc57 powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity
I've been told that the architected way to determine we are in form 1
affinity mode is by reading the ibm,architecture-vec-5 property which
mirrors the layout of the fifth vector of the ibm,client-architecture
structure.

Eventually we may want to parse the ibm,architecture-vec-5 and create
FW_FEATURE_* bits.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:12 +10:00
Kumar Gala
78f622377f powerpc/fsl-booke: Move loadcam_entry back to asm code to fix SMP ftrace
When we build with ftrace enabled its possible that loadcam_entry would
have used the stack pointer (even though the code doesn't need it).  We
call loadcam_entry in __secondary_start before the stack is setup.  To
ensure that loadcam_entry doesn't use the stack pointer the easiest
solution is to just have it in asm code.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-05-17 10:56:20 -05:00
Alexander Graf
c83ec269e6 PPC: Split context init/destroy functions
We need to reserve a context from KVM to make sure we have our own
segment space. While we did that split for Book3S_64 already, 32 bit
is still outstanding.

So let's split it now.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:20 +03:00
Benjamin Herrenschmidt
1ed31d6db9 Merge commit 'origin/master' into next 2010-05-07 11:29:25 +10:00
Anton Blanchard
25863de07a powerpc/cpumask: Convert NUMA code to new cpumask API
Convert NUMA code to new cpumask API. We shift the node to cpumask
setup code until after we complete bootmem allocation so we can
dynamically allocate the cpumasks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-06 17:41:58 +10:00
Benjamin Herrenschmidt
e460c2c91a powerpc: Invoke oom-killer from page fault
As explained in commit 1c0fe6e3bd, we want to call the architecture independent
oom killer when getting an unexplained OOM from handle_mm_fault, rather than
simply killing current.

Cc: linuxppc-dev@ozlabs.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-06 17:15:58 +10:00
Mark Nelson
91eea67c6d powerpc/mm: Track backing pages allocated by vmemmap_populate()
We need to keep track of the backing pages that get allocated by
vmemmap_populate() so that when we use kdump, the dump-capture kernel knows
where these pages are.

We use a simple linked list of structures that contain the physical address
of the backing page and corresponding virtual address to track the backing
pages.
To save space, we just use a pointer to the next struct vmemmap_backing. We
can also do this because we never remove nodes.  We call the pointer "list"
to be compatible with changes made to the crash utility.

vmemmap_populate() is called either at boot-time or on a memory hotplug
operation. We don't have to worry about the boot-time calls because they
will be inherently single-threaded, and for a memory hotplug operation
vmemmap_populate() is called through:
sparse_add_one_section()
            |
            V
kmalloc_section_memmap()
            |
            V
sparse_mem_map_populate()
            |
            V
vmemmap_populate()
and in sparse_add_one_section() we're protected by pgdat_resize_lock().
So, we don't need a spinlock to protect the vmemmap_list.

We allocate space for the vmemmap_backing structs by allocating whole pages
in vmemmap_list_alloc() and then handing out chunks of this to
vmemmap_list_populate().

This means that we waste at most just under one page, but this keeps the code
is simple.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-06 16:49:27 +10:00
Benjamin Herrenschmidt
75c1d539ea powerpc: Fix CONFIG_DEBUG_PAGEALLOC on 603/e300
So we tried to speed things up a bit using flush_hash_pages() directly
but that falls over on 603 of course meaning we fail to flush the TLB
properly and we may even end up having it corrupt memory randomly by
accessing a hash table that doesn't exist.

This removes the "optimization" by always going through flush_tlb_page()
for now at least.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-06 16:49:26 +10:00
Dave Kleikamp
e7f75ad01d powerpc/47x: Base ppc476 support
This patch adds the base support for the 476 processor.  The code was
primarily written by Ben Herrenschmidt and Torez Smith, but I've been
maintaining it for a while.

The goal is to have a single binary that will run on 44x and 47x, but
we still have some details to work out.  The biggest is that the L1 cache
line size differs on the two platforms, but it's currently a compile-time
option.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Torez Smith  <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2010-05-05 09:11:10 -04:00
Linus Torvalds
b18262eda3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb: don't needlessly skip PAGE_USER test for Fsl booke
2010-04-29 20:01:42 -07:00
Wufei
56151e7534 kgdb: don't needlessly skip PAGE_USER test for Fsl booke
The bypassing of this test is a leftover from 2.4 vintage
kernels, and is no longer appropriate, or even used by KGDB.
Currently KGDB uses probe_kernel_write() for all access to
memory via the KGDB core, so it can simply be deleted.

This fixes CVE-2010-1446.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wufei <fei.wu@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-04-29 21:41:44 -05:00
Anton Blanchard
4b83c330b4 powerpc/numa: Add form 1 NUMA affinity
Firmware changed the way it represents memory and cpu affinity on POWER7.
Unfortunately the old method now caps the topology to work around issues
with legacy operating systems. For Linux to get the correct topology we
need to use the new form 1 affinity information.

We set the form 1 field in the client architecture, and if we see "1" in the
ibm,associativity-form property firmware supports form 1 affinity and
we should look at the first field in the ibm,associativity-reference-points
array. If not we use the second field as we always have.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-04-28 16:22:33 +10:00
Becky Bruce
e8137341b1 powerpc/fsl_booke: Correct test for MMU_FTR_BIG_PHYS
The code was looking for this in cpu_features, not mmu_features.  Fix this.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-04-19 23:12:44 -05:00
K.Prasad
9c7cc234dc powerpc: Disable interrupts for data breakpoint exceptions
Data address breakpoint exceptions are currently handled along with page-faults
which require interrupts to remain in enabled state. Since exception handling
for data breakpoints aren't pre-empt safe, we handle them separately.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-04-07 14:44:38 +10:00
Benjamin Herrenschmidt
55052eeca6 powerpc: Fix ioremap_flags() with book3e pte definition
We can't just clear the user read permission in book3e pte, because
that will also clear supervisor read permission.  This surely isn't
desired.  Fix the problem by adding the supervisor read back.

BenH: Slightly simplified the ifdef and applied to ppc64 too

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-04-07 14:39:47 +10:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
FUJITA Tomonori
a93272969c powerpc: Fix swiotlb to respect the boot option
powerpc initializes swiotlb before parsing the kernel boot options so
swiotlb options (e.g. specifying the swiotlb buffer size) are ignored.

Any time before freeing bootmem works for swiotlb so this patch moves
powerpc's swiotlb initialization after parsing the kernel boot
options, mem_init (as x86 does).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Becky Bruce <beckyb@kernel.crashing.org>
Tested-by: Albert Herranz <albert_herranz@yahoo.es>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-03-19 16:38:16 +11:00
Jiri Kosina
318ae2edc3 Merge branch 'for-next' into for-linus
Conflicts:
	Documentation/filesystems/proc.txt
	arch/arm/mach-u300/include/mach/debug-macro.S
	drivers/net/qlge/qlge_ethtool.c
	drivers/net/qlge/qlge_main.c
	drivers/net/typhoon.c
2010-03-08 16:55:37 +01:00
H Hartley Sweeten
72c3368856 nodemask.h: remove macro any_online_node
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'.  Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:31 -08:00
Linus Torvalds
ac0f6f927d Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits)
  ARM: Eliminate decompressor -Dstatic= PIC hack
  ARM: 5958/1: ARM: U300: fix inverted clk round rate
  ARM: 5956/1: misplaced parentheses
  ARM: 5955/1: ep93xx: move timer defines into core.c and document
  ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c
  ARM: 5953/1: ep93xx: fix broken build of clock.c
  ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig
  ARM: 5949/1: NUC900 add gpio virtual memory map
  ARM: 5948/1: Enable timer0 to time4 clock support for nuc910
  ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk
  ARM: make_coherent(): fix problems with highpte, part 2
  MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
  ARM: 5945/1: ep93xx: include correct irq.h in core.c
  ARM: 5933/1: amba-pl011: support hardware flow control
  ARM: 5930/1: Add PKMAP area description to memory.txt.
  ARM: 5929/1: Add checks to detect overlap of memory regions.
  ARM: 5928/1: Change type of VMALLOC_END to unsigned long.
  ARM: 5927/1: Make delimiters of DMA area globally visibly.
  ARM: 5926/1: Add "Virtual kernel memory..." printout.
  ARM: 5920/1: OMAP4: Enable L2 Cache
  ...

Fix up trivial conflict in arch/arm/mach-mx25/clock.c
2010-03-01 09:15:15 -08:00
Russell King
4b3073e1c5 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies.  We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

  On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
  to construct a pointer to the pte again.  Passing a pte_t * is much
  more elegant.  Maybe we might even replace the pte argument with the
  pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

  Passing the ptep in there is exactly what I want.  I want that
  -instead- of the PTE value, because I have issue on some ppc cases,
  for I$/D$ coherency, where set_pte_at() may decide to mask out the
  _PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

  sparc: fix fallout from update_mmu_cache API change

  Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:41:46 +00:00
Thomas Gleixner
3eb93c558a powerpc: Convert tlbivax_lock to raw_spinlock
tlbivax_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-19 14:52:33 +11:00
Thomas Gleixner
6b9c9b8a66 powerpc: Convert native_tlbie_lock to raw_spinlock
native_tlbie_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-19 14:52:30 +11:00
Thomas Gleixner
be833f3371 powerpc: Convert context_lock to raw_spinlock
context_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-19 14:52:30 +11:00
Benjamin Herrenschmidt
efd0f0f385 Merge commit 'jwb/next' into next 2010-02-18 09:34:38 +11:00
Anton Blanchard
66d99b8834 powerpc: Convert open coded native hashtable bit lock
Now we have real bit locks use them instead of open coding it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17 14:03:15 +11:00
Benjamin Herrenschmidt
ec144a81ad Merge commit 'origin/master' into next 2010-02-17 10:00:42 +11:00
Stefan Roese
c7b6669812 powerpc/40x: Add support for PPC40x boards with > 512MB SDRAM
This patch adds support for boards with more that 512MByte RAM. Currently
only 512MB of memory are enabled in the DCCR/ICCR real-mode cache
control registers. This patch now enables caching in real-mode for
2GByte.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2010-02-12 07:54:45 -05:00
David Gibson
77058e1adc powerpc: Fix address masking bug in hpte_need_flush()
Commit f71dc176aa 'Make
hpte_need_flush() correctly mask for multiple page sizes' introduced
bug, which is triggered when a kernel with a 64k base page size is run
on a system whose hardware does not 64k hash PTEs.  In this case, we
emulate 64k pages with multiple 4k hash PTEs, however in
hpte_need_flush() we incorrectly only mask the hardware page size from
the address, instead of the logical page size.  This causes things to
go wrong when we later attempt to iterate through the hardware
subpages of the logical page.

This patch corrects the error.  It has been tested on pSeries bare
metal by Michael Neuling.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-10 13:58:06 +11:00
Anton Blanchard
7317ac8711 powerpc: Convert mmu context allocator from idr to ida
We can use the much more lightweight ida allocator since we don't
need the pointer storage idr provides.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-09 13:56:07 +11:00
Uwe Kleine-König
9ddc5b6f18 tree-wide: fix typos "ammount" -> "amount"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:40 +01:00
Thadeu Lima de Souza Cascardo
2273130de8 fix comment typo leve -> level in powerpc
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:38 +01:00
Thadeu Lima de Souza Cascardo
6c504d4231 powerpc: Fix typo s/leve/level/ in TLB code
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:50 +11:00
Jiri Slaby
4bf936b9e4 powerpc: Use helpers for rlimits
Make sure compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.

I.e. either use rlimit helpers added in
3e10e716ab
or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-01-15 13:20:08 +11:00
David Gibson
a1128f8f0f powerpc/mm: Fix stupid bug in subpge protection handling
Commit d28513bc7f ("Fix bug in pagetable
cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for
breakage caused by an earlier clean up patch of mine, contains a
stupid bug.  I changed the parameters of the subpage_protection()
function, but failed to update one of the callers.

This patch fixes it, and replaces a void * with a typed pointer so
that the compiler will warn on such an error in future.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:55:44 +11:00
Yang Li
f04b10cddb powerpc/mm: Fix typo of cpumask_clear_cpu()
The function name of cpumask_clear_cpu was not correct. Fortunately
nobody uses that code with hotplug yet :-)

Reported-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:27 +11:00
Sachin P. Sant
5c33991987 powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.
This time without the funny characters.

Fix following build errors generated with DEBUG=1

cc1: warnings being treated as errors
arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes':
arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize':
arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
... SNIP ...

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:27 +11:00
Benjamin Herrenschmidt
50891457f1 powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM
Set need to call __set_pte_at() and not set_pte_at() from __change_page_attr()
since the later will perform checks with CONFIG_DEBUG_VM that aren't suitable
to the way we override an existing PTE. (More specifically, it doesn't let
you write over a present PTE).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:26 +11:00
Linus Torvalds
a73611b6aa Merge branch 'next' of git://git.secretlab.ca/git/linux-2.6
* 'next' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
  powerpc: fix up for mmu_mapin_ram api change
  powerpc: wii: allow ioremap within the memory hole
  powerpc: allow ioremap within reserved memory regions
  wii: use both mem1 and mem2 as ram
  wii: bootwrapper: add fixup to calc useable mem2
  powerpc: gamecube/wii: early debugging using usbgecko
  powerpc: reserve fixmap entries for early debug
  powerpc: wii: default config
  powerpc: wii: platform support
  powerpc: wii: hollywood interrupt controller support
  powerpc: broadway processor support
  powerpc: wii: bootwrapper bits
  powerpc: wii: device tree
  powerpc: gamecube: default config
  powerpc: gamecube: platform support
  powerpc: gamecube/wii: flipper interrupt controller support
  powerpc: gamecube/wii: udbg support for usbgecko
  powerpc: gamecube/wii: do not include PCI support
  powerpc: gamecube/wii: declare as non-coherent platforms
  powerpc: gamecube/wii: introduce GAMECUBE_COMMON
  ...

Fix up conflicts in arch/powerpc/mm/fsl_booke_mmu.c.

Hopefully even close to correctly.
2009-12-16 13:26:53 -08:00
Stephen Rothwell
ae4cec4736 powerpc: fix up for mmu_mapin_ram api change
Today's linux-next build (powerpc ppc44x_defconfig) failed like this:

arch/powerpc/mm/pgtable_32.c: In function 'mapin_ram':
arch/powerpc/mm/pgtable_32.c:318: error: too many arguments to function 'mmu_mapin_ram'

Casued by commit de32400dd2 ("wii: use both
mem1 and mem2 as ram").

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-14 09:04:24 -07:00
Albert Herranz
c5df7f7751 powerpc: allow ioremap within reserved memory regions
Add a flag to let a platform ioremap memory regions marked as reserved.

This flag will be used later by the Nintendo Wii support code to allow
ioremapping the I/O region sitting between MEM1 and MEM2 and marked
as reserved RAM in the patch "wii: use both mem1 and mem2 as ram".

This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.

Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-12 22:24:32 -07:00
Albert Herranz
de32400dd2 wii: use both mem1 and mem2 as ram
The Nintendo Wii video game console has two discontiguous RAM regions:
- MEM1: 24MB @ 0x00000000
- MEM2: 64MB @ 0x10000000

Unfortunately, the kernel currently does not support discontiguous RAM
memory regions on 32-bit PowerPC platforms.

This patch adds a series of workarounds to allow the use of the second
memory region (MEM2) as RAM by the kernel.
Basically, a single range of memory from the beginning of MEM1 to the
end of MEM2 is reported to the kernel, and a memory reservation is
created for the hole between MEM1 and MEM2.

With this patch the system is able to use all the available RAM and not
just ~27% of it.

This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.

Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-12 22:24:31 -07:00
Joakim Tjernlund
5efab4a02c powerpc/8xx: Invalidate non present TLBs
8xx sometimes need to load a invalid/non-present TLBs in
it DTLB asm handler.

These must be invalidated separaly as linux mm don't.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:10:35 +11:00
David Gibson
d28513bc7f powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.

Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-08 15:59:33 +11:00
Benjamin Herrenschmidt
5a7b4193e5 Revert "powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"
This reverts commit c045256d14.

It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will
commit a fixed version separately

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-02 09:28:35 +11:00
David Gibson
39adfa540f powerpc/mm: Fix bug in gup_hugepd()
Commit a4fe3ce769 introduced a new
get_user_pages() path for hugepages on powerpc.  Unfortunately, there
is a bug in it's loop logic, which can cause it to overrun the end of
the intended region.  This came about by copying the logic from the
normal page path, which assumes the address and end parameters have
been pagesize aligned at the top-level.  Since they're not *hugepage*
size aligned, the simplistic logic could step over the end of the gup
region without triggering the loop end condition.

This patch fixes the bug by using the technique that the normal page
path uses in levels above the lowest to truncate the ending address to
something we know we'll match with.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-27 14:24:30 +11:00
David Gibson
c045256d14 powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.

Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-27 14:24:29 +11:00
Kumar Gala
8b27f0b61d powerpc/fsl-booke: Rework TLB CAM code
Re-write the code so its more standalone and fixed some issues:
* Bump'd # of CAM entries to 64 to support e500mc
* Make the code handle MAS7 properly
* Use pr_cont instead of creating a string as we go

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-11-20 16:45:33 -06:00
Benjamin Herrenschmidt
0526484aa3 Merge commit 'origin/master' into next 2009-11-12 10:59:04 +11:00
Alexander Graf
e85a47106a Split init_new_context and destroy_context
For KVM we need to allocate a new context id, but don't really care about
all the mm context around it.

So let's split the alloc and destroy functions for the context id, so we can
grab one without allocating an mm context.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:25 +11:00
Alexander Graf
4ab79aa801 Export symbols for KVM module
We want to be able to build KVM as a module. To enable us doing so, we
need some more exports from core Linux parts.

This patch exports all functions and variables that are required for KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:24 +11:00
Benjamin Herrenschmidt
f1167fb318 powerpc/mm: Remove debug context clamping from nohash code
I inadvertently left that debug code enabled, causing the number of
contexts to be clamped to 31 which is going to slow things down on
4xx and just plain breaks 8xx

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:41:59 +11:00
David Gibson
0895ecda79 powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal accessors
The hugepage arch code provides a number of hook functions/macros
which mirror the functionality of various normal page pte access
functions.  Various changes in the normal page accessors (in
particular BenH's recent changes to the handling of lazy icache
flushing and PAGE_EXEC) have caused the hugepage versions to get out
of sync with the originals.  In some cases, this is a bug, at least on
some MMU types.

One of the reasons that some hooks were not identical to the normal
page versions, is that the fact we're dealing with a hugepage needed
to be passed down do use the correct dcache-icache flush function.
This patch makes the main flush_dcache_icache_page() function hugepage
aware (by checking for the PageCompound flag).  That in turn means we
can make set_huge_pte_at() just a call to set_pte_at() bringing it
back into sync.  As a bonus, this lets us remove the
hash_huge_page_do_lazy_icache() function, replacing it with a call to
the hash_page_do_lazy_icache() function it was based on.

Some other hugepage pte access hooks - huge_ptep_get_and_clear() and
huge_ptep_clear_flush() - are not so easily unified, but this patch at
least brings them back into sync with the current versions of the
corresponding normal page functions.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:21:23 +11:00
David Gibson
883a3e5236 powerpc/mm: Split hash MMU specific hugepage code into a new file
This patch separates the parts of hugetlbpage.c which are inherently
specific to the hash MMU into a new hugelbpage-hash64.c file.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:59 +11:00
David Gibson
d1837cba5d powerpc/mm: Cleanup initialization of hugepages on powerpc
This patch simplifies the logic used to initialize hugepages on
powerpc.  The somewhat oddly named set_huge_psize() is renamed to
add_huge_page_size() and now does all necessary verification of
whether it's given a valid hugepage sizes (instead of just some) and
instantiates the generic hstate structure (but no more).

hugetlbpage_init() now steps through the available pagesizes, checks
if they're valid for hugepages by calling add_huge_page_size() and
initializes the kmem_caches for the hugepage pagetables.  This means
we can now eliminate the mmu_huge_psizes array, since we no longer
need to pass the sizing information for the pagetable caches from
set_huge_psize() into hugetlbpage_init()

Determination of the default huge page size is also moved from the
hash code into the general hugepage code.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:58 +11:00
David Gibson
a4fe3ce769 powerpc/mm: Allow more flexible layouts for hugepage pagetables
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level.  Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly.  Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.

This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity.  In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask.  This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.

This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out.  All other (hugepage)
pagetable walking paths can just interpret the structure as they go.

There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug.  We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping).  This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.

While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:58 +11:00
David Gibson
a0668cdc15 powerpc/mm: Cleanup management of kmem_caches for pagetables
Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels.  We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.

This patch cleans this all up by taking a different approach.  Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size.  Thus
sharing of caches between different types/levels of pagetables happens
naturally.  The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
pagetable contains 2^n pointers.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:57 +11:00
David Gibson
f71dc176aa powerpc/mm: Make hpte_need_flush() correctly mask for multiple page sizes
Currently, hpte_need_flush() only correctly flushes the given address
for normal pages.  Callers for hugepages are required to mask the
address themselves.

But hpte_need_flush() already looks up the page sizes for its own
reasons, so this is a rather silly imposition on the callers.  This
patch alters it to mask based on the pagesize it has looked up itself,
and removes the awkward masking code in the hugepage caller.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:57 +11:00
Benjamin Herrenschmidt
8d8997f34e powerpc/mm: Fix hang accessing top of vmalloc space
On pSeries, we always force the IO space to be mapped using 4K
pages even with a 64K base page size to cope with some limitations
in the HV interface to some devices.

However, the SLB miss handler code to discriminate between vmalloc
and ioremap space uses a CPU feature section such that the code
is nop'ed out when the processor support large pages non-cachable
mappings.

Thus, we end up always using the ioremap page size for vmalloc
segments on such processors, causing a discrepency between the
segment and the hash table, and thus a hang continously hashing
the page.

It works for the first segment of the vmalloc space since that
segment is "bolted" in by C code correctly, and thankfully we
almost never use the vmalloc space beyond the first segment,
but the new percpu code made the bug happen.

This fixes it by removing the feature section from the assembly,
we now always do the comparison between vmalloc and ioremap.

Signed-off-by; Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-14 16:58:36 +11:00
Rex Feany
e0908085fc powerpc/8xx: Fix regression introduced by cache coherency rewrite
After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:56:30 +10:00
Huang Weiyi
b9eceb2307 powerpc/mm: Remove duplicated #include
Remove duplicated #include('s) in
  arch/powerpc/mm/tlb_low_64e.S

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:42 +10:00
KAMEZAWA Hiroyuki
3089aa1b0c kcore: use registerd physmem information
For /proc/kcore, each arch registers its memory range by kclist_add().
In usual,

	- range of physical memory
	- range of vmalloc area
	- text, etc...

are registered but "range of physical memory" has some troubles.  It
doesn't updated at memory hotplug and it tend to include unnecessary
memory holes.  Now, /proc/iomem (kernel/resource.c) includes required
physical memory range information and it's properly updated at memory
hotplug.  Then, it's good to avoid using its own code(duplicating
information) and to rebuild kclist for physical memory based on
/proc/iomem.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki
908eedc616 walk system ram range
Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range.  For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.

But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM.  This patch makes the
check strict to find out busy "System RAM".

Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton.  Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.

And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki
a0614da88b kcore: register vmalloc area in generic way
For /proc/kcore, vmalloc areas are registered per arch.  But, all of them
registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies
them.  By this.  archs which have no kclist_add() hooks can see vmalloc
area correctly.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki
c30bb2a25f kcore: add kclist types
Presently, kclist_add() only eats start address and size as its arguments.
Considering to make kclist dynamically reconfigulable, it's necessary to
know which kclists are for System RAM and which are not.

This patch add kclist types as
  KCORE_RAM
  KCORE_VMALLOC
  KCORE_TEXT
  KCORE_OTHER

This "type" is used in a patch following this for detecting KCORE_RAM.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
Geert Uytterhoeven
cc013a8890 arches: drop superfluous casts in nr_free_pages() callers
Commit 9617729941 ("Drop free_pages()")
modified nr_free_pages() to return 'unsigned long' instead of 'unsigned
int'.  This made the casts to 'unsigned long' in most callers superfluous,
so remove them.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <zankel@tensilica.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:34 -07:00
Ingo Molnar
cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Linus Torvalds
723e9db7a4 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
  powerpc/nvram: Enable use Generic NVRAM driver for different size chips
  powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
  powerpc/ps3: Workaround for flash memory I/O error
  powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
  powerpc/perf_counters: Reduce stack usage of power_check_constraints
  powerpc: Fix bug where perf_counters breaks oprofile
  powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
  powerpc/irq: Improve nanodoc
  powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
  powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
  powerpc/book3e: Add missing page sizes
  powerpc/pseries: Fix to handle slb resize across migration
  powerpc/powermac: Thermal control turns system off too eagerly
  powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
  powerpc/405ex: support cuImage via included dtb
  powerpc/405ex: provide necessary fixup function to support cuImage
  powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
  powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
  powerpc/44x: Update Arches defconfig
  powerpc/44x: Update Arches dts
  ...

Fix up conflicts in drivers/char/agp/uninorth-agp.c
2009-09-15 09:51:09 -07:00
Linus Torvalds
ada3fa1505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15 09:39:44 -07:00
Brian King
46db2f86a3 powerpc/pseries: Fix to handle slb resize across migration
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.

BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
      breaking ppc32 build

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 16:19:01 +10:00
Kumar Gala
df5d6ecf81 powerpc/mm: Add MMU features for TLB reservation & Paired MAS registers
Support for TLB reservation (or TLB Write Conditional) and Paired MAS
registers are optional for a processor implementation so we handle
them via MMU feature sections.

We currently only used paired MAS registers to access the full RPN + perm
bits that are kept in MAS7||MAS3.  We assume that if an implementation has
hardware page table at this time it also implements in TLB reservations.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:12 +10:00
Benjamin Herrenschmidt
3c2ee2d9f4 Merge commit 'kumar/next' into next 2009-08-27 13:13:41 +10:00
Benjamin Herrenschmidt
ea3cc330ac powerpc/mm: Cleanup handling of execute permission
This is an attempt at cleaning up a bit the way we handle execute
permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only
defined by CPUs that can do something with it, and the myriad of
#ifdef's in the I$/D$ coherency code is reduced to 2 cases that
hopefully should cover everything.

The logic on BookE is a little bit different than what it was though
not by much. Since now, _PAGE_EXEC will be set by the generic code
for executable pages, we need to filter out if they are unclean and
recover it. However, I don't expect the code to be more bloated than
it already was in that area due to that change.

I could boast that this brings proper enforcing of per-page execute
permissions to all BookE and 40x but in fact, we've had that now for
some time as a side effect of my previous rework in that area (and
I didn't even know it :-) We would only enable execute permission if
the page was cache clean and we would only cache clean it if we took
and exec fault. Since we now enforce that the later only work if
VM_EXEC is part of the VMA flags, we de-fact already enforce per-page
execute permissions... Unless I missed something

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-27 13:12:51 +10:00
Kumar Gala
fc4bdb35fb powerpc/booke: Move MMUCSR definition into mmu-book3e.h
The MMUCSR is now defined as part of the Book-3E architecture so we
can move it into mmu-book3e.h and add some of the additional bits
defined by the architecture specs.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-08-24 20:48:05 -05:00
Benjamin Herrenschmidt
4f0dbc2781 Merge commit 'paulus-perf/master' into next 2009-08-20 11:07:56 +10:00
Kumar Gala
797a747a82 powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor
Since the pte_lockptr is a spinlock it gets optimized away on
uniprocessor builds so using spin_is_locked is not correct.  We can use
assert_spin_locked instead and get the proper behavior between UP and
SMP builds.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:28:32 +10:00
Roel Kluin
8dcd038a13 powerpc/fsl-booke: read buffer overflow
cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam)

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:27:12 +10:00
Kumar Gala
67050b5c3e powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus
Introduced a temporary variable into our iterating over the list cpus
that are threads on the same core.  For some reason Ben forgot how for
loops work.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:12 +10:00
Benjamin Herrenschmidt
2d27cfd328 powerpc: Remaining 64-bit Book3E support
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:11 +10:00
Benjamin Herrenschmidt
32a74949b7 powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E
The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though
we did carve out some virtual space for it, the necessary support code
wasn't there. This implements it by using 16M pages for now, though the
page size could easily be changed at runtime if necessary.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:10 +10:00
Benjamin Herrenschmidt
25d21ad6e7 powerpc: Add TLB management code for 64-bit Book3E
This adds the TLB miss handler assembly, the low level TLB flush routines
along with the necessary hook for dealing with our virtual page tables
or indirect TLB entries that need to be flushes when PTE pages are freed.

There is currently no support for hugetlbfs

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt
a8f7758c1c powerpc/mm: Move around mmu_gathers definition on 64-bit
The definition for the global structure mmu_gathers, used by generic code,
is currently defined in multiple places not including anything used by
64-bit Book3E. This changes it by moving to one place common to all
processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt
57e2a99f74 powerpc: Add memory management headers for new 64-bit BookE
This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit

We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:06 +10:00
Benjamin Herrenschmidt
c7cc58a1ad powerpc/mm: Rework & cleanup page table freeing code path
That patch used to just add a hook to page table flushing but
pulling that string brought out a whole bunch of issues, so it
now does that and more:

 - We now make the RCU batching of page freeing SMP only, as I
believe it was intended initially. We make a few more things compile
to nothing on !CONFIG_SMP

 - Some macros are turned into functions, though that forced me to
out of line a few stuffs due to unsolvable include depenencies,
however it's probably better that way anyway, it's not -that-
critical code path.

 - 32-bit didn't call pte_free_finish() on tlb_flush() which means
that it wouldn't push out the batch to RCU for delayed freeing when
a bunch of page tables have been freed, they would just stay in there
until the batch gets full.

64-bit BookE will use that hook to maintain the virtually linear
page tables or the indirect entries in the TLB when using the
HW loader.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:24:56 +10:00
Benjamin Herrenschmidt
d4e167da4c powerpc/mm: Make low level TLB flush ops on BookE take additional args
We need to pass down whether the page is direct or indirect and we'll
need to pass the page size to _tlbil_va and _tlbivax_bcast

We also add a new low level _tlbil_pid_noind() which does a TLB flush
by PID but avoids flushing indirect entries if possible

This implements those new prototypes but defines them with inlines
or macros so that no additional arguments are actually passed on current
processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:41 +10:00
Benjamin Herrenschmidt
a245067e20 powerpc/mm: Add support for early ioremap on non-hash 64-bit processors
This adds some code to do early ioremap's using page tables instead of
bolting entries in the hash table. This will be used by the upcoming
64-bits BookE port.

The patch also changes the test for early vs. late ioremap to use
slab_is_available() instead of our old hackish mem_init_done.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:40 +10:00
Benjamin Herrenschmidt
fcce810986 powerpc/mm: Add HW threads support to no_hash TLB management
The current "no hash" MMU context management code is written with
the assumption that one CPU == one TLB. This is not the case on
implementations that support HW multithreading, where several
linux CPUs can share the same TLB.

This adds some basic support for this to our context management
and our TLB flushing code.

It also cleans up the optional debugging output a bit

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:37 +10:00
Benjamin Herrenschmidt
ee43eb788b powerpc: Use names rather than numbers for SPRGs (v2)
The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.

We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints realted to some
of them, for example, some have userspace readable aliases, etc..
and the current choice isn't always the best.

This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.

The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly needs to save/restore all
the SPRGs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:27 +10:00
Anton Blanchard
de4376c284 powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE
TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can
reuse this preload slot by loading in the segment at 0x10000000, where almost
all PowerPC binaries are linked at.

On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), both the 32bit and
64bit context switch rate improves (tested on a 4GHz POWER6):

32bit: 273k/sec -> 283k/sec
64bit: 277k/sec -> 284k/sec

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:26 +10:00
Anton Blanchard
5eb9bac040 powerpc: Rearrange SLB preload code
With the new top down layout it is likely that the pc and stack will be in the
same segment, because the pc is most likely in a library allocated via a top
down mmap. Right now we bail out early if these segments match.

Rearrange the SLB preload code to sanity check all SLB preload addresses
are not in the kernel, then check all addresses for conflicts.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:25 +10:00
Paul Mackerras
9c1e105238 powerpc: Allow perf_counters to access user memory at interrupt time
This provides a mechanism to allow the perf_counters code to access
user memory in a PMU interrupt routine.  Such an access can cause
various kinds of interrupt: SLB miss, MMU hash table miss, segment
table miss, or TLB miss, depending on the processor.  This commit
only deals with 64-bit classic/server processors, which use an MMU
hash table.  32-bit processors are already able to access user memory
at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
the possibility of reentering hash_page or the TLB miss handlers,
since they run with interrupts disabled.

On 64-bit processors, an SLB miss interrupt on a user address will
update the slb_cache and slb_cache_ptr fields in the paca.  This is
OK except in the case where a PMU interrupt occurs in switch_slb,
which also accesses those fields.  To prevent this, we hard-disable
interrupts in switch_slb.  Interrupts are already soft-disabled at
this point, and will get hard-enabled when they get soft-enabled
later.

This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
and to make sure that it clears the slb_cache_ptr when called from
other callers than switch_slb, the existing routine is renamed to
__slb_flush_and_rebolt, which is called by switch_slb and the new
version of slb_flush_and_rebolt.

Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
hard_irq_disable() to protect the per-cpu variables used there and
in ste_allocate.

If a MMU hashtable miss interrupt occurs, normally we would call
hash_page to look up the Linux PTE for the address and create a HPTE.
However, hash_page is fairly complex and takes some locks, so to
avoid the possibility of deadlock, we check the preemption count
to see if we are in a (pseudo-)NMI handler, and if so, we don't call
hash_page but instead treat it like a bad access that will get
reported up through the exception table mechanism.  An interrupt
whose handler runs even though the interrupt occurred when
soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
handler, which should use nmi_enter()/nmi_exit() rather than
irq_enter()/irq_exit().

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-18 14:48:43 +10:00
Tejun Heo
384be2b18a Merge branch 'percpu-for-linus' into percpu-for-next
Conflicts:
	arch/sparc/kernel/smp_64.c
	arch/x86/kernel/cpu/perf_counter.c
	arch/x86/kernel/setup_percpu.c
	drivers/cpufreq/cpufreq_ondemand.c
	mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids.  As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 14:45:31 +09:00
Kumar Gala
5156ddce6c powerpc/mm: Fix SMP issue with MMU context handling code
In switch_mmu_context() if we call steal_context_smp() to get a context
to use we shouldn't fall through and than call steal_context_up().  Doing
so can be problematic in that the 'mm' that steal_context_up() ends up
using will not get marked dirty in the stale_map[] for other CPUs that
might have used that mm.  Thus we could end up with stale TLB entries in
the other CPUs that can cause all kinda of havoc.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:05:43 -05:00
Benjamin Herrenschmidt
9e1b32caa5 mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()

Upcoming paches to support the new 64-bit "BookE" powerpc architecture
will need to have the virtual address corresponding to PTE page when
freeing it, due to the way the HW table walker works.

Basically, the TLB can be loaded with "large" pages that cover the whole
virtual space (well, sort-of, half of it actually) represented by a PTE
page, and which contain an "indirect" bit indicating that this TLB entry
RPN points to an array of PTEs from which the TLB can then create direct
entries. Thus, in order to invalidate those when PTE pages are deleted,
we need the virtual address to pass to tlbilx or tlbivax instructions.

The old trick of sticking it somewhere in the PTE page struct page sucks
too much, the address is almost readily available in all call sites and
almost everybody implemets these as macros, so we may as well add the
argument everywhere. I added it to the pmd and pud variants for consistency.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-27 12:10:38 -07:00
Michael Ellerman
30c5af435b powerpc: Use pr_devel() in do_dcache_icache_coherency()
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   2036     368       8    2412     96c arch/powerpc/mm/pgtable.o

size after:
   text    data     bss     dec     hex filename
   1677     248       8    1933     78d arch/powerpc/mm/pgtable.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:24 +10:00
Michael Ellerman
29e5fa59e5 powerpc: Use pr_devel() in arch/powerpc/mm/gup.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   3252     384       0    3636     e34 arch/powerpc/mm/gup.o

size after:
   text    data     bss     dec     hex filename
   2576      96       0    2672     a70 arch/powerpc/mm/gup.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:23 +10:00
Michael Ellerman
651e2dd2a1 powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   3261     416       4    3681     e61 arch/powerpc/mm/slb.o

size after:
   text    data     bss     dec     hex filename
   2861     248       4    3113     c29 arch/powerpc/mm/slb.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:22 +10:00
Michael Ellerman
a1ac38ab98 powerpc: Use pr_devel() in arch/powerpc/mm/mmu_context_nohash.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text	   data	    bss	    dec	    hex	filename
   1508	     48	     28	   1584	    630	powerpc/mm/mmu_context_nohash.o

size after:
   text	   data	    bss	    dec	    hex	filename
   1088	      0	     28	   1116	    45c	powerpc/mm/mmu_context_nohash.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:22 +10:00
Joe Perches
d258e64ef5 powerpc: Remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:21 +10:00
Tejun Heo
c43768cbb7 Merge branch 'master' into for-next
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes.  As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.

Conflicts:
	arch/alpha/include/asm/percpu.h
	arch/mn10300/kernel/vmlinux.lds.S
	include/linux/percpu-defs.h
2009-07-04 07:13:18 +09:00
Benjamin Herrenschmidt
850f6ac316 powerpc/mm: Make k(un)map_atomic out of line
Those functions are way too big to be inline, besides, kmap_atomic()
wants to call debug_kmap_atomic() which isn't exported for modules
and causes module link failures.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 14:37:25 +10:00
Tejun Heo
204fba4aa3 percpu: cleanup percpu array definitions
Currently, the following three different ways to define percpu arrays
are in use.

1. DEFINE_PER_CPU(elem_type[array_len], array_name);
2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
3. DEFINE_PER_CPU(elem_type, array_name)[array_len];

Unify to #1 which correctly separates the roles of the two parameters
and thus allows more flexibility in the way percpu variables are
defined.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm@kvack.org
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
2009-06-24 15:13:45 +09:00
Linus Torvalds
d06063cc22 Move FAULT_FLAG_xyz into handle_mm_fault() callers
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault().  All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 13:08:22 -07:00
Michael Ellerman
ba55bd7436 powerpc: Add configurable -Werror for arch/powerpc
Add the option to build the code under arch/powerpc with -Werror.

The intention is to make it harder for people to inadvertantly introduce
warnings in the arch/powerpc code. It needs to be configurable so that
if a warning is introduced, people can easily work around it while it's
being fixed.

The option is a negative, ie. don't enable -Werror, so that it will be
turned on for allyes and allmodconfig builds.

The default is n, in the hope that developers will build with -Werror,
that will probably lead to some build breaks, I am prepared to be flamed.

It's not enabled for math-emu, which is a steaming pile of warnings.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-16 14:15:45 +10:00
Benjamin Herrenschmidt
7dafd239ab Merge commit 'origin/master' into next 2009-06-15 10:36:54 +10:00
Sankar P
5cdcd9d691 trivial: spelling fix in ppc code comments
Fixes a trivial spelling error in powerpc code comments.

Signed-off-by: Sankar P <sankar.curiosity@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:47 +02:00
Benjamin Herrenschmidt
bc47ab0241 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/kernel/asm-offsets.c
2009-06-12 16:53:38 +10:00
Peter Zijlstra
f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Benjamin Herrenschmidt
944916858a powerpc: Shield code specific to 64-bit server processors
This is a random collection of added ifdef's around portions of
code that only mak sense on server processors. Using either
CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.

This is meant to make the future merging of Book3E 64-bit support
easier.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt
d3f6204a7d powerpc: Set init_bootmem_done on NUMA platforms as well
For some obscure reason, we only set init_bootmem_done after initializing
bootmem when NUMA isn't enabled. We even document this next to the declaration
of that global in system.h which of course I didn't read before I had to
debug why some WIP code wasn't working properly...

This patch changes it so that we always set it after bootmem is initialized
which should have always been the case... go figure !

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt
b46b6942b3 powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
The MMU context_lock can be taken from switch_mm() while the
rq->lock is held. The rq->lock can also be taken from interrupts,
thus if we get interrupted in destroy_context() with the context
lock held and that interrupt tries to take the rq->lock, there's
a possible deadlock scenario with another CPU having the rq->lock
and calling switch_mm() which takes our context lock.

The fix is to always ensure interrupts are off when taking our
context lock. The switch_mm() path is already good so this fixes
the destroy_context() path.

While at it, turn the context lock into a new style spinlock.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt
3035c8634f powerpc/mm: Fix some SMP issues with MMU context handling
This patch fixes a couple of issues that can happen as a result
of steal_context() dropping the context_lock when all possible
PIDs are ineligible for stealing (hopefully an extremely hard to
hit occurence).

This case exposes the possibility of a stale context_mm[] entry
to be seen since destroy_context() doesn't clear it and the free
map isn't re-tested. It also means steal_context() will not notice
a context freed while the lock was help, thus possibly trying to
steal a context when a free one was available.

This fixes it by always returning to the caller from steal_context
when it dropped the lock with a return value that causes the
caller to re-samble the number of free contexts, along with
properly clearing the context_mm[] array for destroyed contexts.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:42:21 +10:00
Ingo Molnar
23db9f430b Merge branch 'linus' into perfcounters/core
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
              based - to pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 10:01:39 +02:00
Benjamin Herrenschmidt
435462c6e6 Merge branch 'merge' into next 2009-05-29 13:54:52 +10:00
Benjamin Herrenschmidt
8b31e49d1d powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency.
The implementation we just revived has issues, such as using a
Kconfig-defined virtual address area in kernel space that nothing
actually carves out (and thus will overlap whatever is there),
or having some dependencies on being self contained in a single
PTE page which adds unnecessary constraints on the kernel virtual
address space.

This fixes it by using more classic PTE accessors and automatically
locating the area for consistent memory, carving an appropriate hole
in the kernel virtual address space, leaving only the size of that
area as a Kconfig option. It also brings some dma-mask related fixes
from the ARM implementation which was almost identical initially but
grew its own fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:33:59 +10:00
Benjamin Herrenschmidt
f637a49e50 powerpc: Minor cleanups of kernel virt address space definitions
Make FIXADDR_TOP a compile time constant and cleanup a
couple of definitions relative to the layout of the kernel
address space on ppc32. We also print out that layout at
boot time for debugging purposes.

This is a pre-requisite for properly fixing non-coherent
DMA allocactions.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:32:50 +10:00
Benjamin Herrenschmidt
b16e7766d6 powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm
(pre-requisite to make the next patches more palatable)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:32:05 +10:00
Hideo Saito
8e35961b57 powerpc/mm: Fix broken MMU PID stealing on !SMP
The recent rework of the MMU PID handling for non-hash CPUs has a
subtle bug in the !SMP "optimized" variant of the PID stealing
function.  It clears the PID in the mm context before it calls
local_flush_tlb_mm(). However, the later will not flush anything
if the PID in the context is clear...

Signed-off-by: Hideo Saito <hsaito.ppc@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-26 13:46:49 +10:00
Milton Miller
60dbf43851 powerpc: Add 2.06 tlbie mnemonics
This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards
compatibilty for CPUs before 2.06.

Only useful for bare metal systems.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:21 +10:00
Ingo Molnar
dc3f81b129 Merge commit 'v2.6.30-rc6' into perfcounters/core
Merge reason: this branch was on an -rc4 base, merge it up to -rc6
              to get the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 07:37:49 +02:00
Mel Gorman
af3e4aca47 powerpc: Do not assert pte_locked for hugepage PTE entries
With CONFIG_DEBUG_VM, an assertion is made when changing the protection
flags of a PTE that the PTE is locked. Huge pages use a different pagetable
format and the assertion is bogus and will always trigger with a bug looking
something like

 Unable to handle kernel paging request for data at address 0xf1a00235800006f8
 Faulting instruction address: 0xc000000000034a80
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=32 NUMA Maple
 Modules linked in: dm_snapshot dm_mirror dm_region_hash
  dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
  pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
  windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
  windfarm_cpufreq_clamp windfarm_core i2c_powermac
 NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
 REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
 MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
 DAR: f1a00235800006f8, DSISR: 0000000040010000
 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 Call Trace:
 [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
 [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
 Instruction dump:
 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000

This patch fixes the problem by not asseting the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18 15:19:04 +10:00
Becky Bruce
49a8496525 powerpc: Allow mem=x cmdline to work with 4G+
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15 16:43:41 +10:00
Ingo Molnar
e7fd5d4b3d Merge branch 'linus' into perfcounters/core
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
              the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:47:05 +02:00
Stephen Rothwell
b62c31ae40 powerpc: fix for long standing bug noticed by gcc 4.4.0
Previous gcc versions didn't notice this because one of the preceding
#ifs always evaluated to true.

gcc 4.4.0 produced this error:

arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:52:16 -05:00
Kumar Gala
323d23aeac Revert "powerpc: Add support for early tlbilx opcode"
This reverts commit e996557740.  Our HW
guys were able to fix this so it never sees the light of day.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:51:22 -05:00
Michael Ellerman
24f1ce803c powerpc: Fix crash on CPU hotplug
early_init_mmu_secondary() is called at CPU hotplug time, so it
must be marked as __cpuinit, not __init.

Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit").

Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-22 14:56:34 +10:00
Peter Zijlstra
78f13e9525 perf_counter: allow for data addresses to be recorded
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.

For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.

x86 doesn't seem capable of providing this atm.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 19:05:56 +02:00
Kumar Gala
52ce67f157 powerpc/mm: Fix compile warning
arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm':
arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 22:11:10 -05:00
Kumar Gala
e996557740 powerpc: Add support for early tlbilx opcode
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode.  Add support for a MMU_FTR
fixup to deal with this.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 01:36:30 -05:00
Peter Zijlstra
ac17dc8e58 perf_counter: provide major/minor page fault software events
Provide separate sw counters for major and minor page faults.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:29:40 +02:00
Peter Zijlstra
7dd1fcc258 perf_counter: provide pagefault software events
We use the generic software counter infrastructure to provide
page fault events.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:29:37 +02:00
Benjamin Herrenschmidt
757c74d298 powerpc/mm: Introduce early_init_mmu() on 64-bit
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt
ff7c660092 powerpc/mm: Fix printk type warning in mmu_context_nohash
We need to use %zu instead of %d when printing a sizeof()

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt
d62cbf45a8 powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c
This file is only useful on 64-bit, so we name it accordingly.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:33 +11:00
Benjamin Herrenschmidt
8d1cf34e7a powerpc/mm: Tweak PTE bit combination definitions
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variant become almost identical and that will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.

The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.

Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrot.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:33 +11:00
Rusty Russell
56aa4129e8 cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask
Makes code futureproof against the impending change to mm->cpu_vm_mask.

It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:29 +11:00
Benjamin Herrenschmidt
9e5efaa936 powerpc/mm: Properly wire up get_user_pages_fast() on 32-bit
While we did add support for _PAGE_SPECIAL on some 32-bit platforms,
we never actually built get_user_pages_fast() on them. This fixes
it which requires a little bit of ifdef'ing around.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:34 +11:00
Benjamin Herrenschmidt
1cdab55d8a powerpc: Wire up /proc/vmallocinfo to our ioremap()
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:10:14 +11:00
Kumar Gala
c3071951d0 powerpc/fsl-booke: Add support for tlbilx instructions
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-03-09 09:25:38 -05:00
Anton Blanchard
002b0ec73d powerpc: Increase stack gap on 64bit binaries
On 64bit there is a possibility our stack and mmap randomisation will put
the two close enough such that we can't expand our stack to match the ulimit
specified.

To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
a5adc91a4b powerpc: Ensure random space between stack and mmaps
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.

To fix this, shift the randomness we use for the mmap region by 1 bit.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
9f14c42d75 powerpc: Randomise mmap start address
Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks.
Until ppc32 uses the mmap.c functionality, this is ppc64 specific.

Before:

# ./test & cat /proc/${!}/maps|tail -2|head -1
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0

After:
# ./test & cat /proc/${!}/maps|tail -2|head -1
f718b000-f7b8c000 rw-p f718b000 00:00 0
f7551000-f7f52000 rw-p f7551000 00:00 0
f6ee7000-f78e8000 rw-p f6ee7000 00:00 0
f74d4000-f7ed5000 rw-p f74d4000 00:00 0
f6e9d000-f789e000 rw-p f6e9d000 00:00 0

Similar for 64bit, but with 1GB of scatter:
# ./test & cat /proc/${!}/maps|tail -2|head -1
fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0
fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0
fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0
fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0
fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:07 +11:00
Anton Blanchard
13a2cb3694 powerpc: Rearrange mmap.c
Rearrange mmap.c to better match the x86 version.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Nathan Fontenot
0f16ef7fd3 powerpc/numa: Cleanup hot_add_scn_to_nid
This patch reworks the hot_add_scn_to_nid and its supporting functions
to make them easier to understand.  There are no functional changes in
this patch and has been tested on machine with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.

My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible.  This
will hopefully makes things easier to understand.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Anton Blanchard
13870b6575 powerpc/mm: Reduce hashtable size when using 64kB pages
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.

Grab the real page size and size the hashtable based on that

Note: This only has effect on non hypervisor machines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt
3b7faeb49e Merge commit 'kumar/next' into next 2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt
82a0a1cc8f Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/pgtable-ppc32.h
2009-02-18 13:19:25 +11:00
Dave Hansen
06eccea6c3 powerpc/mm: Fix numa reserve bootmem page selection
Fix the powerpc NUMA reserve bootmem page selection logic.

commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions.  As the folowing discussion reports,
the new logic was not correct.

mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area.  It searches for
active regions that intersect with that LMB and are on the
specified node.  It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect.  We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.

We base the size of the bootmem reservation on two possible
things.  Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.

However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions.  Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions.  So, the bootmem reservation must be trimmed to
fit inside the current active region.

That's all fine and dandy, but we trim the reservation
in a page-aligned fashion.  That's bad because we start the
reservation at a non-page-aligned address: physbase.

The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.

Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000.  You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
and then call

  reserve_bootmem_node(node, physbase=0xfff, size=0x1000);

0x1000 may not be on the same node as 0xfff.  Oops.

In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range.  Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.

We also need to do math for the reserved size with physbase
instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node.  However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area.  That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.

From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Kumar Gala
96a8bac589 powerpc/fsl-booke: Fix compile warning
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:54:53 -06:00
Kumar Gala
d66c82ea45 powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture.  Its done it such a way to be code compatiable with the
existing HW.  Made the minor code changes to support both power of two
and power of four page sizes.  Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA.  Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.

Note, its still invalid to try and use a page size that isn't supported
by cpu.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:37:11 -06:00
Kumar Gala
f99fb8a2cb powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW
The following commit:

commit 64b3d0e812
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date:   Thu Dec 18 19:13:51 2008 +0000

    powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED

broke setting of the _PAGE_COHERENT bit in the PPC HW PTE.  Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propogate it to the PPC HW PTE.

Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt
8d30c14cab powerpc/mm: Rework I$/D$ coherency (v3)
This patch reworks the way we do I and D cache coherency on PowerPC.

The "old" way was split in 3 different parts depending on the processor type:

   - Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.

   - Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all page going to user space in update_mmu_cache().

   - Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.

That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow trying to execute. This is hard to hit but
I think it has bitten us in the past.

Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.

This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c

The base idea for Embedded with per-page exec support, is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).

If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
this guys also perform the I/D cache sync for exec faults now. This second path
is the catch all for things that weren't cleaned at set_pte_at() time.

For cpus without per-pag exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.

For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.

This is also a first step for adding _PAGE_EXEC support for embedded platforms

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:10 +11:00
Anton Blanchard
91b0f5ec53 powerpc/mm: Move 64-bit unmapped_area to top of address space
We currently place mmaps just below the stack on 32bit, but leave them
in the middle of the address space on 64bit:

00100000-00120000 r-xp 00100000 00:00 0                    [vdso]
10000000-10010000 r-xp 00000000 08:06 179534               /tmp/sleep
10010000-10020000 rw-p 00000000 08:06 179534               /tmp/sleep
10020000-10130000 rw-p 10020000 00:00 0                    [heap]
40000000000-40000030000 r-xp 00000000 08:06 440743         /lib64/ld-2.9.so
40000030000-40000040000 rw-p 00020000 08:06 440743         /lib64/ld-2.9.so
40000050000-400001f0000 r-xp 00000000 08:06 440671         /lib64/libc-2.9.so
400001f0000-40000200000 r--p 00190000 08:06 440671         /lib64/libc-2.9.so
40000200000-40000220000 rw-p 001a0000 08:06 440671         /lib64/libc-2.9.so
40000220000-40008230000 rw-p 40000220000 00:00 0
fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0           [stack]

Right now it isn't an issue, but at some stage we will run into mmap or
hugetlb allocation issues. Using the same layout as 32bit gives us a
some breathing room. This matches what x86-64 is doing too.

00100000-00103000 r-xp 00100000 00:00 0                    [vdso]
10000000-10001000 r-xp 00000000 08:06 554894               /tmp/test
10010000-10011000 r--p 00000000 08:06 554894               /tmp/test
10011000-10012000 rw-p 00001000 08:06 554894               /tmp/test
10012000-10113000 rw-p 10012000 00:00 0                    [heap]
fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0
ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591         /lib64/libc-2.9.so
ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591         /lib64/libc-2.9.so
ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591         /lib64/libc-2.9.so
ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591         /lib64/libc-2.9.so
ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0
ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663         /lib64/ld-2.9.so
ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0
ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0
ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663         /lib64/ld-2.9.so
ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663         /lib64/ld-2.9.so
ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0
fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0           [stack]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:07 +11:00
Milton Miller
8b16cd238d powerpc/numa: Remove redundant find_cpu_node()
Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in
a less restrictive section (text vs cpuinit).

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:37:59 +11:00
Milton Miller
20fcefe5a0 powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth()
find_min_common_depth() was checking the property length incorrectly.
The value is in bytes not cells, and it is using the second entry.

Signed-off-By: Milton Miller <miltonm@bga.com>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:37:58 +11:00
Benjamin Herrenschmidt
edbc29d76d Merge commit 'kumar/next' into next 2009-02-11 13:37:44 +11:00
Kumar Gala
6c24b17453 powerpc/fsl-booke: Fix mapping functions to use phys_addr_t
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t
instead of unsigned long.  In 36-bit physical mode we really need these
functions to deal with phys_addr_t when trying to match a physical
address or when returning one.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-09 21:11:55 -06:00
Trent Piepho
96051465fd powerpc/fsl-booke: Make CAM entries used for lowmem configurable
On booke processors, the code that maps low memory only uses up to three
CAM entries, even though there are sixteen and nothing else uses them.

Make this number configurable in the advanced options menu along with max
low memory size.  If one wants 1 GB of lowmem, then it's typically
necessary to have four CAM entries.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:54 -06:00
Trent Piepho
c8f3570b7e powerpc/fsl-booke: Allow larger CAM sizes than 256 MB
The code that maps kernel low memory would only use page sizes up to 256
MB.  On E500v2 pages up to 4 GB are supported.

However, a page must be aligned to a multiple of the page's size.  I.e.
256 MB pages must aligned to a 256 MB boundary.  This was enforced by a
requirement that the physical and virtual addresses of the start of lowmem
be aligned to 256 MB.  Clearly requiring 1GB or 4GB alignment to allow
pages of that size isn't acceptable.

To solve this, I simply have adjust_total_lowmem() take alignment into
account when it decides what size pages to use.  Give it PAGE_OFFSET =
0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map
pages like this:
PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
PA 0xA000_0000 VA 0xE000_0000 Size 256 MB

Because the lowmem mapping code now takes alignment into account,
PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB.  Even lower might be
possible.  The lowmem code will work down to 4 kB but it's possible some of
the boot code will fail before then.  Poor alignment will force small pages
to be used, which combined with the limited number of TLB1 pages available,
will result in very little memory getting mapped.  So alignments less than
64 MB probably aren't very useful anyway.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:53 -06:00
Trent Piepho
f88747e7f6 powerpc/fsl-booke: Remove code duplication in lowmem mapping
The code to map lowmem uses three CAM aka TLB[1] entries to cover it.  The
size of each is stored in three globals named __cam0, __cam1, and __cam2.
All the code that uses them is duplicated three times for each of the three
variables.

We have these things called arrays and loops....

Once converted to use an array, it will be easier to make the number of
CAMs configurable.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:51 -06:00
Gerhard Pircher
4c456a67f5 powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code
_PAGE_COHERENT is now always set in _PAGE_RAM resp. PAGE_KERNEL.
Thus it has to be masked out, if the BAT mapping should be non
cacheable or CPU_FTR_NEED_COHERENT is not set.

This will work on normal SMP setups because we force-set
CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP.

Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-28 17:15:52 +11:00
Dave Kleikamp
9ba0fdbfae powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices
powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices

The subpage_prot syscall fails on second and subsequent calls for a given
region, because is_hugepage_only_range() is mis-identifying the 4 kB
slices when the process has a 64 kB page size.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-16 16:15:16 +11:00
Ingo Molnar
fe333321e2 powerpc: Change u64/s64 to a long long integer type
Convert arch/powerpc/ over to long long based u64:

 -#ifdef __powerpc64__
 -# include <asm-generic/int-l64.h>
 -#else
 -# include <asm-generic/int-ll64.h>
 -#endif
 +#include <asm-generic/int-ll64.h>

This will avoid reoccuring spurious warnings in core kernel code that
comes when people test on their own hardware. (i.e. x86 in ~98% of the
cases) This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.

[Adjusted to not impact user mode (from paulus) - sfr]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:59 +11:00
Benjamin Herrenschmidt
30aae739a9 Merge commit 'kumar/kumar-next' into next 2009-01-13 13:59:03 +11:00
Anton Vorontsov
7021d86afa powerpc/mm: Make clear_fixmap() actually work
The clear_fixmap() routine issues map_page() with flags set to 0.
Currently this causes a BUG_ON() inside the map_page(), as it assumes
that a PTE should be clear before mapping.

This patch makes the map_page() to trigger the BUG_ON() only if the
flags were set.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:17 +11:00
Benjamin Herrenschmidt
4a0826824b powerpc: Fix missing semicolons in mmu_decl.h
This is a brown paper bag from one of my earlier patches that
breaks build on 40x and 8xx.

And yes, I've now added 40x and 8xx to my list of test configs :-)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:17 +11:00
Dave Liu
d6a09e0cd6 powerpc: Remove the redundant _tlbil_pid at SMP case
Signed-off-by: Dave Liu <daveliu@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:13 +11:00
Dave Hansen
893473df78 powerpc/mm: Cleanup careful_allocation(): consolidate memset()
Both users of careful_allocation() immediately memset() the
result.  So, just do it in one place.

Also give careful_allocation() a 'z' prefix to bring it in
line with kzmalloc() and friends.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:09 +11:00
Dave Hansen
0be210fd66 powerpc/mm: Make careful_allocation() return virtual addrs
Since we memset() the result in both of the uses here,
just make careful_alloc() return a virtual address.
Also, add a separate variable to store the physial
address that comes back from the lmb_alloc() functions.
This makes it less likely that someone will screw it up
forgetting to convert before returning since the vaddr
is always in a void* and the paddr is always in an
unsigned long.

I admit this is arbitrary since one of its users needs
a paddr and one a vaddr, but it does remove a good
number of casts.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Dave Hansen
5d21ea2b0e powerpc/mm:: Cleanup careful_allocation(): bootmem already panics
If we fail a bootmem allocation, the bootmem code itself
panics.  No need to redo it here.

Also change the wording of the other panic.  We don't
strictly have to allocate memory on the specified node.
It is just a hint and that node may not even *have* any
memory on it.  In that case we can and do fall back to
other nodes.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Dave Hansen
c555e520ef powerpc/mm: Add better comment on careful_allocation()
The behavior in careful_allocation() really confused me
at first.  Add a comment to hopefully make it easier
on the next doofus that looks at it.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Trent Piepho
6fd8be4bf7 powerpc/fsl-booke: Remove num_tlbcam_entries
This is a global variable defined in fsl_booke_mmu.c with a value that gets
initialized in assembly code in head_fsl_booke.S.

It's never used.

If some code ever does want to know the number of entries in TLB1, then
"numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a
global initialized during kernel boot from assembly.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:07 -06:00
Trent Piepho
19f5465e82 powerpc/fsl-booke: Don't hard-code size of struct tlbcam
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table.  Anyone changing the size of struct
tlbcam would not know to expect that.

The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.

The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:06 -06:00
Gary Hade
c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Mel Gorman
3340289ddf mm: report the MMU pagesize in /proc/pid/smaps
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA.  This matches the size used by the MMU in the
majority of cases.  However, one counter-example occurs on PPC64 kernels
whereby a kernel using 64K as a base pagesize may still use 4K pages for
the MMU on older processor.  To distinguish, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Linus Torvalds
3c92ec8ae9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
  powerpc/44x: Support 16K/64K base page sizes on 44x
  powerpc: Force memory size to be a multiple of PAGE_SIZE
  powerpc/32: Wire up the trampoline code for kdump
  powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
  powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
  powerpc/32: Setup OF properties for kdump
  powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
  powerpc: Prepare xmon_save_regs for use with kdump
  powerpc: Remove default kexec/crash_kernel ops assignments
  powerpc: Make default kexec/crash_kernel ops implicit
  powerpc: Setup OF properties for ppc32 kexec
  powerpc/pseries: Fix cpu hotplug
  powerpc: Fix KVM build on ppc440
  powerpc/cell: add QPACE as a separate Cell platform
  powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
  powerpc/mpc5200: fix error paths in PSC UART probe function
  powerpc/mpc5200: add rts/cts handling in PSC UART driver
  powerpc/mpc5200: Make PSC UART driver update serial errors counters
  powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
  powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
  ...

Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-28 16:54:33 -08:00
Ilya Yanok
ca9153a3a2 powerpc/44x: Support 16K/64K base page sizes on 44x
This adds support for 16k and 64k page sizes on PowerPC 44x processors.

The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().

One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-29 09:53:25 +11:00
James Morris
cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
Dale Farnsworth
ccdcef72c2 powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
Add the ability for a classic ppc kernel to be loaded at an address
of 32MB.  This done by fixing a few places that assume we are loaded
at address 0, and by changing several uses of KERNELBASE to use
PAGE_OFFSET, instead.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Anton Vorontsov
01695a9687 powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
While for debugging it is good to catch bogus users of ioremap, though
for kdump support it is more convenient to use __ioremap for
copy_oldmem_page() (exactly as we do for PPC64 currently).

Note that copy_oldmem_page() calls __ioremap with flags set to '0',
so it should be safe with the regard to the caches.

The other option is to use kmap_atomic_pfn()[1], but it will not work
for kernels compiled without HIGHMEM.

That is, on a board with 256MB RAM and crashkernel=64M@32M case, the
!HIGHMEM capturing kernel maps 0-96M range, which does not include all
the memory needed to capture the dump. And, obviously, accessing
anything upper than 96M will cause faults.

[1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Benjamin Herrenschmidt
a14953597b powerpc: Fix missing 'blr' in _tlbia()
Rework to MMU code dropped a much missed 'blr' instruction.

Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:25 -07:00
Benjamin Herrenschmidt
64b3d0e812 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit.  We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.

This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms
that need it.  The hash code clears it if the feature bit is not set.

It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.

This should help having the PTE actually containing the bit
combinations that we really want.

I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitely from the TLB miss.  I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
7752035180 powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs
This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.

Only the main free map and CPU 0 stale map are allocated at boot
time.  Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.

This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland if first started anyways.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
760ec0e02d powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440
The handlers for Critical, Machine Check or Debug interrupts
will save and restore MMUCR nowadays, thus we only need to
disable normal interrupts when invalidating TLB entries.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
2a4aca1144 powerpc/mm: Split low level tlb invalidate for nohash processors
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.

This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.

I moved all the definitions for that stuff out of
include/asm/tlbflush.h as they are really internal mm stuff, into
mm/mmu_decl.h

The code should have no functional changes.  I kept some variants
inline for trivial forms on things like 40x and 8xx.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
f048aace29 powerpc/mm: Add SMP support to no-hash TLB handling
This commit moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.

Note that I'm using local invalidations for D->I cache coherency.

At worst, if another processor is trying to execute the same and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
7c03d653cd powerpc/mm: Introduce MMU features
We're soon running out of CPU features and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features.  I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt
2ca8cf7389 powerpc/mm: Rework context management for CPUs with no hash table
This reworks the context management code used by 4xx,8xx and
freescale BookE.  It adds support for SMP by implementing a
concept of stale context map to lazily flush the TLB on
processors where a context may have been invalidated.  This
also contains the ground work for generalizing such lazy TLB
flushing by just picking up a new PID and marking the old one
stale.  This will be implemented later.

This is a first implementation that uses a global spinlock.

Ideally, we should try to get at least the fast path (context ID
already assigned) lockless or limited to a per context lock,
but for now this will do.

I tried to keep the UP case reasonably simple to avoid adding
too much overhead to 8xx which does a lot of context stealing
since it effectively has only 16 PIDs available.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt
5e696617c4 powerpc/mm: Split mmu_context handling
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else.  This is
preliminary work for adding SMP support for BookE processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt
f63837f058 powerpc/mm: Remove flush_HPTE()
The function flush_HPTE() is used in only one place, the implementation
of DEBUG_PAGEALLOC on ppc32.

It's actually a dup of flush_tlb_page() though it's -slightly- more
efficient on hash based processors.  We remove it and replace it by
a direct call to the hash flush code on those processors and to
flush_tlb_page() for everybody else.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:34 +11:00
Benjamin Herrenschmidt
e41e811a79 powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c
This renames the files to clarify the fact that they are used by
the hash based family of CPUs (the 603 being an exception in that
family but is still handled by that code).

This paves the way for the new tlb_nohash.c coming via a subsequent
commit.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:30 +11:00
Paul Mackerras
1e1c568d6c Merge branch 'merge' into next 2008-12-16 14:38:58 +11:00
Dave Hansen
a4c74ddd5e powerpc: Fix bootmem reservation on uninitialized node
careful_allocation() was calling into the bootmem allocator for
nodes which had not been fully initialized and caused a previous
bug:  http://patchwork.ozlabs.org/patch/10528/  So, I merged a
few broken out loops in do_init_bootmem() to fix it.  That changed
the code ordering.

I think this bug is triggered by having reserved areas for a node
which are spanned by another node's contents.  In the
mark_reserved_regions_for_nid() code, we attempt to reserve the
area for a node before we have allocated the NODE_DATA() for that
nid.  We do this since I reordered that loop.  I suck.

This is causing crashes at bootup on some systems, as reported
by Jon Tollefson.

This may only present on some systems that have 16GB pages
reserved.  But, it can probably happen on any system that is
trying to reserve large swaths of memory that happen to span other
nodes' contents.

This commit ensures that we do not touch bootmem for any node which
has not been initialized, and also removes a compile warning about
an unused variable.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 13:48:18 +11:00
Brian King
48f797de55 powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area
It looks like most of the hugetlb code is doing the correct thing if
hugepages are not supported, but the mmap code is not.  If we get into
the mmap code when hugepages are not supported, such as in an LPAR
which is running Active Memory Sharing, we can oops the kernel.  This
fixes the oops being seen in this path.

oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N)
dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N)
scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
Supported: No
NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
REGS: c000000077e7b830 TRAP: 0300   Tainted: G
(2.6.27.5-bz50170-2-ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000448  XER: 20000001
DAR: c000002000af90a8, DSISR: 0000000040000000
TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
Call Trace:
[c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
[c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
[c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
[c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
[c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
Instruction dump:
fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 13:48:18 +11:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Kumar Gala
0186f47e70 powerpc: Use RCU based pte freeing mechanism for all powerpc
Refactor the RCU based pte free code that was used on ppc64 to be used
on all powerpc.

Additionally refactor pte_free() & pte_free_kernel() into common code
between ppc32 & ppc64.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-03 20:46:35 +11:00
Kumar Gala
f4f3a1261a powerpc: hash_page_sync should only be used on SMP & STD_MMU_32
Clean up the ifdefs so we only use hash_page_sync if we have
CONFIG_SMP && CONFIG_PPC_STD_MMU_32.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-03 20:46:35 +11:00
Paul Mackerras
5274918855 Merge branch 'merge' 2008-12-03 20:11:06 +11:00
Linus Torvalds
03cfdb86ac Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: Fix system calls on Cell entered with XER.SO=1
  powerpc/cell: Fix GDB watchpoints, again
  powerpc/mpic: Don't reset affinity for secondary MPIC on boot
  powerpc/cell/axon-msi: Retry on missing interrupt
  powerpc: Fix boot freeze on machine with empty memory node
  powerpc: Fix IRQ assignment for some PCIe devices
  powerpc/spufs: Fix spinning in spufs_ps_fault on signal
  powerpc/mpc832x_rdb: fix swapped ethernet ids
  powerpc: Use generic PHY driver for Marvell 88E1111 PHY on GE Fanuc SBC610
  powerpc/85xx: L2 cache size wrong in 8572DS dts
  powerpc/virtex: Update defconfigs
  powerpc/52xx: update defconfigs
  xsysace: Fix driver to use resource_size_t instead of unsigned long
  powerpc/virtex: fix various format/casting printk mismatches
  powerpc/mpc5200: fix bestcomm Kconfig dependencies
  powerpc/44x: Fix 460EX/460GT machine check handling
  powerpc/40x: Limit allocable DRAM during early mapping
2008-11-30 16:44:18 -08:00
Dave Hansen
4a6186696e powerpc: Fix boot freeze on machine with empty memory node
I got a bug report about a distro kernel not booting on a particular
machine.  It would freeze during boot:

> ...
> Could not find start_pfn for node 1
> [boot]0015 Setup Done
> Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
> Policy zone: DMA
> Kernel command line:
> [boot]0020 XICS Init
> [boot]0021 XICS Done
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> clocksource: timebase mult[7d0000] shift[22] registered
> Console: colour dummy device 80x25
> console handover: boot [udbg0] -> real [hvc0]
> Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
> Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
> freeing bootmem node 0

I've reproduced this on 2.6.27.7.  It is caused by commit
8f64e1f2d1 ("powerpc: Reserve in bootmem
lmb reserved regions that cross NUMA nodes").

The problem is that Jon took a loop which was (in pseudocode):

	for_each_node(nid)
		NODE_DATA(nid) = careful_alloc(nid);
		setup_bootmem(nid);
		reserve_node_bootmem(nid);

and broke it up into:

	for_each_node(nid)
		NODE_DATA(nid) = careful_alloc(nid);
		setup_bootmem(nid);
	for_each_node(nid)
		reserve_node_bootmem(nid);

The issue comes in when the 'careful_alloc()' is called on a node with
no memory.  It falls back to using bootmem from a previously-initialized
node.  But, bootmem has not yet been reserved when Jon's patch is
applied.  It gives back bogus memory (0xc000000000000000) and pukes
later in boot.

The following patch collapses the loop back together.  It also breaks
the mark_reserved_regions_for_nid() code out into a function and adds
some comments.  I think a huge part of introducing this bug is because
for loop was too long and hard to read.

The actual bug fix here is the:

+		if (end_pfn <= node->node_start_pfn ||
+		    start_pfn >= node_end_pfn)
+			continue;

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-01 09:40:18 +11:00
Al Viro
4ea8fb9c1c powerpc set_huge_psize() false positive
called only from __init, calls __init.  Incidentally, it ought to be static
in file.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Robert Jennings
a6326e98a2 powerpc: Correct page-in counter for CMM with 64k pages
Linux will report the number of page-ins so that the hypervisor can
better determine partition memory pressure.  The hardware page size
and the OS page size can be different.  In the case where the hardware
page size is 4k and the OS is running with 64k pages the code in
commit 409001948d ("powerpc: Update
page-in counter for CMM") would under-report the number of pages.

This corrects the reporting to the hypervisor by incrementing the
page_in count by 1 << PAGE_FACTOR each time.

Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-19 16:05:05 +11:00
David Howells
1330deb0f6 CRED: Wrap task credential accesses in the PowerPC arch
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:39 +11:00
Grant Erickson
5907630ffc powerpc/40x: Limit allocable DRAM during early mapping
If the size of DRAM is not an exact power of two, we may not have
covered DRAM in its entirety with large 16 and 4 MiB pages.  If that
is the case, we can get non-recoverable page faults when doing the
final PTE mappings for the non-large page PTEs.

Consequently, we restrict the top end of DRAM currently allocable
by updating '__initial_memory_limit_addr' so that calls to the LMB to
allocate PTEs for "tail" coverage with normal-sized pages (or other
reasons) do not attempt to allocate outside the allowed range.

Signed-off-by: Grant Erickson <gerickson@nuovations.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-11-13 10:10:56 -05:00
Jon Tollefson
7d4320f3d5 powerpc: Hugetlb pgtable cache access cleanup
Andrew Morton suggested that using a macro that makes an array
reference look like a function call makes it harder to understand the
code.

This therefore removes the huge_pgtable_cache(psize) macro and
replaces its uses with pgtable_cache[HUGE_PGTABLE_INDEX(psize)].

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-06 09:49:39 +11:00
Brian King
409001948d powerpc: Update page-in counter for CMM
A new field has been added to the VPA as a method for the client OS to
communicate to firmware the number of page-ins it is performing when
running collaborative memory overcommit.  The hypervisor will use this
information to better determine if a partition is experiencing memory
pressure and needs more memory allocated to it.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 22:08:28 +11:00
Jon Tollefson
4792adbac9 powerpc: Don't use a 16G page if beyond mem= limits
If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available.

Thanks to Michael Ellerman for finding the problem.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-22 15:01:21 +11:00
Benjamin Herrenschmidt
a02efb906d Merge commit 'origin' into master
Manual merge of:

	arch/powerpc/Kconfig
	arch/powerpc/include/asm/page.h
2008-10-21 15:52:04 +11:00
Milton Miller
fe55249d17 powerpc: Always trim numa memory to lmb_end_of_DRAM()
numa_enforce_memory_limit tried to be smart and only call lmb_end_of_DRAM
when a memory limit was set via mem= on the command line.  However,
the early boot code will also limit memory added to the lmb system
when iommu=off is specified.  When this happens, the page allocator
is given pages not in the linear mapping and this results in a fatal
data reference to the unmapped page.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-21 15:19:12 +11:00
Jon Tollefson
e81703724a powerpc/numa: Make memory reserve code more robust
Adjust amount to reserve based on previous nodes for reserves spanning
multiple nodes. Check if the node active range is empty before attempting
to pass the reserve to bootmem.  In practice the range shouldn't be empty,
but to be sure we check.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-21 15:17:48 +11:00
Badari Pulavarty
71088785c6 mm: cleanup to make remove_memory() arch-neutral
There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which support
hotplug memory remove.  Instead of duplicating it in every architecture,
collapse them into arch neutral function.

[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:50:25 -07:00
David Gibson
f5ea64dcba powerpc: Get USE_STRICT_MM_TYPECHECKS working again
The typesafe version of the powerpc pagetable handling (with
USE_STRICT_MM_TYPECHECKS defined) has bitrotted again.  This patch
makes a bunch of small fixes to get it back to building status.

It's still not enabled by default as gcc still generates worse
code with it for some reason.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-14 10:35:27 +11:00
Jon Tollefson
8f64e1f2d1 powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes
If there are multiple reserved memory blocks via lmb_reserve() that are
contiguous addresses and on different NUMA nodes we are losing track of which
address ranges to reserve in bootmem on which node.  I discovered this
when I recently got to try 16GB huge pages on a system with more then 2 nodes.

When scanning the device tree in early boot we call lmb_reserve() with
the addresses of the 16G pages that we find so that the memory doesn't
get used for something else.  For example the addresses for the pages
could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages,
one on each of eight nodes.  In the lmb after all the pages have been
reserved it will look something like the following:

lmb_dump_all:
    memory.cnt            = 0x2
    memory.size           = 0x3e80000000
    memory.region[0x0].base       = 0x0
                      .size     = 0x1e80000000
    memory.region[0x1].base       = 0x4000000000
                      .size     = 0x2000000000
    reserved.cnt          = 0x5
    reserved.size         = 0x3e80000000
    reserved.region[0x0].base       = 0x0
                      .size     = 0x7b5000
    reserved.region[0x1].base       = 0x2a00000
                      .size     = 0x78c000
    reserved.region[0x2].base       = 0x328c000
                      .size     = 0x43000
    reserved.region[0x3].base       = 0xf4e8000
                      .size     = 0xb18000
    reserved.region[0x4].base       = 0x4000000000
                      .size     = 0x2000000000

The reserved.region[0x4] contains the 16G pages.  In
arch/powerpc/mm/num.c: do_init_bootmem() we loop through each of the
node numbers looking for the reserved regions that belong to the
particular node.  It is not able to identify region 0x4 as being a part
of each of the 8 nodes.  It is assuming that a reserved region is only
on a single node.

This patch takes out the reserved region loop from inside
the loop that goes over each node.  It looks up the active region containing
the start of the reserved region.  If it extends past that active region then
it adjusts the size and gets the next active region containing it.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-10 15:55:19 +11:00
Roland Dreier
a880e76233 powerpc: Avoid integer overflow in page_is_ram()
Commit 8b150478 ("ppc: make phys_mem_access_prot() work with pfns
instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow
for addresses above 4G on 32-bit kernels.  However arch/powerpc's
page_is_ram() is missing the same fix -- it computes a physical address
by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page
above 4G.

In particular this causes pages above 4G to be mapped with the wrong
caching attribute; for example many ppc440-based SoCs have PCI space
above 4G, and mmap()ing MMIO space may end up with a mapping that has
caching enabled.

Fix this by working with the pfn and avoiding the conversion to
physical address that causes the overflow.  This patch compares the
pfn to max_pfn, which is a semantic change from the old code -- that
code compared the physical address to high_memory, which corresponds
to max_low_pfn.  However, I think that was is another bug, since
highmem pages are still RAM.

Reported-by: vb <vb@vsbe.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-07 14:26:18 +11:00
Becky Bruce
4ee7084eb1 POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical
This rearranges a bit of code, and adds support for
36-bit physical addressing for configs that use a
hashed page table.  The 36b physical support is not
enabled by default on any config - it must be
explicitly enabled via the config system.

This patch *only* expands the page table code to accomodate
large physical addresses on 32-bit systems and enables the
PHYS_64BIT config option for 86xx.  It does *not*
allow you to boot a board with more than about 3.5GB of
RAM - for that, SWIOTLB support is also required (and
coming soon).

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-09-24 16:29:44 -05:00
Becky Bruce
82331ab15f powerpc/85xx: fix build warning, remove silly cast
This fixes a build warning when PHYS_64BIT is enabled, and removes an
unnecessary cast to phys_addr_t (the variable being cast is already
a phys_addr_t)

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-09-16 10:01:35 -05:00
David Gibson
0b26425ce1 powerpc: Clean up hugepage pagetable allocation for powerpc with 16G pages
There is a small bug in the handling of 16G hugepages recently added
to the kernel.  This doesn't cause a crash or other user-visible
problems, but it does mean that more levels of pagetable are allocated
than makes sense for 16G pages.  The hugepage pagetables for the 16G
pages are allocated much lower in the pagetable tree than they should
be, with the intervening levels allocated with full pmd and pud pages
which will only ever have one entry filled in.

This corrects this problem, at the same time cleaning up the handling
of which level 64k versus 16M hugepage pagetables are allocated at.
The new way of formatting the tests should be more robust against
changes in pagetable structure, or any newly added hugepage sizes.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:47 -07:00
Becky Bruce
aaf4a9b0f7 powerpc: Rename PTE_SIZE to HPTE_SIZE
It's the size of the hardware PTE; make that clear in the name.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:42 -07:00
Paul Mackerras
549e8152de powerpc: Make the 64-bit kernel as a position-independent executable
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
a position-independent executable (PIE) when it is set.  This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations.  (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)

The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run.  In fact we call it twice; once before calling prom_init, and again
when starting the main kernel.  This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.

This also changes __va and __pa to use an equivalent definition that is
simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).

With this, relocatable kernels still copy themselves down to physical
address 0 and run there.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:38 -07:00
Chandru
cf00085d80 powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels
Kdump kernel needs to use only those memory regions that it is allowed
to use (crashkernel, rtas, tce, etc.).  Each of these regions have
their own sizes and are currently added under 'linux,usable-memory'
property under each memory@xxx node of the device tree.

The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
node (on POWER6) now stores in it the representation for most of the
logical memory blocks with the size of each memory block being a
constant (lmb_size).  If one or more or part of the above mentioned
regions lie under one of the lmb from ibm,dynamic-memory property,
there is a need to identify those regions within the given lmb.

This makes the kernel recognize a new 'linux,drconf-usable-memory'
property added by kexec-tools.  Each entry in this property is of the
form of a count followed by that many (base, size) pairs for the above
mentioned regions.  The number of cells in the count value is given by
the #size-cells property of the root node.

Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:07:58 -07:00
Paul Mackerras
7e392f8c29 Merge branch 'linux-2.6' 2008-09-10 11:36:13 +10:00
Paul Mackerras
9e88ba4e45 powerpc: Only make kernel text pages of linear mapping executable
Commit bc033b63bb ("powerpc/mm: Fix
attribute confusion with htab_bolt_mapping()") moved the check for
whether we should make pages of the linear mapping executable from
htab_bolt_mapping into its callers, including htab_initialize.
A side-effect of this is that the decision is now made once for
each contiguous section in the LMB array rather than for each page
individually.  This can often mean that the whole of the linear
mapping ends up being executable.

This reverts to the previous behaviour, where individual pages are
checked for being part of the kernel text or not, by moving the check
back down into htab_bolt_mapping.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-03 20:53:22 +10:00
Tony Breeds
e16a9c0990 powerpc: Guard htab_dt_scan_hugepage_blocks appropriately
htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is
defined, so guard the declaration likewise.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-20 16:34:57 +10:00
Benjamin Herrenschmidt
bc033b63bb powerpc/mm: Fix attribute confusion with htab_bolt_mapping()
The function htab_bolt_mapping() is used to create permanent
mappings in the MMU hash table, for example, in order to create
the linear mapping of vmemmap.  It's also used by early boot
ioremap (before mem_init_done).

However, the way ioremap uses it is incorrect as it passes it the
protection flags in the "linux PTE" form while htab_bolt_mapping()
expects them in the hash table format.  This is made more confusing by
the fact that some of those flags are actually in the same position in
both cases.

This fixes it all by making htab_bolt_mapping() take normal linux
protection flags instead, and use a little helper to convert them to
htab flags. Callers can now use the usual PAGE_* definitions safely.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

 arch/powerpc/include/asm/mmu-hash64.h |    2 -
 arch/powerpc/mm/hash_utils_64.c       |   65 ++++++++++++++++++++--------------
 arch/powerpc/mm/init_64.c             |    9 +---
 3 files changed, 44 insertions(+), 32 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-11 10:09:56 +10:00
Tony Breeds
c7c8eede27 powerpc: Force printing of 'total_memory' to unsigned long long
total_memory is a 'phys_addr_t', Which can be either 64 or 32 bits.
Force printing as unsigned long long to silence the warning.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 13:18:17 +10:00
Tony Breeds
fb61063587 powerpc: Fix compiler warning in arch/powerpc/mm/mem.c
Explicitly cast to unsigned long long, rather than u64.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 13:18:17 +10:00
Stephen Rothwell
b8b572e101 powerpc: Move include files to arch/powerpc/include/asm
from include/asm-powerpc.  This is the result of a

mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm

Followed by a few documentation/comment fixups and a couple of places
where <asm-powepc/...> was being used explicitly.  Of the latter only
one was outside the arch code and it is a driver only built for powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04 12:02:00 +10:00
Nick Piggin
ce0ad7f095 powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3)
Implement lockless get_user_pages_fast for 64-bit powerpc.

Page table existence is guaranteed with RCU, and speculative page references
are used to take a reference to the pages without having a prior existence
guarantee on them.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-30 15:26:54 +10:00
Benjamin Herrenschmidt
00df438e89 powerpc: Disable 64K hugetlb support when doing 64K SPU mappings
The 64K SPU local store mapping feature is incompatible with the
64K huge pages support due to the inability of some parts of
the memory management to differenciate between them while they
use a different page table format.

For now, disable 64K huge pages when CONFIG_SPU_FS_64K_LS,
in the long run, this can be fixed by making this feature use
the hugetlb page table format.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-28 16:30:53 +10:00
Johannes Weiner
bda2fa5355 powerpc: use generic show_mem()
Remove arch-specific show_mem() in favor of the generic version.

This also removes the following redundant information display:

	- pages in swapcache, printed by show_swap_cache_info()

where show_mem() calls show_free_areas(), which calls
show_swap_cache_info().

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:11 -07:00
Alexey Dobriyan
51cc50685a SL*B: drop kmem cache argument from constructor
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:07 -07:00
Luis Machado
d6a61bfc06 powerpc: BookE hardware watchpoint support
This patch implements support for HW based watchpoint via the
DBSR_DAC (Data Address Compare) facility of the BookE processors.

It does so by interfacing with the existing DABR breakpoint code
and adding the necessary bits and pieces for the new bits to
be properly set or cleared

Signed-off-by: Luis Machado <luisgpm@br.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:39 +10:00
Jon Tollefson
0d9ea75443 powerpc: support multiple hugepage sizes
Instead of using the variable mmu_huge_psize to keep track of the huge
page size we use an array of MMU_PAGE_* values.  For each supported huge
page size we need to know the hugepte_shift value and have a
pgtable_cache.  The hstate or an mmu_huge_psizes index is passed to
functions so that they know which huge page size they should use.

The hugepage sizes 16M and 64K are setup(if available on the hardware) so
that they don't have to be set on the boot cmd line in order to use them.
The number of 16G pages have to be specified at boot-time though (e.g.
hugepagesz=16G hugepages=5).

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
91224346aa powerpc: define support for 16G hugepages
The huge page size is defined for 16G pages.  If a hugepagesz of 16G is
specified at boot-time then it becomes the huge page size instead of the
default 16M.

The change in pgtable-64K.h is to the macro pte_iterate_hashed_subpages to
make the increment to va (the 1 being shifted) be a long so that it is not
shifted to 0.  Otherwise it would create an infinite loop when the shift
value is for a 16G page (when base page size is 64K).

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
658013e93e powerpc: scan device tree for gigantic pages
The 16G huge pages have to be reserved in the HMC prior to boot.  The
location of the pages are placed in the device tree.  This patch adds code
to scan the device tree during very early boot and save these page
locations until hugetlbfs is ready for them.

Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
ec4b2c0c83 powerpc: function to allocate gigantic hugepages
The 16G page locations have been saved during early boot in an array.  The
alloc_bootmem_huge_page() function adds a page from here to the
huge_boot_pages list.

Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Andi Kleen
ceb8687961 hugetlb: introduce pud_huge
Straight forward extensions for huge pages located in the PUD instead of
PMDs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen
a551643895 hugetlb: modular state for hugetlb page size
The goal of this patchset is to support multiple hugetlb page sizes.  This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg.  huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Jan Beulich
42b7772812 mm: remove double indirection on tlb parameter to free_pgd_range() & Co
The double indirection here is not needed anywhere and hence (at least)
confusing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Benjamin Herrenschmidt
a1f242ff46 powerpc ioremap_prot
This adds ioremap_prot and pte_pgprot() so that one can extract protection
bits from a PTE and use them to ioremap_prot() (in order to support ptrace
of VM_IO | VM_PFNMAP as per Rik's patch).

This moves a couple of flag checks around in the ioremap implementations
of arch/powerpc.  There's a side effect of allowing non-cacheable and
non-guarded mappings on ppc32 which before would always have _PAGE_GUARDED
set whenever _PAGE_NO_CACHE is.

(standard ioremap will still set _PAGE_GUARDED, but ioremap_prot will be
capable of setting such a non guarded mapping).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Johannes Weiner
b61bfa3c46 mm: move bootmem descriptors definition to a single place
There are a lot of places that define either a single bootmem descriptor or an
array of them.  Use only one central array with MAX_NUMNODES items instead.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
Benjamin Herrenschmidt
84c3d4aaec Merge commit 'origin/master'
Manual merge of:

	arch/powerpc/Kconfig
	arch/powerpc/kernel/stacktrace.c
	arch/powerpc/mm/slice.c
	arch/ppc/kernel/smp.c
2008-07-16 11:07:59 +10:00
Stefan Roese
2bf3016f89 powerpc: Fix problems with 32bit PPC's running with >= 4GB of RAM
This patch enables 32bit PPC's (with 36bit physical address space, e.g.
IBM/AMCC PPC44x) to run with >= 4GB of RAM. Mostly its just replacing types
(unsigned long -> phys_addr_t).

Tested on an AMCC Katmai with 4GB of DDR2.

Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-07-09 14:13:01 -04:00
Benjamin Herrenschmidt
1bc54c0311 powerpc: rework 4xx PTE access and TLB miss
This is some preliminary work to improve TLB management on SW loaded
TLB powerpc platforms. This introduce support for non-atomic PTE
operations in pgtable-ppc32.h and removes write back to the PTE from
the TLB miss handlers. In addition, the DSI interrupt code no longer
tries to fixup write permission, this is left to generic code, and
_PAGE_HWWRITE is gone.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-07-09 13:36:17 -04:00
Stephen Rothwell
392096e98f generic-ipi: fix linux-next tree build failure
Today's linux-next build (powerpc ppc64_defconfig) failed like this:

arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now':
arch/powerpc/mm/tlb_64.c:66: error: too many arguments to function 'smp_call_function'
arch/powerpc/kernel/machine_kexec_64.c: In function 'kexec_prepare_cpus':
arch/powerpc/kernel/machine_kexec_64.c:175: error: too many arguments to function 'smp_call_function'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linuxppc-dev@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03 09:25:42 +02:00
Nathan Fontenot
0db9360aaa powerpc/pseries: Update numa association of hotplug memory add for drconf memory
Update the association of a memory section with a numa node that
occurs during hotplug add of a memory section.  This adds a check in
the hot_add_scn_to_nid() routine for the
ibm,dynamic-reconfiguration-memory node in the device tree.  If
present the new hot_add_drconf_scn_to_nid() routine is invoked, which
can properly parse the ibm,dynamic-reconfiguration-memory node of the
device tree and make the proper numa node associations.

This also introduces the valid_hot_add_scn() routine as a helper
function for code that is common to the hot_add_scn_to_nid() and
hot_add_drconf_scn_to_nid() routines.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:18 +10:00
Nathan Fontenot
8342681d3e powerpc/pseries: Split code into helper routines for drconf memory
This splits off several pieces of code that parse the
ibm,dynamic-reconfiguration-memory node of the device tree into separate
helper routines.  This is in preparation for the next commit that will
use these helper routines.  There are no functional changes in this patch.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:17 +10:00
Tony Breeds
db7f37de2c powerpc: Fix building of arch/powerpc/mm/mem.o when MEMORY_HOTPLUG=y and SPARSEMEM=n
Currently the kernel fails to build with the above config options with:
  CC      arch/powerpc/mm/mem.o
arch/powerpc/mm/mem.c: In function 'arch_add_memory':
arch/powerpc/mm/mem.c:130: error: implicit declaration of function 'create_section_mapping'

This explicitly includes asm/sparsemem.h in arch/powerpc/mm/mem.c and
moves the guards in include/asm-powerpc/sparsemem.h to protect the
SPARSEMEM specific portions only.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:07 +10:00
Dave Kleikamp
87e9ab13c3 powerpc: hash_huge_page() should get the WIMG bits from the lpte
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01 11:28:02 +10:00
Paul Mackerras
3a8247cc2c powerpc: Only demote individual slices rather than whole process
At present, if we have a kernel with a 64kB page size, and some
process maps something that has to be mapped with 4kB pages (such as a
cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair
pages), we change the process to use 4kB pages everywhere.  This hurts
the performance of HPC programs that access eHCA from userspace.

With this patch, the kernel will only demote the slice(s) containing
the eHCA or cache-inhibited mappings, leaving the remaining slices
able to use 64kB hardware pages.

This also changes the slice_get_unmapped_area code so that it is
willing to place a 64k-page mapping into (or across) a 4k-page slice
if there is no better alternative, i.e. if the program specified
MAP_FIXED or if there is not sufficient space available in slices that
are either empty or already have 64k-page mappings in them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-01 11:27:57 +10:00
Becky Bruce
316a405841 powerpc: Get rid of bitfields in ppc_bat struct
While working on the 36-bit physical support, I noticed that there
was exactly one line of code that actually referenced the bitfields.
So I got rid of them and redefined ppc_bat as a struct of 2 u32's:
batu and batl.  I also got rid of the previous union that held the
bitfield structs and a word representation of the batu/l values.

This seems like a nicer solution than adding in a bunch of
new bitfields to support extended bat addressing that would never
get used, and just leaving the struct as-is would have been
incomplete in the face of large physical addressing.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:31:05 +10:00
Becky Bruce
7c5c4325d2 powerpc: Change BAT code to use phys_addr_t
Currently, the physical address is an unsigned long, but it should
be phys_addr_t in set_bat, [v/p]_mapped_by_bat.  Also, create a
macro that can convert a large physical address into the correct
format for programming the BAT registers.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:31:03 +10:00
Benjamin Herrenschmidt
41743a4e34 powerpc: Free a PTE bit on ppc64 with 64K pages
This frees a PTE bit when using 64K pages on ppc64.  This is done
by getting rid of the separate _PAGE_HASHPTE bit.  Instead, we just test
if any of the 16 sub-page bits is set.  For non-combo pages (ie. real
64K pages), we set SUB0 and the location encoding in that field.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:30:53 +10:00
Paul Mackerras
e9a4b6a3f6 Merge branch 'linux-2.6' 2008-06-30 10:16:50 +10:00
Jens Axboe
15c8b6c1aa on_each_cpu(): kill unused 'retry' parameter
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:38 +02:00
Paul Mackerras
65ba6cdc83 [POWERPC] Clear sub-page HPTE present bits when demoting page size
When we demote a slice from 64k to 4k, and we are about to insert an
HPTE for a 4k subpage and we notice that there is an existing 64k
HPTE, we first invalidate that HPTE before inserting the new 4k
subpage HPTE.  Since the bits that encode which hash bucket the old
HPTE was in overlap with the bits that encode which of the 16 subpages
have HPTEs, we need to clear out the subpage HPTE-present bits before
starting to insert HPTEs for the 4k subpages.  If we don't do that, we
can erroneously think that a subpage already has an HPTE when it
doesn't.

That in itself wouldn't be such a problem except that when we go to
update the HPTE that we think is present on machines with a
hypervisor, the hypervisor can tell us that the HPTE we think is there
is actually there even though it isn't, which can lead to a process
getting stuck in a loop, continually faulting.  The reason for the
confusion is that the AVPN (abbreviated virtual page number) we are
looking for in the HPTE for a 4k subpage can actually match the AVPN
in a stale HPTE for another 64k page.  For example, the HPTE for
the 4k subpage at 0x84000f000 will be in the same hash bucket and have
the same AVPN as the HPTE for the 64k page at 0x8400f0000.

This fixes the code to clear out the subpage HPTE-present bits.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-18 21:40:43 +10:00
Paul Mackerras
8a3e1c670e Merge branch 'merge'
Conflicts:

	arch/powerpc/sysdev/fsl_soc.c
2008-06-09 12:19:41 +10:00
Nathan Lynch
0d5799449f [POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=n
The ehea driver was recently changed[1] to use walk_memory_resource() to
detect the system's memory layout.  However, walk_memory_resource() is
available only when memory hotplug is enabled.  So CONFIG_EHEA was
made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
network driver to have such a dependency.

Make the declaration of walk_memory_resource() and its powerpc
implementation (ehea is powerpc-specific) unconditionally available.

[1] 48cfb14f8b
    "ehea: Add DLPAR memory remove support"

[2] fb7b6ca2b6
    "ehea: Add dependency to Kconfig"

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-09 11:32:41 +10:00
Paul Mackerras
acf464817d Merge branch 'merge' into powerpc-next 2008-05-23 16:53:23 +10:00
David Gibson
46a7417963 [POWERPC] Fix __set_fixmap() for STRICT_MM_TYPECHECKS
__set_fixmap() in pgtable_32.c currently fails to compile if
STRICT_MM_TYPECHECKS is defined.  This fixes it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-23 16:15:32 +10:00
Adrian Bunk
d3d3d3cdb1 [POWERPC] powerpc/mm/hash_low_32.S: Remove CVS keyword
This removes a CVS keyword that wasn't updated for a long time from a
comment.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-20 09:34:18 +10:00
Paul Mackerras
fcff474ea5 Merge branch 'linux-2.6' into powerpc-next 2008-05-16 23:13:42 +10:00
Benjamin Herrenschmidt
cec08e7a94 [POWERPC] vmemmap fixes to use smaller pages
This changes vmemmap to use a different region (region 0xf) of the
address space, and to configure the page size of that region
dynamically at boot.

The problem with the current approach of always using 16M pages is that
it's not well suited to machines that have small amounts of memory such
as small partitions on pseries, or PS3's.

In fact, on the PS3, failure to allocate the 16M page backing vmmemmap
tends to prevent hotplugging the HV's "additional" memory, thus limiting
the available memory even more, from my experience down to something
like 80M total, which makes it really not very useable.

The logic used by my match to choose the vmemmap page size is:

 - If 16M pages are available and there's 1G or more RAM at boot,
   use that size.
 - Else if 64K pages are available, use that
 - Else use 4K pages

I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
and it seems to work fine.

Note that I intend to change the way we organize the kernel regions &
SLBs so the actual region will change from 0xf back to something else at
one point, as I simplify the SLB miss handler, but that will be for a
later patch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-15 20:49:25 +10:00
Michael Ellerman
c884116ac3 [POWERPC] Remove duplicate variable definitions in mm/tlb_64.c
Somewhere along the way (e28f7faf05,
"Four level pagetables for ppc64") we ended up with duplicate
definitions for pte_freelist_cur and pte_freelist_force_free.
Somehow this compiles, but it would be better to just have one
definition for each.

The two definitions we end up with can be static too!

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:49 +10:00
Michael Ellerman
572fb578de [POWERPC] Move declaration of tce variables into mmu-hash64.h
... instead of having extern declarations in a .c file.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:47 +10:00
Michael Ellerman
09de9ff872 [POWERPC] Fix sparse warnings in arch/powerpc/mm
Make two vmemmap helpers static in init_64.c
Make stab variables static in stab.c
Make psize defs static in hash_utils_64.c

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:46 +10:00
Michael Ellerman
5f25f06529 [POWERPC] Move declaration of init_bootmem_done into system.h
... instead of having an extern declaration in a .c file.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:31:44 +10:00
Paul Mackerras
3b5750644b [POWERPC] Bolt in SLB entry for kernel stack on secondary cpus
This fixes a regression reported by Kamalesh Bulabel where a POWER4
machine would crash because of an SLB miss at a point where the SLB
miss exception was unrecoverable.  This regression is tracked at:

http://bugzilla.kernel.org/show_bug.cgi?id=10082

SLB misses at such points shouldn't happen because the kernel stack is
the only memory accessed other than things in the first segment of the
linear mapping (which is mapped at all times by entry 0 of the SLB).
The context switch code ensures that SLB entry 2 covers the kernel
stack, if it is not already covered by entry 0.  None of entries 0
to 2 are ever replaced by the SLB miss handler.

Where this went wrong is that the context switch code assumes it
doesn't have to write to SLB entry 2 if the new kernel stack is in the
same segment as the old kernel stack, since entry 2 should already be
correct.  However, when we start up a secondary cpu, it calls
slb_initialize, which doesn't set up entry 2.  This is correct for
the boot cpu, where we will be using a stack in the kernel BSS at this
point (i.e. init_thread_union), but not necessarily for secondary
cpus, whose initial stack can be allocated anywhere.  This doesn't
cause any immediate problem since the SLB miss handler will just
create an SLB entry somewhere else to cover the initial stack.

In fact it's possible for the cpu to go quite a long time without SLB
entry 2 being valid.  Eventually, though, the entry created by the SLB
miss handler will get overwritten by some other entry, and if the next
access to the stack is at an unrecoverable point, we get the crash.

This fixes the problem by making slb_initialize create a suitable
entry for the kernel stack, if we are on a secondary cpu and the stack
isn't covered by SLB entry 0.  This requires initializing the
get_paca()->kstack field earlier, so I do that in smp_create_idle
where the current field is initialized.  This also abstracts a bit of
the computation that mk_esid_data in slb.c does so that it can be used
in slb_initialize.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-02 15:00:45 +10:00
Geoff Levand
bbea346062 [POWERPC] Fix slb.c compile warnings
Arrange for a syntax check to always be done on the powerpc/mm/slb.c
DBG() macro by defining it to pr_debug() for non-debug builds.

Also, fix these related compile warnings:

  slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int
  slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int'

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-02 15:00:44 +10:00
Badari Pulavarty
9d88a2eb6e [POWERPC] Provide walk_memory_resource() for powerpc
Provide walk_memory_resource() for 64-bit powerpc.  PowerPC maintains
logical memory region mapping in the lmb.memory structure.  Walk
through these structures and do the callbacks for the contiguous
chunks.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Jeremy Fitzhardinge
180c06efce hotplug-memory: make online_page() common
All architectures use an effectively identical definition of online_page(), so
just make it common code.  x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone.  We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately.  This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Kumar Gala
f608600e74 [POWERPC] Clean up access to thread_info in assembly
Use (31-THREAD_SHIFT) to get to thread_info from stack pointer.  This makes
the code a bit easier to read and more robust if we ever change THREAD_SHIFT.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 20:58:02 +10:00
Kumar Gala
2c419bdeca [POWERPC] Port fixmap from x86 and use for kmap_atomic
The fixmap code from x86 allows us to have compile time virtual addresses
that we change the physical addresses of at run time.

This is useful for applications like kmap_atomic, PCI config that is done
via direct memory map, kexec/kdump.

We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal
location for PKMAP_BASE based on where the fixmap addresses start and
working back from there.

Additionally, the kmap code in asm-powerpc/highmem.h always had debug
enabled.  Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should
have the extra debug checking.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 20:58:02 +10:00
Kumar Gala
37dd2badcf [POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero)
Added support to allow an 85xx kernel to be run from a non-zero physical
address (useful for cooperative asymmetric multiprocessing situations and
kdump).  The support can be configured at compile time by setting
CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
desired.

Alternatively, the kernel build can set CONFIG_RELOCATABLE.  Setting this
config option causes the kernel to determine at runtime the physical
addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START.  If
CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
header physical address field in the resulting ELF image.

Currently we are limited to running at a physical address that is a
multiple of 256M.  This is due to how we map TLBs to cover
lowmem.  This should be fixed to allow 64M or maybe even 16M alignment
in the future.  It is considered an error to try and run a kernel at a
non-aligned physical address.

All the magic for this support is accomplished by proper initialization
of the kernel memory subsystem and use of ARCH_PFN_OFFSET.

The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.

/dev/mem continues to allow access to any physical address in the system
regardless of how CONFIG_PHYSICAL_START is set.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 20:58:01 +10:00
Michael Ellerman
6df1646e31 [POWERPC] Add include of linux/of.h to numa.c
numa.c requires routines declared in linux/of.h, so should include it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 20:57:32 +10:00
Olof Johansson
49a9997884 [POWERPC] Remove unused __max_memory variable
Remove the __max_memory variable, as it is not referenced anywhere
in the tree besides some code in arch/ppc.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-18 15:37:11 +10:00
Kumar Gala
7711684947 [POWERPC] Remove unused machine call outs
When we moved to arch/powerpc we actively tried to avoid using the
ppc_md.setup_io_mappings().  Currently no board ports use it so let's
remove it to avoid any new boards using it.

Also, remove early_serial_map() since we don't even have a call out for
it in arch/powerpc.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17 10:01:00 +10:00
Kumar Gala
09b5e63f82 [POWERPC] Rename __initial_memory_limit to __initial_memory_limit_addr
We always use __initial_memory_limit as an address so rename it
to be clear.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17 07:46:13 +10:00
Kumar Gala
0aef996b37 [POWERPC] 85xx: Cleanup TLB initialization
* Determine the RPN we are running the kernel at runtime rather
  than using compile time constant for initial TLB

* Cleanup adjust_total_lowmem() to respect memstart_addr and
  be a bit more clear on variables that are sizes vs addresses.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17 07:46:13 +10:00
Kumar Gala
d7917ba705 [POWERPC] Introduce lowmem_end_addr to distinguish from total_lowmem
total_lowmem represents the amount of low memory, not the physical
address that low memory ends at.  If the start of memory is at 0 it
happens that total_lowmem can be used as both the size and the address
that lowmem ends at (or more specifically one byte beyond the end).

To make the code a bit more clear and deal with the case when the start of
memory isn't at physical 0, we introduce lowmem_end_addr that represents
one byte beyond the last physical address in the lowmem region.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17 07:46:13 +10:00
Kumar Gala
99c62dd773 [POWERPC] Remove and replace uses of PPC_MEMSTART with memstart_addr
A number of users of PPC_MEMSTART (40x, ppc_mmu_32) can just always
use 0 as we don't support booting these kernels at non-zero physical
addresses since their exception vectors must be at 0 (or 0xfffx_xxxx).

For the sub-arches that support relocatable interrupt vectors
(book-e), it's reasonable to have memory start at a non-zero physical
address.  For those cases use the variable memstart_addr instead of
the #define PPC_MEMSTART since the only uses of PPC_MEMSTART are for
initialization and in the future we can set memstart_addr at runtime
to have a relocatable kernel.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17 07:46:12 +10:00
Paul Mackerras
ac7c5353b1 Merge branch 'linux-2.6' 2008-04-14 21:11:02 +10:00
Stephen Rothwell
ae86f0088d [POWERPC] htab_remove_mapping is only used by MEMORY_HOTPLUG
This eliminates a warning in builds that don't define
CONFIG_MEMORY_HOTPLUG.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-07 13:49:25 +10:00
Benjamin Herrenschmidt
b991f05f13 [POWERPC] Fix deadlock with mmu_hash_lock in hash_page_sync
hash_page_sync() takes and releases the low level mmu hash
lock in order to sync with other processors disposing of page
tables.  Because that lock can be needed to service hash misses
triggered by interrupt handlers, taking it must be done with
interrupts off.  However, hash_page_sync() appears to be called
with interrupts enabled, thus causing occasional deadlocks.

We fix it by making sure hash_page_sync() masks interrupts while
holding the lock.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-03 22:11:11 +11:00
Johannes Weiner
745681a524 [POWERPC] Remove redundant display of free swap space in show_mem()
show_mem() has no need to print the amount of free swap space manually
because show_free_areas() does this already and is called by the
former.

The two outputs only differ in text formatting:

  printk("Free swap  = %lukB\n", ...);
  printk("Free swap:       %6ldkB\n", ...);

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01 20:43:10 +11:00
Harvey Harrison
e48b1b452f [POWERPC] Replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01 20:43:09 +11:00
Geert Uytterhoeven
fa90f70a8e [POWERPC] arch_add_memory() cannot be __devinit
WARNING: vmlinux.o(.text+0xb41b0): Section mismatch in reference from the
function .add_memory() to the function .devinit.text:.arch_add_memory()
The function .add_memory() references
the function __devinit .arch_add_memory().
This is often because .add_memory lacks a __devinit
annotation or the annotation of .arch_add_memory is wrong.

arch_add_memory() is also not __devinit on other architectures

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01 20:43:08 +11:00
Badari Pulavarty
52db9b4426 [POWERPC] Add error return from htab_remove_mapping()
If the platform doesn't support hpte_removebolted(), gracefully
return failure rather than success.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01 20:43:08 +11:00
Paul Mackerras
54f53f2b94 Merge branch 'linux-2.6' 2008-03-26 08:44:18 +11:00
Paul Mackerras
cfe666b145 [POWERPC] Don't use 64k pages for ioremap on pSeries
On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
adapter using 64k pages, and thus the ehea driver will fail if 64k
pages are configured.  This works around the problem by always
using 4k pages for ioremap on pSeries (but not on other platforms).
A better fix would be to check whether the partition could ever
have an eHEA adapter, and only force 4k pages if it could, but this
will do for 2.6.25.

This is based on an earlier patch by Tony Breeds.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-24 17:41:22 +11:00
Anton Blanchard
44387e9ff2 [POWERPC] Fix PMU + soft interrupt disable bug
Since the PMU is an NMI now, it can come at any time we are only soft
disabled.  We must hard disable around the two places we allow the kernel
stack SLB and r1 to go out of sync.  Otherwise the PMU exception can
force a kernel stack SLB into another slot, which can lead to it
getting evicted, which can lead to a nasty unrecoverable SLB miss
in the exception entry code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-20 10:14:55 +11:00
Paul Mackerras
bed04a4413 Merge branch 'linux-2.6' 2008-03-13 15:26:33 +11:00
Michael Ellerman
31bf111944 [POWERPC] Fix large hash table allocation on Cell blades
My recent hack to allocate the hash table under 1GB on cell was poorly
tested, *cough*. It turns out on blades with large amounts of memory we
fail to allocate the hash table at all. This is because RTAS has been
instantiated just below 768MB, and 0-x MB are used by the kernel,
leaving no areas that are both large enough and also naturally-aligned.

For the cell IOMMU hack the page tables must be under 2GB, so use that
as the limit instead. This has been tested on real hardware and boots
happily.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-13 10:10:26 +11:00
Badari Pulavarty
f8c8803bda [POWERPC] Add code for removing HPTEs for parts of the linear mapping
For memory remove, we need to clean up htab mappings for the
section of the memory we are removing.

This implements support for removing htab bolted mappings for pSeries
logical partitions.  Other sub-archs may need to implement similar
functionality for hotplug memory remove to work on them.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-26 22:17:03 +11:00
David S. Miller
d9b2b2a277 [LIB]: Make PowerPC LMB code generic so sparc64 can use it too.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 16:56:49 -08:00
Linus Torvalds
dde0013782 Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc
  [POWERPC] Enable hotplug memory remove for 64-bit powerpc
  [POWERPC] Add remove_memory() for 64-bit powerpc
  [POWERPC] Make cell IOMMU fixed mapping printk more useful
  [POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops
  [POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges
  [POWERPC] Fix cell IOMMU null pointer explosion on old firmwares
  [POWERPC] spufs: Fix timing dependent false return from spufs_run_spu
  [POWERPC] spufs: No need to have a runnable SPU for libassist update
  [POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write
  [POWERPC] spufs: Fix state_mutex leaks
  [POWERPC] Disable G5 NAP mode during SMU commands on U3
2008-02-08 09:31:42 -08:00
Martin Schwidefsky
2f569afd9c CONFIG_HIGHPTE vs. sub-page page tables.
Background: I've implemented 1K/2K page tables for s390.  These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM.  The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste).  The pgstes are only required if the process is using the SIE
instruction.  The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.

Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).

Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch.  For everybody else it will be a (struct page *).  The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor.  The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
 To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:42 -08:00
Badari Pulavarty
a99824f327 [POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc
walk_memory_resource() verifies if there are holes in a given memory
range, by checking against /proc/iomem.  On x86/ia64 system memory is
represented in /proc/iomem.  On powerpc, we don't show system memory as
IO resource in /proc/iomem - instead it's maintained in
/proc/device-tree.

This provides a way for an architecture to provide its own
walk_memory_resource() function.  On powerpc, the memory region is
small (16MB), contiguous and non-overlapping.  So extra checking
against the device-tree is not needed.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-08 19:52:48 +11:00
Badari Pulavarty
aa620abe75 [POWERPC] Add remove_memory() for 64-bit powerpc
Supply remove_memory() function for 64-bit powerpc.  This is still
not quite complete as it needs to do some more arch-specific stuff,
which will be added in a later patch.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-08 19:52:47 +11:00
Linus Torvalds
3796958130 Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (69 commits)
  [POWERPC] Add SPE registers to core dumps
  [POWERPC] Use regset code for compat PTRACE_*REGS* calls
  [POWERPC] Use generic compat_sys_ptrace
  [POWERPC] Use generic compat_ptrace_request
  [POWERPC] Use generic ptrace peekdata/pokedata
  [POWERPC] Use regset code for PTRACE_*REGS* requests
  [POWERPC] Switch to generic compat_binfmt_elf code
  [POWERPC] Switch to using user_regset-based core dumps
  [POWERPC] Add user_regset compat support
  [POWERPC] Add user_regset_view definitions
  [POWERPC] Use user_regset accessors for GPRs
  [POWERPC] ptrace accessors for special regs MSR and TRAP
  [POWERPC] Use user_regset accessors for SPE regs
  [POWERPC] Use user_regset accessors for altivec regs
  [POWERPC] Use user_regset accessors for FP regs
  [POWERPC] mpc52xx: fix compile error introduce when rebasing patch
  [POWERPC] 4xx: PCIe indirect DCR spinlock fix.
  [POWERPC] Add missing native dcr dcr_ind_lock spinlock
  [POWERPC] 4xx: Fix offset value on Warp board
  [POWERPC] 4xx: Add 440EPx Sequoia ehci dts entry
  ...
2008-02-07 09:02:26 -08:00
Bernhard Walle
72a7fe3967 Introduce flags for reserve_bootmem()
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.

This patch:

Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past.  This is to avoid conflicts.

Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:25 -08:00
Balbir Singh
1daa6d08d1 [POWERPC] Fake NUMA emulation for PowerPC
Here's a dumb simple implementation of fake NUMA nodes for PowerPC.
Fake NUMA nodes can be specified using the following command line
option

numa=fake=<node range>

node range is of the format <range1>,<range2>,...<rangeN>

Each of the rangeX parameters is passed using memparse().  I find the
patch useful for fake NUMA emulation on my simple PowerPC machine.
I've tested it on a numa box with the following arguments

numa=fake=512M
numa=fake=512M,768M
numa=fake=256M,512M mem=512M
numa=fake=1G mem=768M
numa=fake=
without any numa= argument

The other side-effect introduced by this patch is that; in the case
where we don't have NUMA information, we now set a node online after
adding each LMB.  This node could very well be node 0, but in the case
that we enable fake NUMA nodes, when we cross node boundaries, we need
to set the new node online.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-07 11:40:19 +11:00
Scott Wood
551ed332da [POWERPC] update_mmu_cache: Don't cache-flush non-readable pages
Currently, update_mmu_cache will crash if given a no-access PTE.
There's no need to synchronize dcache/icache unless it's an exec
mapping -- however, due to the existence of older glibc versions that
execute out of a read-but-no-exec page, readability is tested instead.

This assumes no exec-only mappings; if such mappings become supported,
they will need to go through the kmap_atomic() version of
dcache/icache synchronization.

This fixes a bug reported by some users where the kernel would crash
while dumping core on a threaded program.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-06 16:30:01 +11:00
Benjamin Herrenschmidt
5e5419734c add mm argument to pte/pmd/pud/pgd_free
(with Martin Schwidefsky <schwidefsky@de.ibm.com>)

The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as
first argument.  The free functions do not get the mm_struct argument.  This
is 1) asymmetrical and 2) to do mm related page table allocations the mm
argument is needed on the free function as well.

[kamalesh@linux.vnet.ibm.com: i386 fix]
[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:18 -08:00
Michael Ellerman
41d824bf61 [POWERPC] Allocate the hash table under 1G on cell
In order to support the fixed IOMMU mapping (in a subsequent patch),
we need the hash table to be inside the IOMMUs DMA window.  This is
usually 2G, but let's make sure the hash table is under 1G as that
will satisfy the IOMMU requirements and also means the hash table will
be on node 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-31 12:11:09 +11:00
Paul Mackerras
55852bed57 Revert "[POWERPC] Fake NUMA emulation for PowerPC"
This reverts commit 5c3f5892a2,
basically because it changes behaviour even when no fake NUMA
information is specified on the kernel command line.

Firstly, it changes the nid, thus destroying the real NUMA
information.  Secondly, it also changes behaviour in that if a node
ends up with no memory in it because of the memory limit, we used to
set it online and now we don't.

Also, in the non-NUMA case with no fake NUMA information, we do
node_set_online once for each LMB now, whereas previously we only did
it once.  I don't know if that is actually a problem, but it does seem
unnecessary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-26 16:40:33 +11:00
Michael Neuling
c3b75bd7bb [POWERPC] Make setjmp/longjmp code usable outside of xmon
This makes the setjmp/longjmp code used by xmon, generically available
to other code.  It also removes the requirement for debugger hooks to
be only called on 0x300 (data storage) exception.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-25 22:52:50 +11:00
Paul Mackerras
dcb571be20 Merge branch 'for-2.6.25' of master.kernel.org:/pub/scm/linux/kernel/git/galak/powerpc into for-2.6.25 2008-01-24 15:29:14 +11:00
Dale Farnsworth
e8b6376155 [POWERPC] 85xx: Respect KERNELBASE, PAGE_OFFSET, and PHYSICAL_START on e500
The e500 MMU init code previously assumed KERNELBASE always equaled
PAGE_OFFSET and PHYSICAL_START was 0.  This is useful for kdump
support as well as asymetric multicore.

For the initial kdump support the secondary kernel will run at 32M
but need access to all of memory so we bump the initial TLB up to
64M.  This also matches with the forth coming ePAPR spec.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-01-23 19:34:36 -06:00
Kumar Gala
f98eeb4eb1 [POWERPC] Fix handling of memreserve if the range lands in highmem
There were several issues if a memreserve range existed and happened
to be in highmem:

* The bootmem allocator is only aware of lowmem so calling
  reserve_bootmem with a highmem address would cause a BUG_ON
* All highmem pages were provided to the buddy allocator

Added a lmb_is_reserved() api that we now use to determine if a highem
page should continue to be PageReserved or provided to the buddy
allocator.

Also, we incorrectly reported the amount of pages reserved since all
highmem pages are initally marked reserved and we clear the
PageReserved flag as we "free" up the highmem pages.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-01-23 19:29:08 -06:00
Paul Mackerras
9156ad4833 Merge branch 'linux-2.6' 2008-01-24 10:07:21 +11:00
Paul Mackerras
fa28237cfc [POWERPC] Provide a way to protect 4k subpages when using 64k pages
Using 64k pages on 64-bit PowerPC systems makes life difficult for
emulators that are trying to emulate an ISA, such as x86, which use a
smaller page size, since the emulator can no longer use the MMU and
the normal system calls for controlling page protections.  Of course,
the emulator can emulate the MMU by checking and possibly remapping
the address for each memory access in software, but that is pretty
slow.

This provides a facility for such programs to control the access
permissions on individual 4k sub-pages of 64k pages.  The idea is
that the emulator supplies an array of protection masks to apply to a
specified range of virtual addresses.  These masks are applied at the
level where hardware PTEs are inserted into the hardware page table
based on the Linux PTEs, so the Linux PTEs are not affected.  Note
that this new mechanism does not allow any access that would otherwise
be prohibited; it can only prohibit accesses that would otherwise be
allowed.  This new facility is only available on 64-bit PowerPC and
only when the kernel is configured for 64k pages.

The masks are supplied using a new subpage_prot system call, which
takes a starting virtual address and length, and a pointer to an array
of protection masks in memory.  The array has a 32-bit word per 64k
page to be protected; each 32-bit word consists of 16 2-bit fields,
for which 0 allows any access (that is otherwise allowed), 1 prevents
write accesses, and 2 or 3 prevent any access.

Implicit in this is that the regions of the address space that are
protected are switched to use 4k hardware pages rather than 64k
hardware pages (on machines with hardware 64k page support).  In fact
the whole process is switched to use 4k hardware pages when the
subpage_prot system call is used, but this could be improved in future
to switch only the affected segments.

The subpage protection bits are stored in a 3 level tree akin to the
page table tree.  The top level of this tree is stored in a structure
that is appended to the top level of the page table tree, i.e., the
pgd array.  Since it will often only be 32-bit addresses (below 4GB)
that are protected, the pointers to the first four bottom level pages
are also stored in this structure (each bottom level page contains the
protection bits for 1GB of address space), so the protection bits for
addresses below 4GB can be accessed with one fewer loads than those
for higher addresses.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-24 10:06:01 +11:00
Jon Tollefson
4ec161cf73 [POWERPC] Add hugepagesz boot-time parameter
This adds the hugepagesz boot-time parameter for ppc64.  It lets one
pick the size for huge pages.  The choices available are 64K and 16M
when the base page size is 4k.  It defaults to 16M (previously the
only only choice) if nothing or an invalid choice is specified.

Tested 64K huge pages successfully with the libhugetlbfs 1.2.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17 14:57:36 +11:00
Paul Mackerras
dfbe0d3b6b [POWERPC] Fix boot failure on POWER6
Commit 473980a993 added a call to clear
the SLB shadow buffer before registering it.  Unfortunately this means
that we clear out the entries that slb_initialize has previously set in
there.  On POWER6, the hypervisor uses the SLB shadow buffer when doing
partition switches, and that means that after the next partition switch,
each non-boot CPU has no SLB entries to map the kernel text and data,
which causes it to crash.

This fixes it by reverting most of 473980a9 and instead clearing the
3rd entry explicitly in slb_initialize.  This fixes the problem that
473980a9 was trying to solve, but without breaking POWER6.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-15 17:30:58 +11:00
Michael Neuling
473980a993 [POWERPC] Fix CPU hotplug when using the SLB shadow buffer
Before we register the SLB shadow buffer, we need to invalidate the
entries in the buffer, otherwise we can end up stale entries from when
we previously offlined the CPU.

This does this invalidate as well as unregistering the buffer with
PHYP before we offline the cpu.  Tested and fixes crashes seen on
970MP (thanks to tonyb) and POWER5.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-11 16:33:55 +11:00
Balbir Singh
5c3f5892a2 [POWERPC] Fake NUMA emulation for PowerPC
Here's a dumb simple implementation of fake NUMA nodes for PowerPC.
Fake NUMA nodes can be specified using the following command line option

numa=fake=<node range>

node range is of the format <range1>,<range2>,...<rangeN>

Each of the rangeX parameters is passed using memparse().  I find this
useful for fake NUMA emulation on my simple PowerPC machine.  I've
tested it on a non-numa box with the following arguments:

numa=fake=1G
numa=fake=1G,2G
name=fake=1G,512M,2G
numa=fake=1500M,2800M mem=3500M
numa=fake=1G mem=512M
numa=fake=1G mem=1G

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-20 16:11:46 +11:00
Michael Neuling
584f8b71a2 [POWERPC] Use SLB size from the device tree
Currently we hardwire the number of SLBs to 64, but PAPR says we
should use the ibm,slb-size property to obtain the number of SLB
entries.  This uses this property instead of assuming 64.  If no
property is found, we assume 64 entries as before.

This soft patches the SLB handler, so it shouldn't change performance
at all.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-11 13:45:56 +11:00
joe@perches.com
df3c9019ed [POWERPC] Add missing spaces in printk formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-03 13:56:27 +11:00
Benjamin Herrenschmidt
0b47759db5 [POWERPC] Fix 8xx build breakage due to _tlbie changes
My changes to _tlbie to fix 4xx unfortunately broke 8xx build in a
couple of places.  This fixes it.

Spotted by Olof Johansson.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-20 18:42:00 +11:00
Kamalesh Babulal
f9b6c1de69 [POWERPC] Fix build failure on legacy iSeries
Include <asm/iseries/hv_call.h> in arch/powerpc/mm/stab.c to fix the
following compile error (found with randconfig):

  CC      arch/powerpc/mm/stab.o
arch/powerpc/mm/stab.c: In function "stab_initialize":
arch/powerpc/mm/stab.c:282: error: implicit declaration of function "HvCall1"
arch/powerpc/mm/stab.c:282: error: "HvCallBaseSetASR" undeclared (first use in this function)
arch/powerpc/mm/stab.c:282: error: (Each undeclared identifier is reported only once
arch/powerpc/mm/stab.c:282: error: for each function it appears in.)
make[1]: *** [arch/powerpc/mm/stab.o] Error 1
make: *** [arch/powerpc/mm] Error 2

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-20 11:37:39 +11:00
Stephen Rothwell
6548d83a37 [POWERPC] Silence an annoying boot message
vmemmap_populate will printk (with KERN_WARNING) for a lot of pages
if CONFIG_SPARSEMEM_VMEMMAP is enabled (at least it does on iSeries).
Use pr_debug for it instead.

Replace the only other use of DBG in this file with pr_debug as well.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-13 16:23:47 +11:00
Olof Johansson
9bafbb0c4d [POWERPC] Fix CONFIG_SMP=n build error on ppc64
The patch "KVM: fix !SMP build error" change the way smp_call_function()
actually uses the passed in function names on non-SMP builds.  So
previously it was never caught that the function passed in was never
actually defined.

This causes a build error on ppc64_defconfig + CONFIG_SMP=n:

arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now':
arch/powerpc/mm/tlb_64.c:71: error: 'pte_free_smp_sync' undeclared (first use in this function)
arch/powerpc/mm/tlb_64.c:71: error: (Each undeclared identifier is reported only once
arch/powerpc/mm/tlb_64.c:71: error: for each function it appears in.)

So we need to define it even if CONFIG_SMP is off. Either that or ifdef
out the smp_call_function() call, but that's ugly.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-13 16:22:44 +11:00
Paul Mackerras
688016f4e2 Merge branch 'for-2.6.24' of master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx into merge 2007-11-08 14:28:14 +11:00
will schmidt
465ccab9eb [POWERPC] Fix switch_slb handling of 1T ESID values
Now that we have 1TB segment size support, we need to be using the
GET_ESID_1T macro when comparing ESID values for pc, stack, and
unmapped_base within switch_slb().   A new helper function called
esids_match() contains the logic for deciding when to call GET_ESID
and GET_ESID_1T.

This fixes a duplicate-slb-entry inspired machine-check exception I
was seeing when trying to run java on a power6 partition.

Tested on power6 and power5.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
will schmidt
aa39be09df [POWERPC] Include udbg.h when using udbg_printf
This fixes the error
	error: implicit declaration of function "udbg_printf"

We have a few spots where we reference udbg_printf() without #including
udbg.h.  These are within #ifdef DEBUG blocks, so unnoticed until we do
a #define DEBUG or #define DEBUG_LOW nearby.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
Grant Likely
bd942ba3db [POWERPC] ppc405 Fix arithmatic rollover bug when memory size under 16M
mmu_mapin_ram() loops over total_lowmem to setup page tables.  However, if
total_lowmem is less that 16M, the subtraction rolls over and results in
a number just under 4G (because total_lowmem is an unsigned value).

This patch rejigs the loop from countup to countdown to eliminate the
bug.

Special thanks to Magnus Hjorth who wrote the original patch to fix this
bug.  This patch improves on his by making the loop code simpler (which
also eliminates the possibility of another rollover at the high end)
and also applies the change to arch/powerpc.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-11-01 07:15:59 -05:00
Benjamin Herrenschmidt
b98ac05d5e [POWERPC] 4xx: Deal with 44x virtually tagged icache
The 44x family has an interesting "feature" which is a virtually
tagged instruction cache (yuck !). So far, we haven't dealt with
it properly, which means we've been mostly lucky or people didn't
report the problems, unless people have been running custom patches
in their distro...

This is an attempt at fixing it properly. I chose to do it by
setting a global flag whenever we change a PTE that was previously
marked executable, and flush the entire instruction cache upon
return to user space when that happens.

This is a bit heavy handed, but it's hard to do more fine grained
flushes as the icbi instruction, on those processor, for some very
strange reasons (since the cache is virtually mapped) still requires
a valid TLB entry for reading in the target address space, which
isn't something I want to deal with.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-11-01 07:15:30 -05:00
Benjamin Herrenschmidt
e701d269aa [POWERPC] 4xx: Fix 4xx flush_tlb_page()
On 4xx CPUs, the current implementation of flush_tlb_page() uses
a low level _tlbie() assembly function that only works for the
current PID. Thus, invalidations caused by, for example, a COW
fault triggered by get_user_pages() from a different context will
not work properly, causing among other things, gdb breakpoints
to fail.

This patch adds a "pid" argument to _tlbie() on 4xx processors,
and uses it to flush entries in the right context. FSL BookE
also gets the argument but it seems they don't need it (their
tlbivax form ignores the PID when invalidating according to the
document I have).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-11-01 07:15:09 -05:00
Benjamin Herrenschmidt
f6ab0b922c [POWERPC] powerpc: Fix demotion of segments to 4K pages
When demoting a process to use 4K HW pages (instead of 64K), which
happens under various circumstances such as doing cache inhibited
mappings on machines that do not support 64K CI pages, the assembly
hash code calls back into the C function flush_hash_page().  This
function prototype was recently changed to accomodate for 1T segments
but the assembly call site was not updated, causing applications that
do demotion to hang.  In addition, when updating the per-CPU PACA for
the new sizes, we didn't properly update the slice "map", thus causing
the SLB miss code to re-insert segments for the wrong size.

This fixes both and adds a warning comment next to the C
implementation to try to avoid problems next time someone changes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-29 14:34:14 +11:00
Serge E. Hallyn
b460cbc581 pid namespaces: define is_global_init() and is_container_init()
is_init() is an ambiguous name for the pid==1 check.  Split it into
is_global_init() and is_container_init().

A cgroup init has it's tsk->pid == 1.

A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns.  But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.

Changelog:

	2.6.22-rc4-mm2-pidns1:
	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
	  global init (/sbin/init) process. This would improve performance
	  and remove dependence on the task_pid().

	2.6.21-mm2-pidns2:

	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
	  This way, we kill only the cgroup if the cgroup's init has a
	  bug rather than force a kernel panic.

[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:37 -07:00
Linus Torvalds
c548f08a4f Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (24 commits)
  [POWERPC] Fix vmemmap warning in init_64.c
  [POWERPC] Fix 64 bits vDSO DWARF info for CR register
  [POWERPC] Add 1TB workaround for PA6T
  [POWERPC] Enable NO_HZ and high res timers for pseries and ppc64 configs
  [POWERPC] Quieten cache information at boot
  [POWERPC] Quieten clockevent printk
  [POWERPC] Enable SLUB in *_defconfig
  [POWERPC] Fix 1TB segment detection
  [POWERPC] Fix iSeries_hpte_insert prototype
  [POWERPC] Fix copyright symbol
  [POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
  [POWERPC] ibmebus: Add device creation and bus probing based on of_device
  [POWERPC] ibmebus: Remove bus match/probe/remove functions
  [POWERPC] Move of_device allocation into of_device.[ch]
  [POWERPC] mpc52xx: device tree changes for FEC and MDIO
  [POWERPC] bestcomm: GenBD task support
  [POWERPC] bestcomm: FEC task support
  [POWERPC] bestcomm: ATA task support
  [POWERPC] bestcomm: core bestcomm support for Freescale MPC5200
  [POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes
  ...
2007-10-17 09:05:55 -07:00
Roel Kluin
f7a75f0a40 spin_lock_unlocked cleanups
Replace some SPIN_LOCK_UNLOCKED with DEFINE_SPINLOCK

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:01 -07:00
Christoph Lameter
4ba9b9d0ba Slab API: remove useless ctor parameter and reorder parameters
Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Tony Breeds
f6b8076910 [POWERPC] Fix vmemmap warning in init_64.c
Use the right printk format to silence the following warning.

  CC      arch/powerpc/mm/init_64.o
arch/powerpc/mm/init_64.c: In function 'vmemmap_populate':
arch/powerpc/mm/init_64.c:243: warning: format '%p' expects type 'void *', but argument 4 has type 'long unsigned int'

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:09 +10:00
Olof Johansson
f66bce5e6a [POWERPC] Add 1TB workaround for PA6T
PA6T has a bug where the slbie instruction does not honor the large
segment bit.  As a result, we have to always use slbia when switching
context.

We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID.  I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry.  So as long as we clear it out on context
switch we should be fine.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:09 +10:00
Olof Johansson
f5534004e5 [POWERPC] Fix 1TB segment detection
Buglet in the 1TB detection makes it return after checking the first
property word, even if it's not a match.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:08 +10:00
Anton Blanchard
e95206ab2c Update PowerPC vmemmap code for 1TB segments
htab_bolt_mapping takes another argument now the 1TB code has been
merged. Update vmemmap_populate to match.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 13:10:58 -07:00
KAMEZAWA Hiroyuki
48e94196a5 fix memory hot remove not configured case.
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.

 - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
 - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
   which returns -EINVAL.
 - removed remove_pages() only used in powerpc.
 - removed no-op remove_memory() in i386, sh, sparc64, x86_64.

 - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
   to return -EINVAL.

Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:02 -07:00
Andy Whitcroft
d29eff7bca ppc64: SPARSEMEM_VMEMMAP support
Enable virtual memmap support for SPARSEMEM on PPC64 systems.  Slice a 16th
off the end of the linear mapping space and use that to hold the vmemmap.
Uses the same size mapping as uses in the linear 1:1 kernel mapping.

[pbadari@gmail.com: fix warning]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:51 -07:00
Paul Mackerras
1189be6508 [POWERPC] Use 1TB segments
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).

We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.

We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments.  That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.

Parts of this patch were originally written by Ben Herrenschmidt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-12 14:05:17 +10:00
Dale Farnsworth
873553b3d6 [POWERPC] 85xx: Failure with odd memory sizes and CONFIG_HIGHMEM
The CONFIG_FSL_BOOKE mmu setup code fails when CONFIG_HIGHMEM=y
and the 3 fixed TLB entries cannot exactly map the lowmem size.
Each TLB entry can map 4MB, 16MB, 64MB or 256MB, so the failure
is observed when the kernel lowmem size is not equal to the
sum of up to 3 of those values.

Normally, memory is sized in nice numbers, but I observed this
problem while testing a crash dump kernel.  The failure can
also be observed by artificially reducing the kernel's main
memory via the mem= kernel command line parameter.

This commit fixes the problem by setting __initial_memory_limit
in adjust_total_lowmem().

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-10-08 08:38:34 -05:00
John Traill
544cdabe64 [POWERPC] 8xx: Set initial memory limit.
The 8xx can only support a max of 8M during early boot (it seems a lot of
8xx boards only have 8M so the bug was never triggered), but the early
allocator isn't aware of this.  The following change makes it able to run
with larger memory.

Signed-off-by: John Traill <john.traill@freescale.com>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-10-03 20:36:36 -05:00
Ed Swarthout
df174e3be8 [POWERPC] Add memory regions to the kcore list for 32-bit machines
The entries are only 32-bit, so restrict the virtual address to stay
below 0xffff_ffff.  With KERNELBASE set to 0xc000_0000, this in effect
restricts access to the first 1GB of real memory.

Make setup_kcore conditional on CONFIG_PROC_KCORE for both 32/64.

Signed-off-by: Ed Swarthout <Ed.Swarthout@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:12:06 +10:00
Stephen Rothwell
2578bfae84 [POWERPC] Create and use CONFIG_WORD_SIZE
Linus made this suggestion for the x86 merge and this starts the process
for powerpc.  We assume that CONFIG_PPC64 implies CONFIG_PPC_MERGE and
CONFIG_PPC_STD_MMU_32 implies CONFIG_PPC_STD_MMU.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:12:02 +10:00
Michael Neuling
00efee7d5d [POWERPC] Remove barriers from the SLB shadow buffer update
After talking to an IBM POWER hypervisor (PHYP) design and development
guy, there seems to be no need for memory barriers when updating the SLB
shadow buffer provided we only update it from the current CPU, which we
do.

Also, these guys see no need in the future for these barriers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-19 14:40:54 +10:00
Olof Johansson
a302cb9d95 [POWERPC] Export new __io{re,un}map_at() symbols
Export new __io{re,un}map_at() symbols so modules can use them.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-14 01:33:21 +10:00
Paul Mackerras
35438c4327 Merge branch 'linux-2.6' into for-2.6.24 2007-08-28 15:56:11 +10:00
Paul Mackerras
175587cca7 [POWERPC] Fix SLB initialization at boot time
This partially reverts edd0622bd2.

It turns out that the part of that commit that aimed to ensure that we
created an SLB entry for the kernel stack on secondary CPUs when
starting the CPU didn't achieve its aim, and in fact caused a
regression, because get_paca()->kstack is not initialized at the point
where slb_initialize is called.

This therefore just reverts that part of that commit, while keeping
the change to slb_flush_and_rebolt, which is correct and necessary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-25 16:58:43 +10:00
Josh Boyer
4d922c8dc3 [POWERPC] 40x MMU
Add MMU definitions for 40x platforms.  Also fixes two warnings in 40x_mmu.c.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2007-08-20 07:28:48 -05:00
Josh Boyer
15f6527e8e [POWERPC] Rename 4xx paths to 40x
4xx is a bit of a misnomer for certain things, as they really apply to PowerPC
40x only.  Rename some of the files to clean this up.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2007-08-20 07:27:07 -05:00
Stephen Rothwell
e8ff0646e5 [POWERPC] Tidy up CONFIG_PPC_MM_SLICES code
This removes some of the #ifdefs from .c files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-17 11:01:59 +10:00
Stephen Rothwell
9dfe5c53d0 [POWERPC] Fix non HUGETLB_PAGE build warning
arch/powerpc/mm/mmu_context_64.c: In function 'init_new_context':
arch/powerpc/mm/mmu_context_64.c:31: warning: unused variable 'new_context'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-17 11:01:58 +10:00
Jesper Juhl
9420dc65ff [POWERPC] Clean out a bunch of duplicate includes
This removes several duplicate includes from arch/powerpc/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-17 11:01:51 +10:00
Ilpo Järvinen
2b02d13996 [POWERPC] Fix invalid semicolon after if statement
A similar fix to netfilter from Eric Dumazet inspired me to
look around a bit by using some grep/sed stuff as looking for
this kind of bugs seemed easy to automate.  This is one of them
I found where it looks like this semicolon is not valid.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-17 10:48:52 +10:00
Benjamin Herrenschmidt
d1f5a77f2c [POWERPC] Fix size check for hugetlbfs
My "slices" address space management code that was added in the 2.6.22
implementation of get_unmapped_area() doesn't properly check that the
size is a multiple of the requested page size.  This allows userland to
create VMAs that aren't a multiple of the huge page size with hugetlbfs
(since hugetlbfs entirely relies on get_unmapped_area() to do that
checking) which leads to a kernel BUG() when such areas are torn down.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-10 21:04:42 +10:00
Paul Mackerras
edd0622bd2 [POWERPC] Fix potential duplicate entry in SLB shadow buffer
We were getting a duplicate entry in the SLB shadow buffer in
slb_flush_and_rebolt() if the kernel stack was in the same segment
as PAGE_OFFSET, which on POWER6 causes the hypervisor to terminate
the partition with an error.  This fixes it.

Also we were not creating an SLB entry (or an SLB shadow buffer
entry) for the kernel stack on secondary CPUs when starting the
CPU.  This isn't a major problem, since an appropriate entry will
be created on demand, but this fixes that also for consistency.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-10 21:04:07 +10:00
Michael Neuling
67439b76f2 [POWERPC] Fixes for the SLB shadow buffer code
On a machine with hardware 64kB pages and a kernel configured for a
64kB base page size, we need to change the vmalloc segment from 64kB
pages to 4kB pages if some driver creates a non-cacheable mapping in
the vmalloc area.  However, we never updated with SLB shadow buffer.
This fixes it.  Thanks to paulus for finding this.

Also added some write barriers to ensure the shadow buffer contents
are always consistent.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-03 19:36:01 +10:00
Michael Ellerman
b9c3fdb0f0 [POWERPC] Fix parse_drconf_memory() for 64-bit start addresses
Some new machines use the "ibm,dynamic-reconfiguration-memory" property
to provide memory layout information, rather than via memory nodes.

There is a bug in the code to parse this property for start addresses
over 4GB; we store the start address in an unsigned int, which means
we throw away the high bits and add apparently duplicate regions.
This results in a BUG() in free_bootmem_core().  This fixes it by
using an unsigned long instead.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-03 19:36:00 +10:00
Paul Mackerras
430404ed9c [POWERPC] Fix special PTE code for secondary hash bucket
The code for mapping special 4k pages on kernels using a 64kB base
page size was missing the code for doing the RPN (real page number)
manipulation when inserting the hardware PTE in the secondary hash
bucket.  It needs the same code as has already been added to the
code that inserts the HPTE in the primary hash bucket.  This adds it.

Spotted by Ben Herrenschmidt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-03 19:16:11 +10:00
Manish Ahuja
56d6d1a73d [POWERPC] Fix loop with unsigned long counter variable
This fixes a possible infinite loop when the unsigned long counter "i"
is used in lmb_add_region() in the following for loop:

for (i = rgn->cnt-1; i >= 0; i--)

by making the loop counter "i" be signed.

Signed-off-by: Manish Ahuja <ahuja@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-07-26 16:12:17 +10:00
Linus Torvalds
dc79747019 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Clean up duplicate includes in drivers/macintosh/
  [POWERPC] Quiet section mismatch warning on pcibios_setup
  [POWERPC] init and exit markings for hvc_iseries
  [POWERPC] Quiet section mismatch in hvc_rtas.c
  [POWERPC] Constify of_platform_driver match_table
  [POWERPC] hvcs: Make some things static and const
  [POWERPC] Constify of_platform_driver name
  [POWERPC] MPIC protected sources
  [POWERPC] of_detach_node()'s device node argument cannot be const
  [POWERPC] Fix ARCH=ppc builds
  [POWERPC] mv64x60: Use mutex instead of semaphore
  [POWERPC] Allow smp_call_function_single() to current cpu
  [POWERPC] Allow exec faults on readable areas on classic 32-bit PowerPC
  [POWERPC] Fix future firmware feature fixups function failure
  [POWERPC] fix showing xmon help
  [POWERPC] Make xmon_write accept a const buffer
  [POWERPC] Fix misspelled "CONFIG_CHECK_CACHE_COHERENCY" Kconfig option.
  [POWERPC] cell: CONFIG_SPE_BASE is a typo
2007-07-22 11:17:35 -07:00
Paul Mackerras
08ae6cc15d [POWERPC] Allow exec faults on readable areas on classic 32-bit PowerPC
Classic 32-bit PowerPC CPUs, and the early 64-bit PowerPC CPUs, don't
provide a way to prevent execution from readable pages, that is, the
MMU doesn't distinguish between data reads and instruction reads,
although a different exception is taken for faults in data accesses
and instruction accesses.

Commit 9ba4ace39f, in the course of
fixing another bug, added a check that meant that a page fault due
to an instruction access would fail if the vma did not have the
VM_EXEC flag set.  This gives an inconsistent enforcement on these
CPUs of the no-execute status of the vma (since reading from the page
is sufficient to allow subsequent execution from it), and causes old
versions of ppc32 glibc (2.2 and earlier) to fail, since they rely
on executing the word before the GOT but don't have it marked
executable.

This fixes the problem by allowing execution from readable (or writable)
areas on CPUs which do not provide separate control over data and
instruction reads.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Jon Loeliger <jdl@freescale.com>
2007-07-22 21:30:58 +10:00
Geert Uytterhoeven
1e57ba8ddd [POWERPC] cell: CONFIG_SPE_BASE is a typo
The config symbol for SPE support is called CONFIG_SPU_BASE, not
CONFIG_SPE_BASE.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-07-22 21:30:57 +10:00
Mariusz Kozlowski
97d22d26b4 powerpc: tlb_32.c build fix
allnoconfig results in this:

 CC      arch/powerpc/mm/tlb_32.o
In file included from include/asm/tlb.h:60,
                 from arch/powerpc/mm/tlb_32.c:30:
include/asm-generic/tlb.h: In function 'tlb_flush_mmu':
include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages'
include/asm-generic/tlb.h: In function 'tlb_remove_page':
include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release'

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 17:49:16 -07:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Nick Piggin
83c54070ee mm: fault feedback #2
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer.  This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).

[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Paul Mackerras
bf22f6fe2d Merge branch 'for-2.6.23' into merge 2007-07-11 13:28:26 +10:00
Manish Ahuja
b3e998ee05 [POWERPC] Remove extra return statement
Found 2 instances of return one right after each other in
arch_add_memory().  This removes the superfluous one.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-07-10 22:01:01 +10:00
Kumar Gala
74a0ba61b1 [POWERPC] Move inline asm eieio to using eieio inline function
Use the eieio function so we can redefine what eieio does rather
than direct inline asm.  This is part code clean up and partially
because not all PPCs have eieio (book-e has mbar that maps to eieio).

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-07-10 00:33:14 -05:00
Segher Boessenkool
9ba4ace39f [POWERPC] PowerPC: Prevent data exception in kernel space (32-bit)
The "is_exec" branch of the protection check in do_page_fault()
didn't do anything on 32-bit PowerPC.  So if a userland program
jumps to a page with Linux protection flags "---p", all the tests
happily fall through, and handle_mm_fault() is called, which in
turn calls handle_pte_fault(), which calls update_mmu_cache(),
which goes flush the dcache to a page with no access rights.

Boom.

This fixes it.

Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-20 22:07:38 +10:00
David Gibson
8e561e7eda [POWERPC] Kill typedef-ed structs for hash PTEs and BATs
Using typedefs to rename structure types if frowned on by CodingStyle.
However, we do so for the hash PTE structure on both ppc32 (where it's
called "PTE") and ppc64 (where it's called "hpte_t").  On ppc32 we
also have such a typedef for the BATs ("BAT").

This removes this unhelpful use of typedefs, in the process
bringing ppc32 and ppc64 closer together, by using the name "struct
hash_pte" in both cases.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:30:16 +10:00
David Gibson
c0770f686c [POWERPC] Remove a couple of unused definitions from pgtable_32.c
In arch/powerpc/mm/pgtable_32.c, the variable io_bat_index and the
macro is_power_of_4() no longer have any users.  This removes them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:30:15 +10:00
David Gibson
f21f49ea63 [POWERPC] Remove the dregs of APUS support from arch/powerpc
APUS (the Amiga Power-Up System) is not supported under arch/powerpc
and it's unlikely it ever will be.  Therefore, this patch removes the
fragments of APUS support code from arch/powerpc which have been
copied from arch/ppc.

A few APUS references are left in asm-powerpc in .h files which are
still used from arch/ppc.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:30:15 +10:00
David Gibson
90ac19a8b2 [POWERPC] Abolish iopa(), mm_ptov(), io_block_mapping() from arch/powerpc
These old-fashioned IO mapping functions no longer have any callers in
code which remains relevant on arch/powerpc.  Therefore, this removes
them from arch/powerpc.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:30:15 +10:00
will schmidt
effe24bdd4 [POWERPC] During VM oom condition, kill all threads in process group
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory
condition.

Killing just one of the process threads can leave the application in a
bad state, whereas killing the entire process group would allow for
the application to restart, or be otherwise handled, and makes it very
obvious that something has gone wrong.

This change allows the entire process group to be taken down, rather than
just the one thread.

lightly tested on powerpc

Signed-off-by: Will <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:29:59 +10:00
Benjamin Herrenschmidt
3d5134ee83 [POWERPC] Rewrite IO allocation & mapping on powerpc64
This rewrites pretty much from scratch the handling of MMIO and PIO
space allocations on powerpc64.  The main goals are:

 - Get rid of imalloc and use more common code where possible
 - Simplify the current mess so that PIO space is allocated and
   mapped in a single place for PCI bridges
 - Handle allocation constraints of PIO for all bridges including
   hot plugged ones within the 2GB space reserved for IO ports,
   so that devices on hotplugged busses will now work with drivers
   that assume IO ports fit in an int.
 - Cleanup and separate tracking of the ISA space in the reserved
   low 64K of IO space. No ISA -> Nothing mapped there.

I booted a cell blade with IDE on PIO and MMIO and a dual G5 so
far, that's it :-)

With this patch, all allocations are done using the code in
mm/vmalloc.c, though we use the low level __get_vm_area with
explicit start/stop constraints in order to manage separate
areas for vmalloc/vmap, ioremap, and PCI IOs.

This greatly simplifies a lot of things, as you can see in the
diffstat of that patch :-)

A new pair of functions pcibios_map/unmap_io_space() now replace
all of the previous code that used to manipulate PCI IOs space.
The allocation is done at mapping time, which is now called from
scan_phb's, just before the devices are probed (instead of after,
which is by itself a bug fix). The only other caller is the PCI
hotplug code for hot adding PCI-PCI bridges (slots).

imalloc is gone, as is the "sub-allocation" thing, but I do beleive
that hotplug should still work in the sense that the space allocation
is always done by the PHB, but if you unmap a child bus of this PHB
(which seems to be possible), then the code should properly tear
down all the HPTE mappings for that area of the PHB allocated IO space.

I now always reserve the first 64K of IO space for the bridge with
the ISA bus on it. I have moved the code for tracking ISA in a separate
file which should also make it smarter if we ever are capable of
hot unplugging or re-plugging an ISA bridge.

This should have a side effect on platforms like powermac where VGA IOs
will no longer work. This is done on purpose though as they would have
worked semi-randomly before. The idea at this point is to isolate drivers
that might need to access those and fix them by providing a proper
function to obtain an offset to the legacy IOs of a given bus.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:29:56 +10:00
Benjamin Herrenschmidt
c19c03fc74 [POWERPC] unmap_vm_area becomes unmap_kernel_range for the public
This makes unmap_vm_area static and a wrapper around a new
exported unmap_kernel_range that takes an explicit range instead
of a vm_area struct.

This makes it more versatile for code that wants to play with kernel
page tables outside of the standard vmalloc area.

(One example is some rework of the PowerPC PCI IO space mapping
code that depends on that patch and removes some code duplication
and horrible abuse of forged struct vm_struct).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:29:56 +10:00
Jon Tollefson
3f1df7a260 [POWERPC] Move common code out of if/else
Move common code out of if/else.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
----

hash_native_64.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-06-14 22:29:55 +10:00
Kumar Gala
f1aed92464 [POWERPC] Fix modpost warning
Mark pte_alloc_one_kernel as __init_refok to fix the following warning:

WARNING: arch/powerpc/mm/built-in.o(.text+0x1068): Section mismatch: reference to .init.text:early_get_page (between 'pte_alloc_one_kernel' and 'steal_context')

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-05-23 07:49:37 -05:00
Benjamin Herrenschmidt
5453e7723b [POWERPC] Fix warning in 32-bit builds with CONFIG_HIGHMEM
Some missing fixup for the removal of 4 level fixup header.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-22 20:20:57 +10:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Jon Tollefson
5b82583185 [POWERPC] Correct #endif comment
Fix up comment on two #endifs to match their #ifs.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
----

 hash_utils_64.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-17 21:11:19 +10:00
Benjamin Herrenschmidt
017e3c53f1 [POWERPC] Add spinlock to request_phb_iospace()
request_phb_iospace() can be called from different CPUs at init
time (at least with my next patch) and thus needs a spinlock.
As for the next patch, this is a temporary workaround for 2.6.22
issues until my rewrite of IO mappings is ready (for 2.6.23)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-17 21:11:14 +10:00
Kumar Gala
991eb43af9 [POWERPC] Fix COMMON symbol warnings
We get the following warnings in various ARCH=powerpc builds:

WARNING: "ee_restarts" [arch/powerpc/kernel/built-in] is COMMON symbol
WARNING: "fee_restarts" [arch/powerpc/kernel/built-in] is COMMON symbol
WARNING: "htab_hash_searches" [arch/powerpc/mm/built-in] is COMMON symbol
WARNING: "next_slot" [arch/powerpc/mm/built-in] is COMMON symbol
WARNING: "mmu_hash_lock" [arch/powerpc/mm/built-in] is COMMON symbol
WARNING: "primary_pteg_full" [arch/powerpc/mm/built-in] is COMMON symbol
WARNING: "global_dbcr0" [arch/powerpc/kernel/built-in] is COMMON symbol

Switch to moving local symbols (except mmu_hash_lock which is global) and
space directive instead.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-05-17 21:10:15 +10:00
Stephen Rothwell
8980ae8677 [POWERPC] Remove unused variable in hpte_decode()
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-12 11:32:47 +10:00
Stephen Rothwell
0c12fe5697 [POWERPC] Assign correct variable in hpte_decode()
This case will never be hit, but it should be corrected anyway.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-12 11:32:47 +10:00
Paul Mackerras
2454c7e98c [POWERPC] Fix warning in hpte_decode(), and generalize it
This adds the necessary support to hpte_decode() to handle 1TB
segments and 16GB pages, and removes an uninitialized value
warning on avpn.

We don't have any code to generate HPTEs for 1TB segments or 16GB
pages yet, so this is mostly for completeness, and to fix the
warning.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2007-05-10 21:28:13 +10:00
Linus Torvalds
aabded9c3a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Further fixes for the removal of 4level-fixup hack from ppc32
  [POWERPC] EEH: log all PCI-X and PCI-E AER registers
  [POWERPC] EEH: capture and log pci state on error
  [POWERPC] EEH: Split up long error msg
  [POWERPC] EEH: log error only after driver notification.
  [POWERPC] fsl_soc: Make mac_addr const in fs_enet_of_init().
  [POWERPC] Don't use SLAB/SLUB for PTE pages
  [POWERPC] Spufs support for 64K LS mappings on 4K kernels
  [POWERPC] Add ability to 4K kernel to hash in 64K pages
  [POWERPC] Introduce address space "slices"
  [POWERPC] Small fixes & cleanups in segment page size demotion
  [POWERPC] iSeries: Make HVC_ISERIES the default
  [POWERPC] iSeries: suppress build warning in lparmap.c
  [POWERPC] Mark pages that don't exist as nosave
  [POWERPC] swsusp: Introduce register_nosave_region_late
2007-05-09 12:56:01 -07:00
Rafael J. Wysocki
8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
David Gibson
f1a1eb299a [POWERPC] Further fixes for the removal of 4level-fixup hack from ppc32
Commit d1953c8888 removed the use of
4level-fixup.h for 32-bit systems under arch/powerpc.  However, I
missed a few things activated on some configurations, resulting in
some warnings (at least with STRICT_MM_TYPECHECKS enabled) and build
errors in some circumstances.  This fixes it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:01 +10:00
Hugh Dickins
517e22638c [POWERPC] Don't use SLAB/SLUB for PTE pages
The SLUB allocator relies on struct page fields first_page and slab,
overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
be used for the lowest level of pagetable pages.  This was obstructing
SLUB on PowerPC, which uses kmem_caches for its pagetables.  So convert
its pte level to use normal gfp pages (whereas pmd, pud and 64k-page pgd
want partpages, so continue to use kmem_caches for pmd, pud and pgd).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt
16c2d47623 [POWERPC] Add ability to 4K kernel to hash in 64K pages
This adds the ability for a kernel compiled with 4K page size
to have special slices containing 64K pages and hash the right type
of hash PTEs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt
d0f13e3c20 [POWERPC] Introduce address space "slices"
The basic issue is to be able to do what hugetlbfs does but with
different page sizes for some other special filesystems; more
specifically, my need is:

 - Huge pages

 - SPE local store mappings using 64K pages on a 4K base page size
kernel on Cell

 - Some special 4K segments in 64K-page kernels for mapping a dodgy
type of powerpc-specific infiniband hardware that requires 4K MMU
mappings for various reasons I won't explain here.

The main issues are:

 - To maintain/keep track of the page size per "segment" (as we can
only have one page size per segment on powerpc, which are 256MB
divisions of the address space).

 - To make sure special mappings stay within their allotted
"segments" (including MAP_FIXED crap)

 - To make sure everybody else doesn't mmap/brk/grow_stack into a
"segment" that is used for a special mapping

Some of the necessary mechanisms to handle that were present in the
hugetlbfs code, but mostly in ways not suitable for anything else.

The patch relies on some changes to the generic get_unmapped_area()
that just got merged.  It still hijacks hugetlb callbacks here or
there as the generic code hasn't been entirely cleaned up yet but
that shouldn't be a problem.

So what is a slice ?  Well, I re-used the mechanism used formerly by our
hugetlbfs implementation which divides the address space in
"meta-segments" which I called "slices".  The division is done using
256MB slices below 4G, and 1T slices above.  Thus the address space is
divided currently into 16 "low" slices and 16 "high" slices.  (Special
case: high slice 0 is the area between 4G and 1T).

Doing so simplifies significantly the tracking of segments and avoids
having to keep track of all the 256MB segments in the address space.

While I used the "concepts" of hugetlbfs, I mostly re-implemented
everything in a more generic way and "ported" hugetlbfs to it.

Slices can have an associated page size, which is encoded in the mmu
context and used by the SLB miss handler to set the segment sizes.  The
hash code currently doesn't care, it has a specific check for hugepages,
though I might add a mechanism to provide per-slice hash mapping
functions in the future.

The slice code provide a pair of "generic" get_unmapped_area() (bottomup
and topdown) functions that should work with any slice size.  There is
some trickiness here so I would appreciate people to have a look at the
implementation of these and let me know if I got something wrong.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt
16f1c74675 [POWERPC] Small fixes & cleanups in segment page size demotion
The code for demoting segments to 4K had some issues, like for example,
when using _PAGE_4K_PFN flag, the first CPU to hit it would do the
demotion, but other CPUs hitting the same page wouldn't properly flush
their SLBs if mmu_ci_restriction isn't set.  There are also potential
issues with hash_preload not handling _PAGE_4K_PFN.  All of these are
non issues on current hardware but might bite us in the future.

This patch thus fixes it by:

 - Taking the test comparing the mm and current CPU context page
sizes to decide to flush SLBs out of the mmu_ci_restrictions test
since that can also be triggered by _PAGE_4K_PFN pages

 - Due to the above being done all the time, demote_segment_4k
doesn't need update the context and flush the SLB

 - demote_segment_4k can be static and doesn't need an EXPORT_SYMBOL

 - Making hash_preload ignore anything that has either _PAGE_4K_PFN
or _PAGE_NO_CACHE set, thus avoiding duplication of the complicated
logic in hash_page() (and possibly making hash_preload a little bit
faster for the normal case).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Johannes Berg
4e8ad3e816 [POWERPC] Mark pages that don't exist as nosave
On some powerpc architectures (notably 64-bit powermac) there is a memory
hole, for example on powermacs between 2G and 4G. Since we use the flat
memory model regardless, these pages must be marked as nosave (for suspend
to disk.)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:34:56 +10:00
Linus Torvalds
df6d3916f3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits)
  [POWERPC] Abolish powerpc_flash_init()
  [POWERPC] Early serial debug support for PPC44x
  [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc
  [POWERPC] Add device tree for Ebony
  [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now
  [POWERPC] MPIC U3/U4 MSI backend
  [POWERPC] MPIC MSI allocator
  [POWERPC] Enable MSI mappings for MPIC
  [POWERPC] Tell Phyp we support MSI
  [POWERPC] RTAS MSI implementation
  [POWERPC] PowerPC MSI infrastructure
  [POWERPC] Rip out the existing powerpc msi stubs
  [POWERPC] Remove use of 4level-fixup.h for ppc32
  [POWERPC] Add powerpc PCI-E reset API implementation
  [POWERPC] Holly bootwrapper
  [POWERPC] Holly DTS
  [POWERPC] Holly defconfig
  [POWERPC] Add support for 750CL Holly board
  [POWERPC] Generalize tsi108 PCI setup
  [POWERPC] Generalize tsi108 PHY types
  ...

Fixed conflict in include/asm-powerpc/kdebug.h manually

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:50:19 -07:00
Randy Dunlap
e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Christoph Hellwig
1eeb66a1bb move die notifier handling to common code
This patch moves the die notifier handling to common code.  Previous
various architectures had exactly the same code for it.  Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)

arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at.  avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Akinobu Mita
0e6b9c98be use SLAB_PANIC flag cleanup
Use SLAB_PANIC and delete duplicated panic().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
David Gibson
d1953c8888 [POWERPC] Remove use of 4level-fixup.h for ppc32
For 32-bit systems, powerpc still relies on the 4level-fixup.h hack,
to pretend that the generic pagetable handling stuff is 3-levels
rather than 4.  This patch removes this, instead using the newer
pgtable-nopmd.h to handle the elision of both the pud and pmd
pagetable levels (ppc32 pagetables are actually 2 levels).

This removes a little extraneous code, and makes it more easily
compared to the 64-bit pagetable code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-08 13:40:31 +10:00
Paul Mackerras
02bbc0f09c Merge branch 'linux-2.6' 2007-05-08 13:37:51 +10:00
Michael Ellerman
0108d3fe3c [POWERPC] Add __init annotations to reserve_mem() and stabs_alloc()
reserve_mem() and stabs_alloc() are both called only from other __init
routines, so can be marked __init.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-08 11:54:19 +10:00
Benjamin Herrenschmidt
d506a77251 get_unmapped_area handles MAP_FIXED on powerpc
The current get_unmapped_area code calls the f_ops->get_unmapped_area or the
arch one (via the mm) only when MAP_FIXED is not passed.  That makes it
impossible for archs to impose proper constraints on regions of the virtual
address space.  To work around that, get_unmapped_area() then calls some
hugetlbfs specific hacks.

This cause several problems, among others:

- It makes it impossible for a driver or filesystem to do the same thing
  that hugetlbfs does (for example, to allow a driver to use larger page sizes
  to map external hardware) if that requires applying a constraint on the
  addresses (constraining that mapping in certain regions and other mappings
  out of those regions).

- Some archs like arm, mips, sparc, sparc64, sh and sh64 already want
  MAP_FIXED to be passed down in order to deal with aliasing issues.  The code
  is there to handle it...  but is never called.

This series of patches moves the logic to handle MAP_FIXED down to the various
arch/driver get_unmapped_area() implementations, and then changes the generic
code to always call them.  The hugetlbfs hacks then disappear from the generic
code.

Since I need to do some special 64K pages mappings for SPEs on cell, I need to
work around the first problem at least.  I have further patches thus
implementing a "slices" layer that handles multiple page sizes through slices
of the address space for use by hugetlbfs, the SPE code, and possibly others,
but it requires that serie of patches first/

There is still a potential (but not practical) issue due to the fact that
filesystems/drivers implemeting g_u_a will effectively bypass all arch checks.
 This is not an issue in practice as the only filesystems/drivers using that
hook are doing so for arch specific purposes in the first place.

There is also a problem with mremap that will completely bypass all arch
checks.  I'll try to address that separately, I'm not 100% certain yet how,
possibly by making it not work when the vma has a file whose f_ops has a
get_unmapped_area callback, and by making it use is_hugepage_only_range()
before expanding into a new area.

Also, I want to turn is_hugepage_only_range() into a more generic
is_normal_page_range() as that's really what it will end up meaning when used
in stack grow, brk grow and mremap.

None of the above "issues" however are introduced by this patch, they are
already there, so I think the patch can go ini for 2.6.22.

This patch:

Handle MAP_FIXED in powerpc's arch_get_unmapped_area() in all 3
implementations of it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Christoph Lameter
f0f3980b21 slab allocators: remove multiple alignment specifications
It is not necessary to tell the slab allocators to align to a cacheline
if an explicit alignment was already specified. It is rather confusing
to specify multiple alignments.

Make sure that the call sites only use one form of alignment.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Christoph Lameter
5af6083990 slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN
This patch was recently posted to lkml and acked by Pekka.

The flag SLAB_MUST_HWCACHE_ALIGN is

1. Never checked by SLAB at all.

2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

The only remaining use is in sparc64 and ppc64 and their use there
reflects some earlier role that the slab flag once may have had. If
its specified then SLAB_HWCACHE_ALIGN is also specified.

The flag is confusing, inconsistent and has no purpose.

Remove it.

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Luke Browning
71bf08b6c0 [POWERPC] 64K page support for kexec
This fixes a couple of kexec problems related to 64K page
support in the kernel.  kexec issues a tlbie for each pte.  The
parameters for the tlbie are the page size and the virtual address.
Support was missing for the computation of these two parameters
for 64K pages.  This adds that support.

Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-07 20:31:12 +10:00
Christoph Hellwig
9f90b997de [POWERPC] Minor fault path optimization
Call the kprobes pagefault handler directly instead of going through
the complex notifier chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02 20:57:39 +10:00
David Gibson
57d7909e0d [POWERPC] Revise PPC44x MMU code for arch/powerpc
This patch takes the definitions for the PPC44x MMU (a software loaded
TLB) from asm-ppc/mmu.h, cleans them up of things no longer necessary
in arch/powerpc and puts them in a new asm-powerpc/mmu_44x.h file.  It
also substantially simplifies arch/powerpc/mm/44x_mmu.c and makes a
couple of small fixes necessary for the 44x MMU code to build and work
properly in arch/powerpc.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02 20:04:29 +10:00
Michael Ellerman
ed16669298 [POWERPC] Initialise spinlock in the DEBUG_PAGEALLOC code
Fixes:

BUG: spinlock bad magic on CPU#0, swapper/0
 lock: c00000000064ec30, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Call Trace:
[c00000000062b980] [c00000000000f920] .show_stack+0x6c/0x1a0 (unreliable)
[c00000000062ba20] [c0000000001c2b40] .spin_bug+0xb0/0xd4
[c00000000062bab0] [c0000000001c2ed0] ._raw_spin_lock+0x44/0x184
[c00000000062bb50] [c0000000003a42b4] ._spin_lock+0x10/0x24
[c00000000062bbd0] [c00000000002b4dc] .kernel_map_pages+0x198/0x278
[c00000000062bc90] [c000000000079720] .free_hot_cold_page+0x124/0x418
[c00000000062bd70] [c000000000530278] .free_all_bootmem_core+0x14c/0x224
[c00000000062be50] [c00000000052a178] .mem_init+0x68/0x170
[c00000000062bee0] [c00000000051d874] .start_kernel+0x2a0/0x37c
[c00000000062bf90] [c0000000000084c8] .start_here_common+0x54/0x8c

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02 20:04:29 +10:00
Johannes Berg
a3cf4bdef0 [POWERPC] Remove unneeded page_is_ram export
arch/powerpc/mm/mem.c exports page_is_ram, which is not used anywhere
that could be modular.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02 16:40:57 +10:00
David Gibson
37f01d64d8 [POWERPC] Abolish PHYS_FMT macro from arch/powerpc
32-bit powerpc systems define a macro, PHYS_FMT, giving a printf
format string fragment for displaying physical addresses, since most
32-bit powerpc platforms use 32-bit physical addresses but a few use
64-bit physical addresses.

This macro is used in exactly one place, a rare error message, where
we can solve the problem more simply by just unconditionally casting
the address up to 64-bit quantity before formatting it.

This patch does so, meaning that as we bring MMU definitions from
asm-ppc over to asm-powerpc, cleaning them up in the process, we don't
need to implement this ugly macro (which additionally has a very bad
name for something global).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-24 22:11:16 +10:00
David Gibson
6210230725 [POWERPC] Cleanup and fix breakage in tlbflush.h
BenH's commit a741e67969 in powerpc.git,
although (AFAICT) only intended to affect ppc64, also has side-effects
which break 44x.  I think 40x, 8xx and Freescale Book E are also
affected, though I haven't tested them.

The problem lies in unconditionally removing flush_tlb_pending() from
the versions of flush_tlb_mm(), flush_tlb_range() and
flush_tlb_kernel_range() used on ppc64 - which are also used the
embedded platforms mentioned above.

The patch below cleans up the convoluted #ifdef logic in tlbflush.h,
in the process restoring the necessary flushes for the software TLB
platforms.  There are three sets of definitions for the flushing
hooks: the software TLB versions (revised to avoid using names which
appear to related to TLB batching), the 32-bit hash based versions
(external functions) amd the 64-bit hash based versions (which
implement batching).

It also moves the declaration of update_mmu_cache() to always be in
tlbflush.h (previously it was in tlbflush.h except for PPC64, where it
was in pgtable.h).

Booted on Ebony (440GP) and compiled for 64-bit and 32-bit
multiplatform.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-24 22:08:56 +10:00
Benjamin Herrenschmidt
370a908db1 [POWERPC] DEBUG_PAGEALLOC for 64-bit
Here's an implementation of DEBUG_PAGEALLOC for 64 bits powerpc.
It applies on top of the 32 bits patch.

Unlike Anton's previous attempt, I'm not using updatepp. I'm removing
the hash entries from the bolted mapping (using a map in RAM of all the
slots). Expensive but it doesn't really matter, does it ? :-)

Memory hot-added doesn't benefit from this unless it's added at an
address that is below end_of_DRAM() as calculated at boot time.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

 arch/powerpc/Kconfig.debug      |    2
 arch/powerpc/mm/hash_utils_64.c |   84 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 82 insertions(+), 4 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 04:09:39 +10:00
Benjamin Herrenschmidt
88df6e90fa [POWERPC] DEBUG_PAGEALLOC for 32-bit
Here's an implementation of DEBUG_PAGEALLOC for ppc32. It disables BAT
mapping and is only tested with Hash table based processor though it
shouldn't be too hard to adapt it to others.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

 arch/powerpc/Kconfig.debug       |    9 ++++++
 arch/powerpc/mm/init_32.c        |    4 +++
 arch/powerpc/mm/pgtable_32.c     |   52 +++++++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/ppc_mmu_32.c     |    4 ++-
 include/asm-powerpc/cacheflush.h |    6 ++++
 5 files changed, 74 insertions(+), 1 deletion(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 04:09:39 +10:00
Benjamin Herrenschmidt
ee4f2ea486 [POWERPC] Fix 32-bit mm operations when not using BATs
On hash table based 32 bits powerpc's, the hash management code runs with
a big spinlock. It's thus important that it never causes itself a hash
fault. That code is generally safe (it does memory accesses in real mode
among other things) with the exception of the actual access to the code
itself. That is, the kernel text needs to be accessible without taking
a hash miss exceptions.

This is currently guaranteed by having a BAT register mapping part of the
linear mapping permanently, which includes the kernel text. But this is
not true if using the "nobats" kernel command line option (which can be
useful for debugging) and will not be true when using DEBUG_PAGEALLOC
implemented in a subsequent patch.

This patch fixes this by pre-faulting in the hash table pages that hit
the kernel text, and making sure we never evict such a page under hash
pressure.

Signed-off-by: Benjamin Herrenchmidt <benh@kernel.crashing.org>

 arch/powerpc/mm/hash_low_32.S |   22 ++++++++++++++++++++--
 arch/powerpc/mm/mem.c         |    3 ---
 arch/powerpc/mm/mmu_decl.h    |    4 ++++
 arch/powerpc/mm/pgtable_32.c  |   11 +++++++----
 4 files changed, 31 insertions(+), 9 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 04:09:39 +10:00
Benjamin Herrenschmidt
3be4e6990e [POWERPC] Cleanup 32-bit map_page
The 32 bits map_page() function is used internally by the mm code
for early mmu mappings and for ioremap. It should never be called
for an address that already has a valid PTE or hash entry, so we
add a BUG_ON for that and remove the useless flush_HPTE call.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

 arch/powerpc/mm/pgtable_32.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 04:09:39 +10:00
Benjamin Herrenschmidt
a741e67969 [POWERPC] Make tlb flush batch use lazy MMU mode
The current tlb flush code on powerpc 64 bits has a subtle race since we
lost the page table lock due to the possible faulting in of new PTEs
after a previous one has been removed but before the corresponding hash
entry has been evicted, which can leads to all sort of fatal problems.

This patch reworks the batch code completely. It doesn't use the mmu_gather
stuff anymore. Instead, we use the lazy mmu hooks that were added by the
paravirt code. They have the nice property that the enter/leave lazy mmu
mode pair is always fully contained by the PTE lock for a given range
of PTEs. Thus we can guarantee that all batches are flushed on a given
CPU before it drops that lock.

We also generalize batching for any PTE update that require a flush.

Batching is now enabled on a CPU by arch_enter_lazy_mmu_mode() and
disabled by arch_leave_lazy_mmu_mode(). The code epects that this is
always contained within a PTE lock section so no preemption can happen
and no PTE insertion in that range from another CPU. When batching
is enabled on a CPU, every PTE updates that need a hash flush will
use the batch for that flush.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 04:09:38 +10:00
Stephen Rothwell
e2eb63927b [POWERPC] Rename get_property to of_get_property: arch/powerpc
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:19 +10:00
Paul Mackerras
721151d004 [POWERPC] Allow drivers to map individual 4k pages to userspace
Some drivers have resources that they want to be able to map into
userspace that are 4k in size.  On a kernel configured with 64k pages
we currently end up mapping the 4k we want plus another 60k of
physical address space, which could contain anything.  This can
introduce security problems, for example in the case of an infiniband
adaptor where the other 60k could contain registers that some other
program is using for its communications.

This patch adds a new function, remap_4k_pfn, which drivers can use to
map a single 4k page to userspace regardless of whether the kernel is
using a 4k or a 64k page size.  Like remap_pfn_range, it would
typically be called in a driver's mmap function.  It only maps a
single 4k page, which on a 64k page kernel appears replicated 16 times
throughout a 64k page.  On a 4k page kernel it reduces to a call to
remap_pfn_range.

The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN,
gets set on the linux PTE.  This alters the way that __hash_page_4K
computes the real address to put in the HPTE.  The RPN field of the
linux PTE becomes the 4k RPN directly rather than being interpreted as
a 64k RPN.  Since the RPN field is 32 bits, this means that physical
addresses being mapped with remap_4k_pfn have to be below 2^44,
i.e. 0x100000000000.

The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c
that deals with demoting a process to use 4k pages into one function
that gets called in the various different places where we need to do
that.  There were some discrepancies between exactly what was done in
the various places, such as a call to spu_flush_all_slbs in one case
but not in others.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:18 +10:00
Stephen Rothwell
9213feea6e [POWERPC] Rename prom_n_size_cells to of_n_size_cells
This is more consistent and gets us closer to the Sparc code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:18 +10:00
Stephen Rothwell
a8bda5dd4f [POWERPC] Rename prom_n_addr_cells to of_n_addr_cells
This is more consistent and gets us closer to the Sparc code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:18 +10:00
Paul Mackerras
e049d1ca30 Merge branch 'linux-2.6' into for-2.6.22 2007-04-13 03:50:03 +10:00
Benjamin Herrenschmidt
94b2a4393c [POWERPC] Fix spu SLB invalidations
The SPU code doesn't properly invalidate SPUs SLBs when necessary,
for example when changing a segment size from the hugetlbfs code. In
addition, it saves and restores the SLB content on context switches
which makes it harder to properly handle those invalidations.

This patch removes the saving & restoring for now, something more
efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
that can be used by the core mm code to flush the SLBs of all SPEs that
are running a given mm at the time of the flush.

In order to do that, it adds a spinlock to the list of all SPEs and move
some bits & pieces from spufs to spu_base.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2007-03-10 00:07:50 +01:00
David Gibson
eb6de28637 [POWERPC] Allow duplicate lmb_reserve() calls
At present calling lmb_reserve() (and hence lmb_add_region()) twice
for exactly the same memory region will cause strange behaviour.

This makes life difficult when booting from a flat device tree with
memory reserve map.  Which regions are automatically reserved by the
kernel has changed over time, so it's quite possible a newer kernel
could attempt to auto-reserve a region which is also explicitly listed
in the device tree's reserve map, leading to trouble.

This patch avoids the problem by making lmb_reserve() ignore a call to
reserve a previously reserved region.  It also removes a now redundant
test designed to avoid one specific case of the problem noted above.

At present, this patch deals only with duplicate reservations of an
identical region.  Attempting to reserve two different, but
overlapping regions will still cause problems.  I might post another
patch later dealing with this case, but I'm avoiding it now since it
is substantially more complicated to deal with, less likely to occur
and more likely to indicate a genuine bug elsewhere if it does occur.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-08 15:43:28 +11:00
Linus Torvalds
874ff01bd9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  Documentation/kernel-docs.txt update.
  arch/cris: typo in KERN_INFO
  Storage class should be before const qualifier
  kernel/printk.c: comment fix
  update I/O sched Kconfig help texts - CFQ is now default, not AS.
  Remove duplicate listing of Cris arch from README
  kbuild: more doc. cleanups
  doc: make doc. for maxcpus= more visible
  drivers/net/eexpress.c: remove duplicate comment
  add a help text for BLK_DEV_GENERIC
  correct a dead URL in the IP_MULTICAST help text
  fix the BAYCOM_SER_HDX help text
  fix SCSI_SCAN_ASYNC help text
  trivial documentation patch for platform.txt
  Fix typos concerning hierarchy
  Fix comment typo "spin_lock_irqrestore".
  Fix misspellings of "agressive".
  drivers/scsi/a100u2w.c: trivial typo patch
  Correct trivial typo in log2.h.
  Remove useless FIND_FIRST_BIT() macro from cardbus.c.
  ...
2007-02-19 13:29:02 -08:00
Uwe Kleine-König
1b3c3714cb Fix typos concerning hierarchy
heirarchical, hierachical -> hierarchical
        heirarchy, hierachy -> hierarchy

Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:23:03 +01:00
Benjamin Herrenschmidt
a32525449b [POWERPC] Fix bug with early ioremap and 64k pages
The code for bolting hash entries for ioremap done before proper
mm initialization has a grown a bug when using 64K pages on a
machine where non-cacheable mappings are demoted to 4K HW pages.
The wrong page size index is being passed to the hash table mapping
functions causing a crash at boot on some pSeries machines using
bare metal linux.  This fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-16 14:00:20 +11:00
Benjamin Herrenschmidt
7ac9a13717 [POWERPC] Fix vDSO page count calculation
The recent vDSO consolidation patches broke powerpc due to a mistake
in the definition of MAXPAGES constants. This fixes it by moving to
a dynamically allocated array of pages instead as I don't like much
hard coded size limits. Also move the vdso initialisation to an initcall
since it doesn't really need to be done -that- early.

Applogies for not catching the breakage earlier, Roland _did_ CC me on
his patches a while ago, I got busy with other things and forgot to test
them.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-13 15:35:52 +11:00
Kumar Gala
8dabba5d1a [POWERPC] Fix is_power_of_4(x) compile error
When building an 85xx kernel we get:

  CC      arch/powerpc/mm/pgtable_32.o
arch/powerpc/mm/pgtable_32.c: In function 'io_block_mapping':
arch/powerpc/mm/pgtable_32.c:330: error: expected identifier before '(' token
arch/powerpc/mm/pgtable_32.c:330: error: expected statement before ')' token

The is_power_of_2(x) fixup patch left an extra ')' on the is_power_of_4 macro.
There is a similiar issue on the arch/ppc side.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-02-09 09:30:05 -06:00
Johannes Berg
bcff4948c6 [POWERPC] Remove bogus comment about page_is_ram
arch/powerpc/mm/mem.c states that page_is_ram is called by the code that
implements /dev/mem, which isn't true.  Remove the comment.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-08 16:08:43 +11:00
Robert P. J. Day
63c2f782e8 [POWERPC] Add "is_power_of_2" checking to log2.h.
Add the inline function "is_power_of_2()" to log2.h, where the value
zero is *not* considered to be a power of two.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-07 14:03:19 +11:00
Vitaly Bordug
dbbb06b7f6 [POWERPC] 8xx: platform specific mmu updates
This is just a straight port of the same done in arch/ppc
by Marcelo Tosatti. One used to be
[PATCH] ppc32 8xx: update_mmu_cache() needs unconditional tlbie,
commit eb07d964b4

In a nutshell, the board is nearly stuck without this, yet without any
visible failure - being just very slow.

Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-07 12:00:32 +11:00
Ishizaki Kou
d649bd7b76 [POWERPC] TLB insertion cleanup
This patch changes handling return value of ppc_md.hpte_insert() into
the same way as __hash_page_*().

Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-01-24 21:13:59 +11:00
David Gibson
6aa3e1e944 [POWERPC] Fix bogus BUG_ON() in in hugetlb_get_unmapped_area()
The powerpc specific version of hugetlb_get_unmapped_area() makes some
unwarranted assumptions about what checks have been made to its
parameters by its callers.  This will lead to a BUG_ON() if a 32-bit
process attempts to make a hugepage mapping which extends above
TASK_SIZE (4GB).

I'm not sure if these assumptions came about because they were valid
with earlier versions of the get_unmapped_area() path, or if it was
always broken.  Nonetheless this patch fixes the logic, and removes
the crash.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-01-09 17:03:01 +11:00
Robert P. J. Day
5cbded585d [PATCH] getting rid of all casts of k[cmz]alloc() calls
Run this:

	#!/bin/sh
	for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
	  echo "De-casting $f..."
	  perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
	done

And then go through and reinstate those cases where code is casting pointers
to non-pointers.

And then drop a few hunks which conflicted with outstanding work.

Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:58 -08:00
Paul Mackerras
0204568a08 [POWERPC] Support ibm,dynamic-reconfiguration-memory nodes
For PAPR partitions with large amounts of memory, the firmware has an
alternative, more compact representation for the information about the
memory in the partition and its NUMA associativity information.  This
adds the code to the kernel to parse this alternative representation.

The other part of this patch is telling the firmware that we can
handle the alternative representation.  There is however a subtlety
here, because the firmware will invoke a reboot if the memory
representation we request is different from the representation that
firmware is currently using.  This is because firmware can't change
the representation on the fly.  Further, some firmware versions used
on POWER5+ machines have a bug where this reboot leaves the machine
with an altered value of load-base, which will prevent any kernel
booting until it is reset to the normal value (0x4000).  Because of
this bug, we do NOT set fake_elf.rpanote.new_mem_def = 1, and thus we
do not request the new representation on POWER5+ and earlier machines.
We do request the new representation on POWER6, which uses the
ibm,client-architecture-support call.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-11 13:49:49 +11:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Chen, Kenneth W
39dde65c99 [PATCH] shared page table for hugetlb page
Following up with the work on shared page table done by Dave McCracken.  This
set of patch target shared page table for hugetlb memory only.

The shared page table is particular useful in the situation of large number of
independent processes sharing large shared memory segments.  In the normal
page case, the amount of memory saved from process' page table is quite
significant.  For hugetlb, the saving on page table memory is not the primary
objective (as hugetlb itself already cuts down page table overhead
significantly), instead, the purpose of using shared page table on hugetlb is
to allow faster TLB refill and smaller cache pollution upon TLB miss.

With PT sharing, pte entries are shared among hundreds of processes, the cache
consumption used by all the page table is smaller and in return, application
gets much higher cache hit ratio.  One other effect is that cache hit ratio
with hardware page walker hitting on pte in cache will be higher and this
helps to reduce tlb miss latency.  These two effects contribute to higher
application performance.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Dave McCracken <dmccr@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
Stephen Rothwell
0470466dba [POWERPC] Fix cputable.h for combined build
Remove CPU_FTR_16M_PAGE from the cupfeatures mask at runtime on iSeries.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:41:59 +11:00
Arnd Bergmann
e22ba7e381 [POWERPC] ps3: multiplatform build fixes
A few code paths need to check whether or not they are running
on the PS3's LV1 hypervisor before making hcalls. This introduces
a new firmware feature bit for this, FW_FEATURE_PS3_LV1.

Now when both PS3 and IBM_CELL_BLADE are enabled, but not PSERIES,
FW_FEATURE_PS3_LV1 and FW_FEATURE_LPAR get enabled at compile time,
which is a bug. The same problem can also happen for (PPC_ISERIES &&
!PPC_PSERIES && PPC_SOMETHING_ELSE). In order to solve this, I
introduce a new CONFIG_PPC_NATIVE option that is set when at least
one platform is selected that can run without a hypervisor and then
turns the firmware feature check into a run-time option.

The new cell oprofile support that was recently merged does not
work on hypervisor based platforms like the PS3, therefore make
it depend on PPC_CELL_NATIVE instead of PPC_CELL. This may change
if we get oprofile support for PS3.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
2006-12-04 20:41:16 +11:00
Geert Uytterhoeven
adaa3a7962 [POWERPC] setup_kcore(): Fix incorrect function name in panic() call.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:39:39 +11:00
Stephen Rothwell
56291e19e3 [POWERPC] iSeries: fix slb.c for combined build
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:39:19 +11:00
Benjamin Herrenschmidt
68a64357d1 [POWERPC] Merge 32 and 64 bits asm-powerpc/io.h
powerpc: Merge 32 and 64 bits asm-powerpc/io.h

The rework on io.h done for the new hookable accessors made it easier,
so I just finished the work and merged 32 and 64 bits io.h for arch/powerpc.

arch/ppc still uses the old version in asm-ppc, there is just too much gunk
in there that I really can't be bothered trying to cleanup.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:39:05 +11:00
Benjamin Herrenschmidt
3d1ea8e8cb [POWERPC] Remove ioremap64 and fixup_bigphys_addr
In order to suppose platforms with devices above 4Gb on 32 bits platforms
with a >32 bits physical address space, we used to have a special ioremap64
along with a fixup routine fixup_bigphys_addr.

This shouldn't be necessary anymore as struct resource now supports 64 bits
addresses even on 32 bits archs. This patch enables that option when
CONFIG_PHYS_64BIT is set and removes ioremap64 and fixup_bigphys_addr.

This is a preliminary work for the upcoming merge of 32 and 64 bits io.h

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:39:04 +11:00
Benjamin Herrenschmidt
4cb3cee03d [POWERPC] Allow hooking of PCI MMIO & PIO accessors on 64 bits
This patch reworks the way iSeries hooks on PCI IO operations (both MMIO
and PIO) and provides a generic way for other platforms to do so (we
have need to do that for various other platforms).

While reworking the IO ops, I ended up doing some spring cleaning in
io.h and eeh.h which I might want to split into 2 or 3 patches (among
others, eeh.h had a lot of useless stuff in it).

A side effect is that EEH for PIO should work now (it used to pass IO
ports down to the eeh address check functions which is bogus).

Also, new are MMIO "repeat" ops, which other archs like ARM already had,
and that we have too now: readsb, readsw, readsl, writesb, writesw,
writesl.

In the long run, I might also make EEH use the hooks instead
of wrapping at the toplevel, which would make things even cleaner and
relegate EEH completely in platforms/iseries, but we have to measure the
performance impact there (though it's really only on MMIO reads)

Since I also need to hook on ioremap, I shuffled the functions a bit
there. I introduced ioremap_flags() to use by drivers who want to pass
explicit flags to ioremap (and it can be hooked). The old __ioremap() is
still there as a low level and cannot be hooked, thus drivers who use it
should migrate unless they know they want the low level version.

The patch "arch provides generic iomap missing accessors" (should be
number 4 in this series) is a pre-requisite to provide full iomap
API support with this patch.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:38:52 +11:00
Paul Mackerras
79acbb3ff2 Merge branch 'linux-2.6' into for-linus 2006-12-04 15:59:07 +11:00
Hugh Dickins
68589bc353 [PATCH] hugetlb: prepare_hugepage_range check offset too
(David:)

If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.

But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it.  That will eventually call down to the
non-hugepage version of unmap_page_range().  On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries.  I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on.

(Hugh:)

prepare_hugepage_range() should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down.  PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.

Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
is the check for too small a mapping); but even so, move up setting of
VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-14 09:09:27 -08:00
Michael Ellerman
a416dd8d9c [PATCH] Do a single one-line printk in bad_page_fault()
bad_page_fault() prints a message telling the user what type of bad
fault we took. The first line of this message is currently implemented
as two separate printks. This has the unfortunate effect that if
several cpus simultaneously take a bad fault, the first and second parts
of the printk get jumbled up, which looks dodge and is hard to read.

So do a single one-line printk for each fault type.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-11-13 14:48:56 +11:00
Hugh Dickins
96268889ee [POWERPC] Make high hugepage areas preempt safe
Checking source for other get_paca()->field preemption dangers found that
open_high_hpage_areas does a structure copy into its paca while preemption
is enabled: unsafe however gcc accomplishes it.  Just remove that copy:
it's done safely afterwards by on_each_cpu, as in open_low_hpage_areas.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-11-01 14:52:48 +11:00
Geoff Levand
035223fb28 [POWERPC] Make pSeries_lpar_hpte_insert static
Change the powerpc hpte_insert routines now called through ppc_md to
static scope.

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-10-16 16:33:04 +10:00
Mel Gorman
6391af174a [PATCH] mm: use symbolic names instead of indices for zone initialisation
Arch-independent zone-sizing is using indices instead of symbolic names to
offset within an array related to zones (max_zone_pfns).  The unintended
impact is that ZONE_DMA and ZONE_NORMAL is initialised on powerpc instead
of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set.  As a result, the
the machine fails to boot but will boot with CONFIG_HIGHMEM turned off.

The following patch properly initialises the max_zone_pfns[] array and uses
symbolic names instead of indices in each architecture using
arch-independent zone-sizing.  Two users have successfully booted their
powerpcs with it (one an ibook G4).  It has also been boot tested on x86,
x86_64, ppc64 and ia64.  Please merge for 2.6.19-rc2.

Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
first fix.  Additional credit to Johannes Berg and Andreas Schwab for
reporting the problem and testing on powerpc.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:14 -07:00
Paul Mackerras
c730f5b621 Merge branch 'master' of git://oak/home/sfr/kernels/iseries/work 2006-10-04 15:02:27 +10:00
Stephen Rothwell
3f639ee8c5 [POWERPC] implement BEGIN/END_FW_FTR_SECTION
and use it an all the obvious places in assembler code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2006-10-03 16:50:21 +10:00
Sukadev Bhattiprolu
f400e198b2 [PATCH] pidspace: is_init()
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().

Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.

Eric's original description:

	There are a lot of places in the kernel where we test for init
	because we give it special properties.  Most  significantly init
	must not die.  This results in code all over the kernel test
	->pid == 1.

	Introduce is_init to capture this case.

	With multiple pid spaces for all of the cases affected we are
	looking for only the first process on the system, not some other
	process that has pid == 1.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Jason Baron
df67b3daea [PATCH] make PROT_WRITE imply PROT_READ
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.

While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach.  For
example, in arch/alpha/mm/fault.c:

"
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
        }
"

Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest.  I've verified the patch on
ia64, x86_64 and x86.

Additional discussion:

Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write.  Thus, write only is not supported in h/w.

Currently, if i 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV.  That check is enforced in
arch/blah/mm/fault.c.  However, if i first write that page it will fault in
and the pte will be set to read/write.  Thus, any subsequent reads to the page
will succeed.  It is this inconsistency in behavior that this patch is
attempting to address.  Furthermore, if the page is swapped out, and then
brought back the first read will also cause a SEGV.  Thus, any arbitrary read
on a page can potentially result in a SEGV.

According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also as mentioned, some
archtectures, such as alpha, shown above already take the approach that i am
suggesting.

The counter-argument to this raised by Arjan, is that the kernel is enforcing
the write only mapping the best it can given the h/w limitations.  This is
true, however Alan Cox, and myself would argue that the inconsitency in
behavior, that is applications can sometimes work/sometimes fails is highly
undesireable.  If you read through the thread, i think people, came to an
agreement on the last patch i posted, as nobody has objected to it...

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:05 -07:00
Mel Gorman
c67c3cb4c9 [PATCH] Have Power use add_active_range() and free_area_init_nodes()
Size zones and holes in an architecture independent manner for Power.

[judith@osdl.org: build fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Stephen Rothwell
5e203d6862 [POWERPC] fix ioremap for a combined kernel
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2006-09-25 13:36:31 +10:00
Paul Mackerras
ea0763a7e6 Merge branch 'merge' 2006-08-25 14:56:07 +10:00
Matt Porter
054389f114 [POWERPC] Fix powerpc 44x_mmu build
The PIN_SIZE definition name changed, update 44x_mmu.c accordingly.

Signed-off-by: Matt Porter <mporter@embeddedalley.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-25 13:41:41 +10:00
Adam Litke
c9169f8747 [POWERPC] hugepage BUG fix
On Tue, 2006-08-15 at 08:22 -0700, Dave Hansen wrote:
> kernel BUG in cache_free_debugcheck at mm/slab.c:2748!

Alright, this one is only triggered when slab debugging is enabled.  The
slabs are assumed to be aligned on a HUGEPTE_TABLE_SIZE boundary.  The free
path makes use of this assumption and uses the lowest nibble to pass around
an index into an array of kmem_cache pointers.  With slab debugging turned
on, the slab is still aligned, but the "working" object pointer is not.
This would break the assumption above that a full nibble is available for
the PGF_CACHENUM_MASK.

The following patch reduces PGF_CACHENUM_MASK to cover only the two least
significant bits, which is enough to cover the current number of 4 pgtable
cache types.  Then use this constant to mask out the appropriate part of
the huge pte pointer.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-24 10:07:23 +10:00
Michael Neuling
2f6093c847 [POWERPC] Implement SLB shadow buffer
This adds a shadow buffer for the SLBs and regsiters it with PHYP.
Only the bolted SLB entries (top 3) are shadowed.

The SLB shadow buffer tells the hypervisor what the kernel needs to
have in the SLB for the kernel to be able to function.  The hypervisor
can use this information to speed up partition context switches.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-08 17:08:56 +10:00
Matt Porter
452b5e2121 [POWERPC] Fix powerpc 44x_mmu build
The PIN_SIZE definition name changed, update 44x_mmu.c accordingly.

Signed-off-by: Matt Porter <mporter@embeddedalley.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-08 17:07:08 +10:00
Jeremy Kerr
a7f67bdf2c [POWERPC] Constify & voidify get_property()
Now that get_property() returns a void *, there's no need to cast its
return value. Also, treat the return value as const, so we can
constify get_property later.

powerpc core changes.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-31 15:55:04 +10:00
Michael Ellerman
30f30e1305 [POWERPC] Fix mem= handling when the memory limit is > RMO size
There's a bug in my cleaned up mem= handling, if the memory limit is
larger than the RMO size we'll erroneously enlarge the RMO size.

Fix is to only change the RMO size if the memory limit is less than
the current RMO value.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-26 01:28:24 +10:00
Michael Ellerman
2d69ff32eb [POWERPC] Fix a compiler warning in mm/tlb_64.c
The compiler doesn't understand that BUG() never returns, so complains that
psize isn't set. Just set it to the normal value, which seems to produce nice
code and keeps gcc happy.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2006-07-13 18:43:25 +10:00
Michael Ellerman
e7c1f69d4f [POWERPC] Fix mem= handling when the memory limit is > RMO size
There's a bug in my cleaned up mem= handling, if the memory limit is
larger than the RMO size we'll erroneously enlarge the RMO size.

Fix is to only change the RMO size if the memory limit is less than
the current RMO value.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-07 20:19:16 +10:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Linus Torvalds
3aa590c6b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (43 commits)
  [POWERPC] Use little-endian bit from firmware ibm,pa-features property
  [POWERPC] Make sure smp_processor_id works very early in boot
  [POWERPC] U4 DART improvements
  [POWERPC] todc: add support for Time-Of-Day-Clock
  [POWERPC] Make lparcfg.c work when both iseries and pseries are selected
  [POWERPC] Fix idr locking in init_new_context
  [POWERPC] mpc7448hpc2 (taiga) board config file
  [POWERPC] Add tsi108 pci and platform device data register function
  [POWERPC] Add general support for mpc7448hpc2 (Taiga) platform
  [POWERPC] Correct the MAX_CONTEXT definition
  powerpc: minor cleanups for mpc86xx
  [POWERPC] Make sure we select CONFIG_NEW_LEDS if ADB_PMU_LED is set
  [POWERPC] Simplify the code defining the 64-bit CPU features
  [POWERPC] powerpc: kconfig warning fix
  [POWERPC] Consolidate some of kernel/misc*.S
  [POWERPC] Remove unused function call_with_mmu_off
  [POWERPC] update asm-powerpc/time.h
  [POWERPC] Clean up it_lp_queue.h
  [POWERPC] Skip the "copy down" of the kernel if it is already at zero.
  [POWERPC] Add the use of the firmware soft-reset-nmi to kdump.
  ...
2006-06-29 11:32:34 -07:00
Sonny Rao
f86c9747fe [POWERPC] Fix idr locking in init_new_context
We always need to serialize accesses to mmu_context_idr.

I hit this bug when testing with a small number of mmu contexts.

Signed-off-by: Sonny Rao <sonny@burdell.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-29 16:22:46 +10:00
Michael Ellerman
c30a4df3f1 [POWERPC] Use ppc_md.hpte_insert() in htab_bolt_mapping()
With the ppc_md htab pointers setup earlier, we can use ppc_md.hpte_insert
in htab_bolt_mapping(), rather than deciding which version to call by hand.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:47 +10:00
Michael Ellerman
7d0daae4ae [POWERPC] powerpc: Initialise ppc_md htab pointers earlier
Initialise the ppc_md htab callbacks earlier, in the probe routines. This
allows us to call htab_finish_init() from htab_initialize(), and makes it
private to hash_utils_64.c. Move htab_finish_init() and make_bl() above
htab_initialize() to avoid forward declarations.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:47 +10:00
Chandra Seetharaman
74b85f3790 [PATCH] cpu hotplug: make cpu_notifier related notifier blocks __cpuinit only
Make notifier_blocks associated with cpu_notifier as __cpuinitdata.

__cpuinitdata makes sure that the data is init time only unless
CONFIG_HOTPLUG_CPU is defined.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Randy Dunlap
c9cf55285e [PATCH] add poison.h and patch primary users
Localize poison values into one header file for better documentation and
easier/quicker debugging and so that the same values won't be used for
multiple purposes.

Use these constants in core arch., mm, driver, and fs code.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Yasunori Goto
bc02af93dd [PATCH] pgdat allocation for new node add (specify node id)
Change the name of old add_memory() to arch_add_memory.  And use node id to
get pgdat for the node at NODE_DATA().

Note: Powerpc's old add_memory() is defined as __devinit. However,
      add_memory() is usually called only after bootup.
      I suppose it may be redundant. But, I'm not well known about powerpc.
      So, I keep it. (But, __meminit is better at least.)

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:35 -07:00
Anil S Keshavamurthy
4f9e87c045 [PATCH] Notify page fault call chain for powerpc
Overloading of page fault notification with the notify_die() has performance
issues(since the only interested components for page fault is kprobes and/or
kdb) and hence this patch introduces the new notifier call chain exclusively
for page fault notifications their by avoiding notifying unnecessary
components in the do_page_fault() code path.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:22 -07:00
Linus Torvalds
45c091bb2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (139 commits)
  [POWERPC] re-enable OProfile for iSeries, using timer interrupt
  [POWERPC] support ibm,extended-*-frequency properties
  [POWERPC] Extra sanity check in EEH code
  [POWERPC] Dont look for class-code in pci children
  [POWERPC] Fix mdelay badness on shared processor partitions
  [POWERPC] disable floating point exceptions for init
  [POWERPC] Unify ppc syscall tables
  [POWERPC] mpic: add support for serial mode interrupts
  [POWERPC] pseries: Print PCI slot location code on failure
  [POWERPC] spufs: one more fix for 64k pages
  [POWERPC] spufs: fail spu_create with invalid flags
  [POWERPC] spufs: clear class2 interrupt status before wakeup
  [POWERPC] spufs: fix Makefile for "make clean"
  [POWERPC] spufs: remove stop_code from struct spu
  [POWERPC] spufs: fix spu irq affinity setting
  [POWERPC] spufs: further abstract priv1 register access
  [POWERPC] spufs: split the Cell BE support into generic and platform dependant parts
  [POWERPC] spufs: dont try to access SPE channel 1 count
  [POWERPC] spufs: use kzalloc in create_spu
  [POWERPC] spufs: fix initial state of wbox file
  ...

Manually resolved conflicts in:
	drivers/net/phy/Makefile
	include/asm-powerpc/spu.h
2006-06-22 22:11:30 -07:00
Jon Loeliger
ee0339f205 [POWERPC] Add starting of secondary 86xx CPUs.
Clear the high BATS during load_up_mmu if FTR_HAS_HIGH_BATS.
Allow just a bit more time for secondary CPUs to phone home.

Signed-off-by: Wei Zhang <Wei.Zhang@freescale.com>
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:28 +10:00
Arnd Bergmann
19242b2407 [PATCH] powerpc: Fix 64k pages on non-partitioned machines
The page size encoding passed to tlbie is incorrect for new-style
large pages.  This fixes it.  This doesn't affect anything on older
machines because mmu_psize_defs[psize].penc (the page size encoding)
is 0 for 4k and 16M pages (the two are distinguished by a separate "is
a large page" bit).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:56:24 -07:00
Anton Blanchard
227318bbde [POWERPC] Remove stale 64bit on 32bit kernel code
Remove some stale POWER3/POWER4/970 on 32bit kernel support.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:26 +10:00
Paul Mackerras
bf72aeba2f powerpc: Use 64k pages without needing cache-inhibited large pages
Some POWER5+ machines can do 64k hardware pages for normal memory but
not for cache-inhibited pages.  This patch lets us use 64k hardware
pages for most user processes on such machines (assuming the kernel
has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
start out using 64k pages and get switched to 4k pages if they use any
non-cacheable mappings.

With this, we use 64k pages for the vmalloc region and 4k pages for
the imalloc region.  If anything creates a non-cacheable mapping in
the vmalloc region, the vmalloc region will get switched to 4k pages.
I don't know of any driver other than the DRM that would do this,
though, and these machines don't have AGP.

When a region gets switched from 64k pages to 4k pages, we do not have
to clear out all the 64k HPTEs from the hash table immediately.  We
use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
was hashed in as a 64k page or a set of 4k pages.  If hash_page is
trying to insert a 4k page for a Linux PTE and it sees that it has
already been inserted as a 64k page, it first invalidates the 64k HPTE
before inserting the 4k HPTE.  The hash invalidation routines also use
the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
set of 4k HPTEs to remove.  With those two changes, we can tolerate a
mix of 4k and 64k HPTEs in the hash table, and they will all get
removed when the address space is torn down.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 10:45:18 +10:00
Paul Mackerras
4306443128 powerpc: Remove unused paca->pgdir field
The pgdir field in the paca was a leftover from the dynamic VSIDs
patch, and is not used in the current kernel code.  This removes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-12 18:38:21 +10:00
Paul Mackerras
6218a761bb powerpc: add context.vdso_base for 32-bit too
This adds a vdso_base element to the mm_context_t for 32-bit compiles
(both for ARCH=powerpc and ARCH=ppc).  This fixes the compile errors
that have been reported in arch/powerpc/kernel/signal_32.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-11 14:15:17 +10:00
Benjamin Herrenschmidt
c5cf0e30bf [PATCH] powerpc: Fix buglet with MMU hash management
Our MMU hash management code would not set the "C" bit (changed bit) in
the hardware PTE when updating a RO PTE into a RW PTE. That would cause
the hardware to possibly to a write back to the hash table to set it on
the first store access, which in addition to being a performance issue,
might also hit a bug when running with native hash management (non-HV)
as our code is specifically optimized for the case where no write back
happens.

Thus there is a very small therocial window were a hash PTE can become
corrupted if that HPTE has just been upgraded to read write, a store
access happens on it, and that races with another processor evicting
that same slot. Since eviction (caused by an almost full hash) is
extremely rare, the bug is very unlikely to happen fortunately.

This fixes by allowing the updating of the protection bits in the native
hash handling to also set (but not clear) the "C" bit, and, in order to
also improve performances in the general case, by always setting that
bit on newly inserted hash PTE so that writeback really never happens.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:20:59 +10:00
Michael Ellerman
2babf5c2ec [PATCH] powerpc: Unify mem= handling
We currently do mem= handling in three seperate places. And as benh pointed out
I wrote two of them. Now that we parse command line parameters earlier we can
clean this mess up.

Moving the parsing out of prom_init means the device tree might be allocated
above the memory limit. If that happens we'd have to move it. As it happens
we already have logic to do that for kdump, so just genericise it.

This also means we might have reserved regions above the memory limit, if we
do the bootmem allocator will blow up, so we have to modify
lmb_enforce_memory_limit() to truncate the reserves as well.

Tested on P5 LPAR, iSeries, F50, 44p. Tested moving device tree on P5 and
44p and F50.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 15:02:15 +10:00
Paul Mackerras
f18fc729cd Merge ../linux-2.6 2006-05-05 15:45:48 +10:00
Jeremy Kerr
953039c8df [PATCH] powerpc: Allow devices to register with numa topology
Change of_node_to_nid() to traverse the device tree, looking for a numa id.
Cell uses this to assign ids to SPUs, which are children of the CPU node.
Existing users of of_node_to_nid() are altered to use of_node_to_nid_single(),
which doesn't do the traversal.

Export an attach_sysdev_to_node() function, allowing system devices (eg.
SPUs) to link themselves into the numa topology in sysfs.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:46 -07:00
Paul Mackerras
29f147d746 Merge branch 'merge' 2006-04-29 16:15:57 +10:00
David Gibson
f10a04c034 [PATCH] powerpc: Fix pagetable bloat for hugepages
At present, ARCH=powerpc kernels can waste considerable space in
pagetables when making large hugepage mappings.  Hugepage PTEs go in
PMD pages, but each PMD page maps 256M and so contains only 16
hugepage PTEs (128 bytes of data), but takes up a 1024 byte
allocation.  With CONFIG_PPC_64K_PAGES enabled (64k base page size),
the situation is worse.  Now hugepage PTEs are at the PTE page level
(also mapping 256M), so we store 16 hugepage PTEs in a 64k allocation.

The PowerPC MMU already means that any 256M region is either all
hugepage, or all normal pages.  Thus, with some care, we can use a
different allocation for the hugepage PTE tables and only allocate the
128 bytes necessary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 15:02:51 +10:00
Olof Johansson
e110b281dc [PATCH] powerpc: Less verbose mem configuration output
Quieten some of the debug ram config output. we already print out available
memory at KERN_INFO level.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-22 18:45:12 +10:00
Olof Johansson
f430c02b13 [PATCH] powerpc: Quiet page order output
No need to always print page orders.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-22 18:45:09 +10:00
Anton Blanchard
fc5266ea52 [PATCH] powerpc: trivial spelling fixes in fault.c
This comment exceeded my bad spelling threshold :)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01 22:37:13 +11:00
KAMEZAWA Hiroyuki
0e5519548f [PATCH] for_each_possible_cpu: powerpc
for_each_cpu() actually iterates across all possible CPUs.  We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs.  This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29 13:44:15 +11:00
Eugene Surovegin
bab70a4af7 [PATCH] lock PTE before updating it in 440/BookE page fault handler
Fix 44x and BookE page fault handler to correctly lock PTE before
trying to pte_update() it, otherwise this PTE might be swapped out
after pte_present() check but before pte_uptdate() call, resulting in
corrupted PTE. This can happen with enabled preemption and low memory
condition.

Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29 13:44:15 +11:00
Paul Mackerras
bac30d1a78 Merge ../linux-2.6 2006-03-29 13:24:50 +11:00
Benjamin Herrenschmidt
e8222502ee [PATCH] powerpc: Kill _machine and hard-coded platform numbers
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism.  With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants.  This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 23:15:54 +11:00
KAMEZAWA Hiroyuki
ec936fc563 [PATCH] for_each_online_pgdat: renaming for_each_pgdat
Replace for_each_pgdat() with for_each_online_pgdat().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
Anton Blanchard
c258dd40ab [PATCH] powerpc: Consistent printing of node id
We were printing node ids in hex in one spot. Lets be consistent and
always print them in decimal.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:50 +11:00
Andrew Morton
069007ae07 [PATCH] powerpc: hot_add_scn_to_nid() build fix
The return statement is to prevent `warning: 'nid' might be used uninitialized
in this function'.

Cc: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:34 +11:00
Ingo Molnar
14cc3e2b63 [PATCH] sem2mutex: misc static one-file mutexes
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Linus Torvalds
2e6e33bab6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (78 commits)
  [PATCH] powerpc: Add FSL SEC node to documentation
  [PATCH] macintosh: tidy-up driver_register() return values
  [PATCH] powerpc: tidy-up of_register_driver()/driver_register() return values
  [PATCH] powerpc: via-pmu warning fix
  [PATCH] macintosh: cleanup the use of i2c headers
  [PATCH] powerpc: dont allow old RTC to be selected
  [PATCH] powerpc: make powerbook_sleep_grackle static
  [PATCH] powerpc: Fix warning in add_memory
  [PATCH] powerpc: update mailing list addresses
  [PATCH] powerpc: Remove calculation of io hole
  [PATCH] powerpc: iseries: Add bootargs to /chosen
  [PATCH] powerpc: iseries: Add /system-id, /model and /compatible
  [PATCH] powerpc: Add strne2a() to convert a string from EBCDIC to ASCII
  [PATCH] powerpc: iseries: Make more stuff static in platforms/iseries/mf.c
  [PATCH] powerpc: iseries: Remove pointless iSeries_(restart|power_off|halt)
  [PATCH] powerpc: iseries: mf related cleanups
  [PATCH] powerpc: Replace platform_is_lpar() with a firmware feature
  [PATCH] powerpc: trivial: Cleanup whitespace in cputable.h
  [PATCH] powerpc: Remove unused iommu_off logic from pSeries_init_early()
  [PATCH] powerpc: Unconfuse htab_bolt_mapping() callers
  ...
2006-03-22 22:20:46 -08:00
Andrew Morton
2d0eee14b2 [PATCH] powerpc: Fix warning in add_memory
arch/powerpc/mm/mem.c: In function `add_memory':
arch/powerpc/mm/mem.c:128: warning: assignment makes integer from pointer without a cast

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-23 14:39:51 +11:00
David Gibson
42b88befd6 [PATCH] hugepage: is_aligned_hugepage_range() cleanup
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.

Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range().  On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.

In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).

This patch cleans up by abolishing is_aligned_hugepage_range().  Instead
prepare_hugepage_range() is defined directly.  Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage.  ia64 and powerpc define custom versions.  The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned.  The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.

No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
Nick Piggin
7835e98b2e [PATCH] remove set_page_count() outside mm/
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().

This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:02 -08:00
Nick Piggin
70dc991d66 [PATCH] remove set_page_count(page, 0) users (outside mm)
A couple of places set_page_count(page, 1) that don't need to.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Michael Ellerman
f8642ebee8 [PATCH] powerpc: Remove calculation of io hole
In mm_init_ppc64() we calculate the location of the "IO hole", but then
no one ever looks at the value. So don't bother.

That's actually all mm_init_ppc64() does, so get rid of it too.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:04:30 +11:00
Michael Ellerman
57cfb814f6 [PATCH] powerpc: Replace platform_is_lpar() with a firmware feature
It has been decreed that platform numbers are evil, so as a step in that
direction, replace platform_is_lpar() with a FW_FEATURE_LPAR bit.

Currently FW_FEATURE_LPAR really means i/pSeries LPAR, in the future we might
have to clean that up if we need to be more specific about what LPAR actually
means. But that's another patch ...

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:04:17 +11:00
Michael Ellerman
caf80e579b [PATCH] powerpc: Unconfuse htab_bolt_mapping() callers
htab_bolt_mapping() takes a vstart and pstart parameter, but all but one of
its callers actually pass it vstart and vstart. Luckily before it passes
paddr (calculated from paddr) to the hpte_insert routines it calls
virt_to_abs() (aka. __pa()) on the address, so there isn't actually a bug.

map_io_page() however does pass pstart properly, so currently it's broken
AFAICT because we're calling __pa(paddr) which will get us something very
large. Presumably no one's calling map_io_page() in the right context.

Anyway, change htab_bolt_mapping() callers to properly pass pstart, and then
use it properly in htab_bolt_mapping(), ie. don't call __pa() on it again.

Booted on p5 LPAR, iSeries and Power3.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:04:09 +11:00
Nathan Lynch
2b2612272c [PATCH] powerpc numa: Consolidate assignment of cpus to nodes
We can plug the boot cpu into its node independently of whether numa
topology is detected.  And numa_setup_cpu does the right thing for all
cases now, so remove special-casing for non-numa from the cpu hotplug
callback.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:04:03 +11:00
Nathan Lynch
482ec7c403 [PATCH] powerpc numa: Support sparse online node map
The powerpc numa code unconditionally onlines all nodes from 0 to the
highest node id found, regardless of whether cpus or memory are
present in the nodes.  This wastes 8K per node and complicates some
cpu and memory hotplug situations, such as adding a resource that
doesn't map to one of the nodes discovered at boot.

Set nodes online as resources are scanned.  Fall back to node 0 only
when we're sure this isn't a NUMA machine.

Instead of defaulting to node 0 for cases of hot-adding a resource
which doesn't belong to any initialized node, assign it to the first
online node.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:04:01 +11:00
Nathan Lynch
bc16a75926 [PATCH] powerpc numa: Consolidate handling of Power4 special case
Code to handle Power4's invalid node id (0xffff) is duplicated for cpu
and memory.  Better to handle this case in one place --
of_node_to_nid.  Overall behavior should be unchanged.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:03:57 +11:00
Nathan Lynch
cf950b7af0 [PATCH] powerpc numa: Get rid of "numa domain" terminology
Since we effectively treat the domain ids given to us by firmare as
logical node ids, make this explicit (basically s/numa_domain/nid/).

No functional changes, only variable and function names are modified.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:03:52 +11:00
Nathan Lynch
2e5ce39d67 [PATCH] powerpc numa: Minor cpu hotplug-related cleanups
map_cpu_to_node does not need to be inline, it is never called in a
hot path.

map_cpu_to_node, numa_setup_cpu, and find_cpu_node can be marked
__cpuinit, as they are never used after boot if CONFIG_HOTPLUG_CPU=n.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:03:48 +11:00
Nathan Lynch
bf4b85b0e4 [PATCH] powerpc numa: Minor debugging code changes
Add debug statement for map_cpu_to_node; it's useful for cpu hotplug.

Clarify debug statement about not finding the numa reference points
property.

Don't print a meaningless associativity depth (-1) on non-numa systems.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:03:45 +11:00
Nathan Lynch
c08888cf3c [PATCH] powerpc numa: fix boot_cpuid always assigned to node 0
At boot, the numa code is assigning boot_cpuid to node 0
unconditionally.  Basically, numa_setup_cpu is being stupid about it,
but this is the minimal fix -- just call numa_setup_cpu(boot_cpuid)
later, after all nodes have been set online.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-22 15:03:40 +11:00
Michael Ellerman
2c276603c3 [PATCH] powerpc: Fix bug in bug fix for bug in lmb_alloc()
My patch (d7a5b2ffa1) to always panic if
lmb_alloc() fails is broken because it checks alloc < 0, but should be
checking alloc == 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-17 13:28:24 +11:00
Paul Mackerras
23dd640112 Merge ../linux-2.6 2006-03-17 12:01:19 +11:00
Olaf Hering
920573bd03 [PATCH] powerpc: remove duplicate EXPORT_SYMBOLS
remove warnings when building a 64bit kernel.
smp_call_function triggers also with 32bit kernel.

WARNING: vmlinux: duplicate symbol 'smp_call_function' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:164:EXPORT_SYMBOL(smp_call_function);
arch/powerpc/kernel/smp.c:300:EXPORT_SYMBOL(smp_call_function);

WARNING: vmlinux: duplicate symbol 'ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:113:EXPORT_SYMBOL(ioremap);
arch/powerpc/mm/pgtable_64.c:321:EXPORT_SYMBOL(ioremap);

WARNING: vmlinux: duplicate symbol '__ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:117:EXPORT_SYMBOL(__ioremap);
arch/powerpc/mm/pgtable_64.c:322:EXPORT_SYMBOL(__ioremap);

WARNING: vmlinux: duplicate symbol 'iounmap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:118:EXPORT_SYMBOL(iounmap);
arch/powerpc/mm/pgtable_64.c:323:EXPORT_SYMBOL(iounmap);

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-16 16:55:05 +11:00
Paul Mackerras
6749c55073 Merge ../powerpc-merge 2006-02-28 16:35:24 +11:00
Michael Ellerman
56ec6462af [PATCH] powerpc/iseries: Fix double phys_to_abs bug in htab_bolt_mapping
Before the merge I updated create_pte_mapping() to work for iSeries, by
calling iSeries_hpte_bolt_or_insert. (4c55130b2a)

Later we changed iSeries_hpte_insert to cope with the bolting case, and called
that instead from create_pte_mapping() (which was renamed to htab_bolt_mapping)
(3c726f8dee).

Unfortunately that change introduced a subtle bug, where we pass an absolute
address to iSeries_hpte_insert() where it expects a physical address. This
leads to us calling phys_to_abs() twice on the physical address, which is
seriously bogus.

This only causes a problem if the absolute address from the first translation
can be looked up again in the chunk_map, which depends on the size and layout
of memory. I've seen it fail on one box, but not others.

The minimal fix is to pass the physical address to iSeries_hpte_insert(). For
2.6.17 we should make phys_to_abs() BUG if we try to double-translate an
address.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-28 16:25:55 +11:00
Paul Mackerras
a00428f5b1 Merge ../powerpc-merge 2006-02-24 14:05:47 +11:00
R Sharada
47f78a4920 [PATCH] powerpc64: fix spinlock recursion in native_hpte_clear
native_hpte_clear has a spinlock recursion problem with the native_tlbie_lock
being called twice, once in native_hpte_clear() and once within tlbie().
Fix the problem by changing the call to tlbie() in native_hpte_clear() to
__tlbie(). It still supports only 4k pages for now.

Signed-off-by: R Sharada <sharada@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-24 11:36:29 +11:00
Michael Ellerman
337a7128db [PATCH] powerpc: Only calculate htab_size in one place for kexec
For kexec we need to know the size of the MMU hash table.

Currently we calculate the size once in the htab code, and then twice more in
the kexec code, once using htab_hash_mask and once using ppc64_pft_size.
On some machines the ppc64_pft_size calculation is broken because
ppc64_pft_size is not set.

So we need to fix the second calculation, but better still we should just
calculate the size once and use it everywhere else.

Tested on Power5 LPAR, Power4 non-LPAR and Power3.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-24 11:36:18 +11:00
Jon Mason
2ef9481e66 [PATCH] powerpc: trivial: modify comments to refer to new location of files
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree.  I think this accomplises
everything wanted, though there might be a few references I missed.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-10 16:53:51 +11:00
Michael Ellerman
3b9331dac1 [PATCH] powerpc: Move LMB_ALLOC_ANYWHERE out of lmb.h
LMB_ALLOC_ANYWHERE doesn't need to be part of the API, it's only used in
lmb.c - so move it out of the header file.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:38:36 +11:00
Michael Ellerman
d7a5b2ffa1 [PATCH] powerpc: Always panic if lmb_alloc() fails
Currently most callers of lmb_alloc() don't check if it worked or not, if it
ever does weird bad things will probably happen. The few callers who do check
just panic or BUG_ON.

So make lmb_alloc() panic internally, to catch bugs at the source. The few
callers who did check the result no longer need to.

The only caller that did anything interesting with the return result was
careful_allocation(). For it we create __lmb_alloc_base() which _doesn't_ panic
automatically, a little messy, but passable.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:38:34 +11:00
David Gibson
09f5dc44ae [PATCH] powerpc: Cleanup, consolidating icache dirtying logic
The code to mark a page as icache dirty (so that it will later be
icache-dcache flushed when we try to execute from it) is duplicated in
three places: flush_dcache_page() does this marking and nothing else,
but clear_user_page() and copy_user_page() duplicate it, since those
functions make the page icache dirty themselves.

This patch makes those other functions call flush_dcache_page()
instead, so the logic's all in one place.  This will make life less
confusing if we ever need to tweak the details of the the lazy icache
flush mechanism.

 arch/powerpc/mm/mem.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 21:51:53 +11:00
Michael Ellerman
8c20fafa85 [PATCH] powerpc: Make sure we don't create empty lmb regions
To prevent problems later in boot, make sure we don't create zero-size lmb
regions.

I've checked all the callers, and at the moment no one should ever hit this.
All callers use a constant size, or they check the computed size before they
call us.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 21:28:38 +11:00
Linas Vepstas
d177c207ba [PATCH] powerpc: IOMMU: don't ioremap null addresses
240-ioremap-null-ptr-test.patch

Under highly unusual circumstances, a buggy driver will ask a null ptr
to be ioremapped, an operation that curently succeeds but leads to
later trouble.  Instead, refuse to remap the null pointer.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e71d9e598533c1889e7162f5f4647e5d378c102c commit)
2006-01-10 15:30:31 +11:00
Adrian Bunk
943ffb587c spelling: s/retreive/retrieve/
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-10 00:10:13 +01:00
Michael Ellerman
e0fa93d6e6 [PATCH] powerpc: Don't use KERNELBASE in add_memory()
In add_memory() we should be using __va() to get a virtual address.
Spotted by Mike Kravetz.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 20:18:19 +11:00
Anton Blanchard
bce6c5fd8c [PATCH] powerpc: DABR exceptions should report the address not the PC
When taking a DABR exception we were reporting the PC. It makes more
sense to report the address that caused the exception, and the gdb guys
would like it that way.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 16:03:33 +11:00
Mike Kravetz
b226e46212 [PATCH] powerpc: don't add memory to empty node/zone
The system will oops if an attempt is made to add memory to an
empty node/zone.  This patch prevents adding memory to an empty
node.  The code to dynamically add a node/zone is non-trivial.
This patch is temporary and will be removed when the ability
to dynamically add a node/zone is complete.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 15:14:22 +11:00
Arnd Bergmann
021c733549 [PATCH] powerpc: fix two build warnings
Building the arch/powerpc tree currently gives me
two warnings with gcc-4.0:

arch/powerpc/mm/imalloc.c: In function '__im_get_area':
arch/powerpc/mm/imalloc.c:225: warning: 'tmp' may be used uninitialized in this function
arch/powerpc/mm/hugetlbpage.c: In function 'hugetlb_get_unmapped_area':
arch/powerpc/mm/hugetlbpage.c:608: warning: unused variable 'vma'

both fixes are trivial.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 15:14:14 +11:00
David Gibson
14c89e7fc8 [PATCH] powerpc: Replace VMALLOCBASE with VMALLOC_START
On ppc64, we independently define VMALLOCBASE and VMALLOC_START to be
the same thing: the start of the vmalloc() area at 0xd000000000000000.
VMALLOC_START is used much more widely, including in generic code, so
this patch gets rid of the extraneous VMALLOCBASE.

This does require moving the definitions of region IDs from page_64.h
to pgtable.h, but they don't clearly belong in the former rather than
the latter, anyway.  While we're moving them, clean up the definitions
of the REGION_IDs:
	- Abolish REGION_SIZE, it was only used once, to define
REGION_MASK anyway
	- Define the specific region ids in terms of the REGION_ID()
macro.
	- Define KERNEL_REGION_ID in terms of PAGE_OFFSET rather than
KERNELBASE.  It amounts to the same thing, but conceptually this is
about the region of the linear mapping (which starts at PAGE_OFFSET)
rather than of the kernel text itself (which is at KERNELBASE).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 15:05:47 +11:00
Benjamin Herrenschmidt
cc5d0189b9 [PATCH] powerpc: Remove device_node addrs/n_addr
The pre-parsed addrs/n_addrs fields in struct device_node are finally
gone. Remove the dodgy heuristics that did that parsing at boot and
remove the fields themselves since we now have a good replacement with
the new OF parsing code. This patch also fixes a bunch of drivers to use
the new code instead, so that at least pmac32, pseries, iseries and g5
defconfigs build.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:53:55 +11:00
Anton Blanchard
4b703a2317 [PATCH] ppc64: Add NUMA cpu summary at boot
We used to print a NUMA cpu summary at boot before the hotplug cpu code
was added. This has been useful for catching machine configuration as
well as firmware bugs in the past.

This patch restores that functionality. An example of the output is:

Node 0 CPUs: 0-7
Node 1 CPUs: 8-15

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:53:37 +11:00
Michael Ellerman
ba7594852f [PATCH] powerpc: Add support for "linux,usable-memory" on memory nodes
Milton has proposed that we should support a "linux,usable-memory" property
on memory nodes which describes, in preference to "reg", the regions of memory
Linux should use.

This facility is required for kdump to inform the second kernel which memory
it should use.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:52:38 +11:00
Mike Kravetz
237a0989e2 [PATCH] powerpc: numa placement for dynamically added memory
This places dynamically added memory within the appropriate
numa node.  A new routine hot_add_scn_to_nid() replicates most of
the memory scanning code in parse_numa_properties().

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:57 +11:00
Michael Ellerman
b5666f7039 [PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSET
This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't
looked at any of the PPC32 code, if we ever want to support Kdump on
PPC we'll have to do another audit, ditto for iSeries.

This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1
gazillion for 64-bit.

To get a physical address from a virtual one you subtract PAGE_OFFSET,
_not_ KERNELBASE.

KERNELBASE is the virtual address of the start of the kernel, it's
often the same as PAGE_OFFSET, but _might not be_.

If you want to know something's offset from the start of the kernel
you should subtract KERNELBASE.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:54 +11:00
Michael Ellerman
51fae6de24 [PATCH] powerpc: Add a is_kernel_addr() macro
There's a bunch of code that compares an address with KERNELBASE to see if
it's a "kernel address", ie. >= KERNELBASE. The proper test is actually to
compare with PAGE_OFFSET, since we're going to change KERNELBASE soon.

So replace all of them with an is_kernel_addr() macro that does that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:50 +11:00
Mike Kravetz
84c9fdd11e [PATCH] powerpc: Minor numa memory code cleanup
Here is an updated version of the patch that panics if no memory is
found as Nathan suggested.  I'm still concerned that panic strings
(not just the one added here) at this stage of booting do not show
up on my system.  But, that is an issue separate from this patch.

Combine get_mem_*_cells() routines to avoid multiple memory node
lookups.  Added missing of_node_put() call.  Changed variable names
to help with some confusion as to meaning.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:34 +11:00
Paul Mackerras
54c233102f Revert "[PATCH] powerpc: Minor numa memory code cleanup"
This reverts f1fdc0117004d343698b9830e141491d5ae320d1 commit.
2006-01-09 14:51:30 +11:00
Mike Kravetz
74761bb53d [PATCH] powerpc: Minor numa memory code cleanup
I started to add missing of_node_put() calls to the routines that
determine the number of cells for memory.  Decided to combine the
routines instead of making separate node lookups.  Changed variable
names to help with some confusion as to meaning.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:02 +11:00
David Gibson
456752f750 [PATCH] powerpc: Make hugepage mappings respect hint addresses
Currently, the powerpc version of hugetlb_get_unmapped_area() entirely
ignores the hint address.  The only way to get a hugepage mapping at a
specified address is with MAP_FIXED, in which case there's no way
(short of parsing /proc/self/maps) for userspace to tell if it will
clobber an existing mapping.  This is inconvenient, so the patch below
makes hugepage mappings use the given hint address if possible.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:50:28 +11:00
Benjamin Herrenschmidt
51d3082fe6 [PATCH] powerpc: Unify udbg (#2)
This patch unifies udbg for both ppc32 and ppc64 when building the
merged achitecture. xmon now has a single "back end". The powermac udbg
stuff gets enriched with some ADB capabilities and btext output. In
addition, the early_init callback is now called on ppc32 as well,
approx. in the same order as ppc64 regarding device-tree manipulations.
The init sequences of ppc32 and ppc64 are getting closer, I'll unify
them in a later patch.

For now, you can force udbg to the scc using "sccdbg" or to btext using
"btextdbg" on powermacs. I'll implement a cleaner way of forcing udbg
output to something else than the autodetected OF output device in a
later patch.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:49:54 +11:00
Arnd Bergmann
67207b9664 [PATCH] spufs: The SPU file system, base
This is the current version of the spu file system, used
for driving SPEs on the Cell Broadband Engine.

This release is almost identical to the version for the
2.6.14 kernel posted earlier, which is available as part
of the Cell BE Linux distribution from
http://www.bsc.es/projects/deepcomputing/linuxoncell/.

The first patch provides all the interfaces for running
spu application, but does not have any support for
debugging SPU tasks or for scheduling. Both these
functionalities are added in the subsequent patches.

See Documentation/filesystems/spufs.txt on how to use
spufs.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:49:12 +11:00
Anton Blanchard
e597cb32e9 [PATCH] ppc64: htab_initialize_secondary cannot be marked __init
Sonny has noticed hotplug CPU on ppc64 is broken in 2.6.15-*. One of the
problems is that htab_initialize_secondary is called when a cpu is being
brought up, but it is marked __init.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-29 10:26:36 -08:00
David Gibson
23ed6cb9a2 [PATCH] powerpc: Fix SLB flushing path in hugepage
On ppc64, when opening a new hugepage region, we need to make sure any
old normal-page SLBs for the area are flushed on all CPUs.  There was
a bug in this logic - after putting the new hugepage area masks into
the thread structure, we copied it into the paca (read by the SLB miss
handler) only on one CPU, not on all.  This could cause incorrect SLB
entries to be loaded when a multithreaded program was running
simultaneously on several CPUs.  This patch corrects the error,
copying the context information into the PACA on all CPUs using the mm
in question before flushing any existing SLB entries.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-12-09 16:57:35 +11:00
David Gibson
cbf52afdc0 [PATCH] powerpc: Add missing icache flushes for hugepages
On most powerpc CPUs, the dcache and icache are not coherent so
between writing and executing a page, the caches must be flushed.
Userspace programs assume pages given to them by the kernel are icache
clean, so we must do this flush between the kernel clearing a page and
it being mapped into userspace for execute.  We were not doing this
for hugepages, this patch corrects the situation.

We use the same lazy mechanism as we use for normal pages, delaying
the flush until userspace actually attempts to execute from the page
in question.

Tested on G5.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-12-09 16:30:48 +11:00
Benjamin Herrenschmidt
325c82a029 [PATCH] powerpc: Fix a huge page bug
The 64k pages patch changed the meaning of one argument passed to the
low level hash functions (from "large" it became "psize" or page size
index), but one of the call sites wasn't properly updated, causing
potential random weird problems with huge pages. This fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-12-08 16:57:41 +11:00
Mike Kravetz
6d91bb93e4 [PATCH] powerpc/pseries: boot failures on numa if no memory on node
This bug exists in the current code and prevents machines from booting
with numa enabled if there is a node that does not contain memory.
Workaround is to boot with 'numa=off'.  Looks like a simple typo.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-12-08 16:54:39 +11:00
Olof Johansson
ca507eaf32 [PATCH] powerpc: remove redundant code in stab init
There's never been a hardware platform that has both pSeries/RPA LPAR
hypervisor and stab (pre-POWER4 segment management). This removes
the redundant code in stab_initalize().

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-12-05 15:41:35 +11:00
David Gibson
9a94c5793a [PATCH] powerpc: More hugepage boundary case fixes
Blah.  The patch [0] I recently sent fixing errors with
in_hugepage_area() and prepare_hugepage_range() for powerpc itself has
an off-by-one bug.  Furthermore, the related functions
touches_hugepage_*_range() and within_hugepage_*_range() are also
buggy.  Some of the bugs, like those addressed in [0] originated with
commit 7d24f0b8a5 where we tweaked the
semantics of where hugepages are allowed.  Other bugs have been there
essentially forever, and are due to the undefined behaviour of '<<'
with shift counts greater than the type width (LOW_ESID_MASK could
return non-zero for high ranges with the right congruences).

The good news is that I now have a testsuite which should pick up
things like this if they creep in again.

[0] "powerpc-fix-for-hugepage-areas-straddling-4gb-boundary"

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-25 22:12:45 +11:00
David Gibson
5e391dc9e3 [PATCH] powerpc: fix for hugepage areas straddling 4GB boundary
Commit 7d24f0b8a5 fixed bugs in the ppc64 SLB
miss handler with respect to hugepage handling, and in the process tweaked
the semantics of the hugepage address masks in mm_context_t.

Unfortunately, it left out a couple of necessary changes to go with that
change.  First, the in_hugepage_area() macro was not updated to match,
second prepare_hugepage_range() was not updated to correctly handle
hugepages regions which straddled the 4GB point.

The latter appears only to cause process-hangs when attempting to map such
a region, but the former can cause oopses if a get_user_pages() is
triggered at the wrong point.  This patch addresses both bugs.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 16:08:39 -08:00
Hugh Dickins
7ce774b480 [PATCH] mm: powerpc init_mm without ptlock
Restore an earlier mod which went missing in the powerpc reshuffle: the 4xx
mmu_mapin_ram does not need to take init_mm.page_table_lock.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 16:08:38 -08:00
Hugh Dickins
01edcd891c [PATCH] mm: powerpc ptlock comments
Update comments (only) on page_table_lock and mmap_sem in arch/powerpc.
Removed the comment on page_table_lock from hash_huge_page: since it's no
longer taking page_table_lock itself, it's irrelevant whether others are; but
how it is safe (even against huge file truncation?) I can't say.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 16:08:38 -08:00
David Gibson
800fc3eeb0 [PATCH] powerpc: Remove imalloc.h
asm-ppc64/imalloc.h is only included from files in arch/powerpc/mm.
We already have a header for mm local definitions,
arch/powerpc/mm/mmu_decl.h.  Thus, this patch moves the contents of
imalloc.h into mmu_decl.h.  The only exception are the definitions of
PHBS_IO_BASE, IMALLOC_BASE and IMALLOC_END.  Those are moved into
pgtable.h, next to similar definitions of VMALLOC_START and
VMALLOC_SIZE.

Built for multiplatform 32bit and 64bit (ARCH=powerpc).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-19 14:46:02 +11:00
Linus Torvalds
d58a75ef75 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2005-11-16 07:58:48 -08:00
Michael Ellerman
eb481899aa [PATCH] powerpc: Fixup debugging in lmb.c
Somewhere we lost the include of udbg.h in lmb.c. While we're there, add a DBG
macro like every other file has and use it in lmb_dump_all().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-16 13:28:40 +11:00
Paul Mackerras
fb6d73d301 [PATCH] powerpc: Fix sparsemem with memory holes [was Re: ppc64 oops..]
This patch should fix the crashes we have been seeing on 64-bit
powerpc systems with a memory hole when sparsemem is enabled.
I'd appreciate it if people who know more about NUMA and sparsemem
than me could look over it.

There were two bugs.  The first was that if NUMA was enabled but there
was no NUMA information for the machine, the setup_nonnuma() function
was adding a single region, assuming memory was contiguous.  The
second was that the loops in mem_init() and show_mem() assumed that
all pages within the span of a pgdat were valid (had a valid struct
page).

I also fixed the incorrect setting of num_physpages that Mike Kravetz
pointed out.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 16:57:12 -08:00
Kumar Gala
4c8d3d997e [PATCH] Update email address for Kumar
Changed jobs and the Freescale address is no longer valid.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:10 -08:00
Benjamin Herrenschmidt
a7f290dad3 [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel
This patch moves the vdso's to arch/powerpc, adds support for the 32
bits vdso to the 32 bits kernel, rename systemcfg (finally !), and adds
some new (still untested) routines to both vdso's: clock_gettime() with
support for CLOCK_REALTIME and CLOCK_MONOTONIC, clock_getres() (same
clocks) and get_tbfreq() for glibc to retreive the timebase frequency.

Tom,Steve: The implementation of get_tbfreq() I've done for 32 bits
returns a long long (r3, r4) not a long. This is such that if we ever
add support for >4Ghz timebases on ppc32, the userland interface won't
have to change.

I have tested gettimeofday() using some glibc patches in both ppc32 and
ppc64 kernels using 32 bits userland (I haven't had a chance to test a
64 bits userland yet, but the implementation didn't change and was
tested earlier). I haven't tested yet the new functions.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-11 22:25:39 +11:00
Anton Blanchard
45fb6cea09 [PATCH] ppc64: Convert NUMA to sparsemem (3)
Convert to sparsemem and remove all the discontigmem code in the
process. This has a few advantages:

- The old numa_memory_lookup_table can go away
- All the arch specific discontigmem magic can go away

We also remove the triple pass of memory properties and instead create a
list of per node extents that we iterate through. A final cleanup would
be to change our lmb code to store extents per node, then we can reuse
that information in the numa code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-11 22:21:11 +11:00
Anton Blanchard
3e66c4def1 [PATCH] ppc64: prep for NUMA sparsemem rework 2
Remove ppc64 specific version of nr_cpus_node and use the generic one
provided.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-11 22:20:57 +11:00
Paul Mackerras
49b09853df powerpc: Move some extern declarations from C code into headers
This also make klimit have the same type on 32-bit as on 64-bit,
namely unsigned long, and defines and initializes it in one place.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 15:53:40 +11:00
Benjamin Herrenschmidt
87655ff268 [PATCH] powerpc: 64k pages pmd alloc fix
This patch makes the kernel use a different kmem cache for PMD pages
as they are smaller than PTE pages. Avoids waste of memory.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 15:03:29 +11:00
Paul Mackerras
799d6046d3 [PATCH] powerpc: merge code values for identifying platforms
This patch merges platform codes.  systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 along as ppc32.  Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 13:37:51 +11:00
Benjamin Herrenschmidt
77ac166fba [PATCH] ppc64: Don't panic when early __ioremap fails
Early calls to __ioremap() will panic if the hash insertion fails. This
patch makes them return NULL instead. It happens with some pSeries users
who enabled CONFIG_BOOTX_TEXT. The later is getting an incorrect address
for the fame buffer and the hash insertion fails. With this patch, it
will display an error instead of crashing at boot.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 11:26:06 +11:00
Mike Kravetz
dd7ccbd3ee [PATCH] Memory Add Fixes for ppc64
memmap_init_zone() sets page count to 1.  Before 'freeing' the
page, we need to clear the count.  This is the same that is done
on free_all_bootmem_core() for memory discovered at boot time.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-08 15:17:19 +11:00
Mike Kravetz
54b79248b2 [PATCH] revised Memory Add Fixes for ppc64
Add the create_section_mapping() routine to create hptes for memory
sections dynamically added after system boot.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-08 15:17:13 +11:00
Benjamin Herrenschmidt
76c8e25b90 [PATCH] ppc64: Fix the lazy icache/dcache code for non-RAM pages
For some stupid reason I can't explain (brown paper bag is at hand), I
removed the check pfn_valid() in the code that does the icache/dcache
coherency on POWER4 and later. That causes us to eventually try to
access non existing struct page when hashing in IO pages.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-08 13:07:50 +11:00
Anton Blanchard
bcb3557694 [PATCH] ppc64: fix Memory: summary line
On ppc64 we end up with a negative value for the data size in the memory
boot message:

Memory: 2035560k/2097152k available (5792k kernel code, 89564k reserved,
18014398509481632k data, 870k bss, 352k init)

It turns out the section ordering of the linker script is different on
ppc32 and ppc64, so just count data as _edata - _sdata which should work
on both.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-08 11:19:53 +11:00
Paul Mackerras
24bfb00123 Merge ../linux-2.6 2005-11-08 11:14:20 +11:00
Benjamin Herrenschmidt
863c84b97c [PATCH] ppc: Fix ppc32 build after 64K pages
Oops, some last minute changes caused the 64K pages patch to break ppc32
build, this fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:23 -08:00
David Gibson
7d24f0b8a5 [PATCH] ppc64: Fix bug in SLB miss handler for hugepages
This patch, however, should be applied on top of the 64k-page-size patch to
fix some problems with hugepage (some pre-existing, another introduced by
this patch).

The patch fixes a bug in the SLB miss handler for hugepages on ppc64
introduced by the dynamic hugepage patch (commit id
c594adad56) due to a misunderstanding of the
srd instruction's behaviour (mea culpa).  The problem arises when a 64-bit
process maps some hugepages in the low 4GB of the address space (unusual).
In this case, as well as the 256M segment in question being marked for
hugepages, other segments at 32G intervals will be incorrectly marked for
hugepages.

In the process, this patch tweaks the semantics of the hugepage bitmaps to
be more sensible.  Previously, an address below 4G was marked for hugepages
if the appropriate segment bit in the "low areas" bitmask was set *or* if
the low bit in the "high areas" bitmap was set (which would mark all
addresses below 1TB for hugepage).  With this patch, any given address is
governed by a single bitmap.  Addresses below 4GB are marked for hugepage
if and only if their bit is set in the "low areas" bitmap (256M
granularity).  Addresses between 4GB and 1TB are marked for hugepage iff
the low bit in the "high areas" bitmap is set.  Higher addresses are marked
for hugepage iff their bit in the "high areas" bitmap is set (1TB
granularity).

To avoid conflicts, this patch must be applied on top of BenH's pending
patch for 64k base page size [0].  As such, this patch also addresses a
hugepage problem introduced by that patch.  That patch allows hugepages of
1MB in size on hardware which supports it, however, that won't work when
using 4k pages (4 level pagetable), because in that case hugepage PTEs are
stored at the PMD level, and each PMD entry maps 2MB.  This patch simply
disallows hugepages in that case (we can do something cleverer to re-enable
them some other day).

Built, booted, and a handful of hugepage related tests passed on POWER5
LPAR (both ARCH=powerpc and ARCH=ppc64).

[0] http://gate.crashing.org/~benh/ppc64-64k-pages.diff

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:23 -08:00
Paul Mackerras
c613523455 Merge ../linux-2.6 2005-11-07 14:42:09 +11:00
Paul Mackerras
2249ca9d60 powerpc: Various UP build fixes
Mostly this involves adding #include <asm/smp.h>, since that defines
things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which
are SMP-related but still needed on UP.  This incorporates fixes
posted by Olof Johansson and Heikki Lindholm.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-07 13:18:13 +11:00
David Gibson
dcad47fc42 [PATCH] powerpc: Kill ppcdebug
The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
First, in the hash setup code, one of the bits allows the size of the
hash table to be reduced by a factor of 8 - which would be better
accomplished with a command line option for that purpose.  The other
was a bunch of bus walking related messages in the iSeries code, which
would seem to be insufficient reason to keep the mechanism.

This patch removes the last traces of this mechanism.

Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-07 12:37:45 +11:00
Olof Johansson
723925b7b1 [PATCH] powerpc: Nicer printing of address at oops
Add nicer printing of faulting address on unresolvable kernel faults.

Makes life a little easier for those who don't know how to decode our
register contents at oops time.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-07 12:37:28 +11:00
Benjamin Herrenschmidt
3c726f8dee [PATCH] ppc64: support 64k pages
Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
base page size to 64K.  The resulting kernel still boots on any
hardware.  On current machines with 4K pages support only, the kernel
will maintain 16 "subpages" for each 64K page transparently.

Note that while real 64K capable HW has been tested, the current patch
will not enable it yet as such hardware is not released yet, and I'm
still verifying with the firmware architects the proper to get the
information from the newer hypervisors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-06 16:56:47 -08:00
Paul Mackerras
e2f2e58e79 powerpc: import a fix from arch/ppc/mm/pgtable.c
... namely, the change to the 2-argument pte_alloc_kernel.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-31 14:40:03 +11:00
Paul Mackerras
734d652480 powerpc: apply recent changes to merged code
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-31 13:57:01 +11:00
Paul Mackerras
23fd07750a Merge ../linux-2.6 by hand 2005-10-31 13:37:12 +11:00
Paul Mackerras
cf00a8d18b powerpc: Fix bug arising from having multiple memory_limit variables
We had a static memory_limit in prom.c, and then another one defined
in setup_64.c and used in numa.c, which resulted in the kernel crashing
when mem=xxx was given on the command line.  This puts the declaration
in system.h and the definition in mem.c.  This also moves the
definition of tce_alloc_start/end out of setup_64.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-31 13:07:02 +11:00
Paul Mackerras
3ee1fcac33 powerpc: import a gfp_t fix to arch/powerpc/mm/pgtable_32.c
This applies the same fix as Al Viro recently made to
arch/ppc/mm/pgtable.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-29 22:10:38 +10:00
Roland Dreier
8b150478ae [PATCH] ppc: make phys_mem_access_prot() work with pfns instead of addresses
Change the phys_mem_access_prot() function to take a pfn instead of an
address.  This allows mmap64() to work on /dev/mem for addresses above 4G
on 32-bit architectures.  We start with a pfn in mmap_mem(), so there's no
need to convert to an address; in fact, it's actively bad, since the
conversion can overflow when the address is above 4G.

Similarly fix the ppc32 page_is_ram() function to avoid a conversion to an
address by directly comparing to max_pfn.  Working with max_pfn instead of
high_memory fixes page_is_ram() to give the right answer for highmem pages.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-29 14:25:49 +10:00
Kumar Gala
cffb09ce6b [PATCH] powerpc: Fix warning related to do_dabr
do_dabr() is not relevant on 40x or Book-E processors so dont build it

Signed-off-by: Kumar K. Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-27 20:51:19 +10:00
David Gibson
e37bc5df8e [PATCH] powerpc: Purge bootinfo.h
With ARCH=powerpc we assume the presence of a device tree, so we don't
require any support for the old bi_recs method of passing boot
parameters.  Likewise, we've never needed it for ppc64, but we still
had an include/asm-ppc64/bootinfo.h from which nothing was used.  This
patch removes that file, and all references to it in arch/ppc64 and
arch/powerpc.  A related, unused variable 'boot_mem_size' is also
removed from setup_32.c.  The bootinfo stuff remains in ARCH=ppc for
the time being.

Built and booted on Power5 (ARCH=ppc64 and ARCH=powerpc), built for
32-bit powermac (ARCH=powerpc and ARCH=ppc).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-27 20:50:46 +10:00
Paul Mackerras
fa39dc437a powerpc32: Limit memory to lowmem if !CONFIG_HIGHMEM.
This trims off the extra unusable memory from the lmb structure,
so we don't try to use it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-26 21:54:21 +10:00
Paul Mackerras
985990137e Merge changes from linux-2.6 by hand 2005-10-22 16:51:34 +10:00
Paul Mackerras
5c8c56ebdf powerpc: Move agp_special_page export to where it is defined
... instead of exporting it in arch/*/kernel/ppc_ksyms.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-22 14:42:51 +10:00
Kumar Gala
b15125fa81 [PATCH] powerpc: Some more fixes to allow building for a Book-E processor
Some minor fixes that are needed if we are building for a book-e
processor.

Signed-off-by: Kumar K. Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-20 09:43:32 +10:00
Paul Mackerras
30cd4a4e9c powerpc: Initialize btext subsystem later, after prom_init
We were initializing the btext stuff from prom_init(), thus breaking
the rule that all communication between prom_init() and the rest of
the kernel has to be via the flattened device tree.  This removes
the btext initialization calls from prom_init() and initializes it
instead after the device tree is unflattened.  It would be nice to
do it earlier, but that needs some more infrastructure to find the
properties we need in the flattened device tree.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-17 19:20:46 +10:00
Paul Mackerras
3eac8c69d1 powerpc: Move default hash table size calculation to hash_utils_64.c
We weren't computing the size of the hash table correctly on iSeries
because the relevant code in prom.c was #ifdef CONFIG_PPC_PSERIES.
This moves the code to hash_utils_64.c, makes it unconditional, and
cleans it up a bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-12 16:58:53 +10:00
Stephen Rothwell
3a5f8c5f78 powerpc: make iSeries boot again
On ARCH=ppc64 we were getting htab_hash_mask recalculated
to the correct value for our particular machine by accident.
In the merge tree, that code was commented out, so htab_hash_mask
was being corrupted.

We now set ppc64_pft_size instead which gets htab_has_mask
calculated correctly for us later.  We should put an
ibm,pft-size property in the device tree at some point.

Also set -mno-minimal-toc in some makefiles.
Allow iSeries to configure PROC_DEVICETREE.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2005-10-12 10:58:00 +10:00
Paul Mackerras
b3b8dc6c07 powerpc: Use reg.h instead of processor.h when we just want reg names
Now that the register names and bit definitions are all in reg.h,
use that instead of processor.h in assembly code in a few places.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-10 22:20:10 +10:00
Paul Mackerras
ab1f9dac6e powerpc: Merge arch/ppc64/mm to arch/powerpc/mm
This moves the remaining files in arch/ppc64/mm to arch/powerpc/mm,
and arranges that we use them when compiling with ARCH=ppc64.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-10 21:58:35 +10:00
Paul Mackerras
70d64ceaa1 powerpc: Rename files to have consistent _32/_64 suffixes
This doesn't change any code, just renames things so we consistently
have foo_32.c and foo_64.c where we have separate 32- and 64-bit
versions.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-10 21:52:43 +10:00
Paul Mackerras
7c8c6b9776 powerpc: Merge lmb.c and make MM initialization use it.
This also creates merged versions of do_init_bootmem, paging_init
and mem_init and moves them to arch/powerpc/mm/mem.c.  It gets rid
of the mem_pieces stuff.

I made memory_limit a parameter to lmb_enforce_memory_limit rather
than a global referenced by that function.  This will require some
small changes to ppc64 if we want to continue building ARCH=ppc64
using the merged lmb.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-06 12:23:33 +10:00
Paul Mackerras
20c8c21063 powerpc: Fixes to get the merged kernel to boot on powermac.
This merges ppc_ksyms.c, puts back the actual do_execve call in
sys_execve, makes init_MMU call find_end_of_memory rather than
ppc_md.find_end_of_memory (every platform has a device tree
with a /memory node now, right?) and fixes some problems with the
mpic initialization on newworld powermacs.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-09-28 20:28:14 +10:00
Paul Mackerras
14cf11af6c powerpc: Merge enough to start building in arch/powerpc.
This creates the directory structure under arch/powerpc and a bunch
of Kconfig files.  It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac.  This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.

For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel.  This makes some minor changes
to files in those directories and files outside arch/powerpc.

The boot directory is still not merged.  That's going to be interesting.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-09-26 16:04:21 +10:00