linux

Author	SHA1	Message	Date
Scott Wood	0d61f0b3e2	powerpc/booke64: Move mb() to __set_pte_at() with kernel-addr test map_kernel() doesn't catch all places that create kernel PTEs. In particular, vmalloc() calls set_pte_at() directly. This causes a crash when booting a non-SMP kernel on e6500. Move the sync to __set_pte(), to be executed only for kernel addresses. Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 23:00:01 -05:00
Scott Wood	2f7d2b74a9	powerpc/mm: Don't call __flush_dcache_icache_phys() with PA>VA __flush_dcache_icache_phys() requires the ability to access the memory with the MMU disabled, which means that on a 32-bit system any memory above 4 GiB is inaccessible. In particular, mpc86xx is 32-bit and can have more than 4 GiB of RAM. Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-08-07 22:59:20 -05:00
Anton Blanchard	eab861a7a5	powerpc: Add plain English description for alignment exception oopses If we take an alignment exception which we cannot fix, the oops currently prints: Unable to handle kernel paging request for unknown fault Lets print something more useful: Unable to handle kernel paging request for unaligned access at address 0xc0000000f77bba8f Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-07-06 20:24:35 +10:00
Linus Torvalds	9d90f03531	Replace module_init with appropriate alternate initcall in non modules. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVkO6nAAoJEOvOhAQsB9HWpHMP/Aknc+lmX2dZeIn96gdkP+UK 1qL24C5oq2sm/9yTZLdoXbyApLaaTbAJHS9O4kolaOU6uOs3JrgtXqL1697PVp1R qV4f4DOzXmmEHaE2oO21afAri3tXIVQNqA2NQl2TmKfwz0Atu01Vj5RJPu/ZOBPl dONXcFnE6nO2p7AEFRP/GfDZwkng4xALyZPhwL7tJDAeGaBpqG/n2hCuq+Szn9g8 wjTFACBdad/mRrYsL6YsWZ1e+LKI8vsArQbdPTam+jPaEUlK7yjFReFKCJVzL2JP xfQoTcCgFztzTUV0JTGR9sqeYA3WH9AkJOFDxNE/eIili4xiTh789WbEpHLVECSX 1LsW025I3DkRWBPT4L+9ZP805ha71kNXDFc5N3XJkzrCYaFvD2BgsUzxi6FXj7aC 9lEVKt6xO04FFG5SwTKnO0f8PEhPemZH3BDnVvjBDWQYLjUcPSNz7bfyHUhif0G5 ulOGVB0ncJJF9iP8PyZs1RA/F8kKxXWnhYMIHzvl0f0vLUA7rAKsACnhBgq8s9ZQ uM5YjzU91Z/4pe5C2E5MmQIZ84b79ZPsee1lF0GJdjK5W3PDvnCjIdXfQ5M/f3S8 76cssXWNhS78/P+19YqirLeb0u7Zw0jf73m9t9ywRgcByWfY5ZUDm0DFpQnWKkoR QY/aFO/yHKTO3VHj8Ril =KDJO -----END PGP SIGNATURE----- Merge tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull module_init replacement part two from Paul Gortmaker: "Replace module_init with appropriate alternate initcall in non modules. This series converts non-modular code that is using the module_init() call to hook itself into the system to instead use one of our alternate priority initcalls. Unlike the previous series that used device_initcall and hence was a runtime no-op, these commits change to one of the alternate initcalls, because (a) we have them and (b) it seems like the right thing to do. For example, it would seem logical to use arch_initcall for arch specific setup code and fs_initcall for filesystem setup code. This does mean however, that changes in the init ordering will be taking place, and so there is a small risk that some kind of implicit init ordering issue may lie uncovered. But I think it is still better to give these ones sensible priorities than to just assign them all to device_initcall in order to exactly preserve the old ordering. Thad said, we have already made similar changes in core kernel code in commit `c96d6660dc` ("kernel: audit/fix non-modular users of module_init in core code") without any regressions reported, so this type of change isn't without precedent. It has also got the same local testing and linux-next coverage as all the other pull requests that I'm sending for this merge window have got. Once again, there is an unused module_exit function removal that shows up as an outlier upon casual inspection of the diffstat" * tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling mm/page_owner.c: use late_initcall to hook in enabling lib/list_sort: use late_initcall to hook in self tests arm: use subsys_initcall in non-modular pl320 IPC code powerpc: don't use module_init for non-modular core hugetlb code powerpc: use subsys_initcall for Freescale Local Bus x86: don't use module_init for non-modular core bootflag code netfilter: don't use module_init/exit in core IPV4 code fs/notify: don't use module_init for non-modular inotify_user code mm: replace module_init usages with subsys_initcall in nommu.c	2015-07-02 10:36:29 -07:00
Linus Torvalds	8d7804a2f0	Driver core patches for 4.2-rc1 Here is the driver core / firmware changes for 4.2-rc1. A number of small changes all over the place in the driver core, and in the firmware subsystem. Nothing really major, full details in the shortlog. Some of it is a bit of churn, given that the platform driver probing changes was found to not work well, so they were reverted. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D jg8AnRgdW8ArDA/xOarULd/X43eA3J3C =Al2B -----END PGP SIGNATURE----- Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the driver core / firmware changes for 4.2-rc1. A number of small changes all over the place in the driver core, and in the firmware subsystem. Nothing really major, full details in the shortlog. Some of it is a bit of churn, given that the platform driver probing changes was found to not work well, so they were reverted. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits) Revert "base/platform: Only insert MEM and IO resources" Revert "base/platform: Continue on insert_resource() error" Revert "of/platform: Use platform_device interface" Revert "base/platform: Remove code duplication" firmware: add missing kfree for work on async call fs: sysfs: don't pass count == 0 to bin file readers base:dd - Fix for typo in comment to function driver_deferred_probe_trigger(). base/platform: Remove code duplication of/platform: Use platform_device interface base/platform: Continue on insert_resource() error base/platform: Only insert MEM and IO resources firmware: use const for remaining firmware names firmware: fix possible use after free on name on asynchronous request firmware: check for file truncation on direct firmware loading firmware: fix __getname() missing failure check drivers: of/base: move of_init to driver_init drivers/base: cacheinfo: fix annoying typo when DT nodes are absent sysfs: disambiguate between "error code" and "failure" in comments driver-core: fix build for !CONFIG_MODULES driver-core: make __device_attach() static ...	2015-06-26 15:07:37 -07:00
Aneesh Kumar K.V	8809aa2d28	mm: clarify that the function operates on hugepage pte We have confusing functions to clear pmd, pmd_clear_* and pmd_clear. Add _huge_ to pmdp_clear functions so that we are clear that they operate on hugepage pte. We don't bother about other functions like pmdp_set_wrprotect, pmdp_clear_flush_young, because they operate on PTE bits and hence indicate they are operating on hugepage ptes Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:44 -07:00
Aneesh Kumar K.V	f28b6ff8c3	powerpc/mm: use generic version of pmdp_clear_flush() Also move the pmd_trans_huge check to generic code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:44 -07:00
Aneesh Kumar K.V	15a25b2ead	mm/thp: split out pmd collapse flush into separate functions Architectures like ppc64 [1] need to do special things while clearing pmd before a collapse. For them this operation is largely different from a normal hugepage pte clear. Hence add a separate function to clear pmd before collapse. After this patch pmdp_* functions operate only on hugepage pte, and not on regular pmd_t values pointing to page table. [1] ppc64 needs to invalidate all the normal page pte mappings we already have inserted in the hardware hash page table. But before doing that we need to make sure there are no parallel hash page table insert going on. So we need to do a kick_all_cpus_sync() before flushing the older hash table entries. By moving this to a separate function we capture these details and mention how it is different from a hugepage pte clear. This patch is a cleanup and only does code movement for clarity. There should not be any change in functionality. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:44 -07:00
Zhang Zhen	e81f2d2237	mm/hugetlb: reduce arch dependent code about huge_pmd_unshare Currently we have many duplicates in definitions of huge_pmd_unshare. In all architectures this function just returns 0 when CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N. This patch puts the default implementation in mm/hugetlb.c and lets these architectures use the common code. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Rientjes <rientjes@google.com> Cc: James Yang <James.Yang@freescale.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-24 17:49:41 -07:00
Linus Torvalds	08d183e3c1	powerpc updates for 4.2 - Disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain. - EEH fixes from Gavin & Richard. - Enable the sys_kcmp syscall from Laurent. - Sysfs control for fastsleep workaround from Shreyas. - Expose OPAL events as an irq chip by Alistair. - MSI ops moved to pci_controller_ops by Daniel. - Fix for kernel to userspace backtraces for perf from Anton. - Merge pseries and pseries_le defconfigs from Cyril. - CXL in-kernel API from Mikey. - OPAL prd driver from Jeremy. - Fix for DSCR handling & tests from Anshuman. - Powernv flash mtd driver from Cyril. - Dynamic DMA Window support on powernv from Alexey. - LLVM clang fixes & workarounds from Anton. - Reworked version of the patch to abort syscalls when transactional. - Fix the swap encoding to support 4TB, from Aneesh. - Various fixes as usual. - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJViSZqAAoJEFHr6jzI4aWAA7kQAKq3+pejfo2rY7alpKJyeVao vlaIEaDNOTh+ctcmu3MFF9Jy6fai8gNZziRXU5JRmE5RW4GVBN4KZiqXRbkVjdBK uG9sCX7Y58VRsS2vnGBYLsamfTMgjaXeDvgunQHVLiechJnrDr0RHEK90F3LSi73 Axp6l8XIG63a3zFZmkhzANMCme2lm5+MWmGlSjUUNi5F+viQUgJc5iiO8xrVUgM5 RpNlV2NJSqFiU+gMQWJ226V85UIniouq4j+qtyUcu8/m9BberyolXVU0GPlPFdsx r/Qh9uCJyZaUdSB5hzomQZj50IsSz6J6nEuJTeGRoVZOmeI8Dnc2xU9fxQF5fC8H lUJw10WPoNOggQZTeSUKn7wTXw3i4p3KsWNUczaW68VJdhqZUVaSp0+I6mnDSqzs 9iGC+VffLYNa1OHq7mGRFrgDdLBCHes31aZ3CxlQsmyNpAPCwMzsD4TUfVnvOG6E oJOeaQ4mZM9PvqxEYJfoIL+vgRxmQ8sdIBtNY4in+C7J6eFnZNFO9xmPnJZuVU31 PGtx60kjFCOVMXvqn34WkRNbgqGWI91IK0KcRwFO2LXVio1uY77TWL52kNK2IMsp Az+VDDvqnT3+BoV1yz0P6SrXAkwTpvFk2y+IdmEiUUN7zZFL5ZSA2epej9AzHTAK WID2bc5yVtIL6p6x5ICH =d9Wh -----END PGP SIGNATURE----- Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain. - EEH fixes from Gavin & Richard. - enable the sys_kcmp syscall from Laurent. - sysfs control for fastsleep workaround from Shreyas. - expose OPAL events as an irq chip by Alistair. - MSI ops moved to pci_controller_ops by Daniel. - fix for kernel to userspace backtraces for perf from Anton. - merge pseries and pseries_le defconfigs from Cyril. - CXL in-kernel API from Mikey. - OPAL prd driver from Jeremy. - fix for DSCR handling & tests from Anshuman. - Powernv flash mtd driver from Cyril. - dynamic DMA Window support on powernv from Alexey. - LLVM clang fixes & workarounds from Anton. - reworked version of the patch to abort syscalls when transactional. - fix the swap encoding to support 4TB, from Aneesh. - various fixes as usual. - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup. * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits) cxl: Fix typo in debug print cxl: Add CXL_KERNEL_API config option powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma() powerpc/mm: Change the swap encoding in pte. powerpc/mm: PTE_RPN_MAX is not used, remove the same powerpc/tm: Abort syscalls in active transactions powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off powerpc/include: Add opal-prd to installed uapi headers powerpc/powernv: fix construction of opal PRD messages powerpc/powernv: Increase opal-irqchip initcall priority powerpc: Make doorbell check preemption safe powerpc/powernv: pnv_init_idle_states() should only run on powernv macintosh/nvram: Remove as unused powerpc: Don't use gcc specific options on clang powerpc: Don't use -mno-strict-align on clang powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it powerpc: Only use -mabi=altivec if toolchain supports it powerpc: Fix duplicate const clang warning in user access code vfio: powerpc/spapr: Support Dynamic DMA windows vfio: powerpc/spapr: Register memory and define IOMMU v2 ...	2015-06-24 08:46:32 -07:00
Michael Ellerman	6096f88451	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup."	2015-06-19 17:23:48 +10:00
Paul Gortmaker	6f114281c4	powerpc: don't use module_init for non-modular core hugetlb code The hugetlbpage.o is obj-y (always built in). It will never be modular, so using module_init as an alias for __initcall is somewhat misleading. Fix this up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of arch_initcall (which makes sense for arch code) will thus change this registration from level 6-device to level 3-arch (i.e. slightly earlier). However no observable impact of that small difference has been observed during testing, or is expected. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2015-06-16 14:12:34 -04:00
Alexey Kardashevskiy	15b244a88e	powerpc/mmu: Add userspace-to-physical addresses translation cache We are adding support for DMA memory pre-registration to be used in conjunction with VFIO. The idea is that the userspace which is going to run a guest may want to pre-register a user space memory region so it all gets pinned once and never goes away. Having this done, a hypervisor will not have to pin/unpin pages on every DMA map/unmap request. This is going to help with multiple pinning of the same memory. Another use of it is in-kernel real mode (mmu off) acceleration of DMA requests where real time translation of guest physical to host physical addresses is non-trivial and may fail as linux ptes may be temporarily invalid. Also, having cached host physical addresses (compared to just pinning at the start and then walking the page table again on every H_PUT_TCE), we can be sure that the addresses which we put into TCE table are the ones we already pinned. This adds a list of memory regions to mm_context_t. Each region consists of a header and a list of physical addresses. This adds API to: 1. register/unregister memory regions; 2. do final cleanup (which puts all pre-registered pages); 3. do userspace to physical address translation; 4. manage usage counters; multiple registration of the same memory is allowed (once per container). This implements 2 counters per registered memory region: - @mapped: incremented on every DMA mapping; decremented on unmapping; initialized to 1 when a region is just registered; once it becomes zero, no more mappings allowe; - @used: incremented on every "register" ioctl; decremented on "unregister"; unregistration is allowed for DMA mapped regions unless it is the very last reference. For the very last reference this checks that the region is still mapped and returns -EBUSY so the userspace gets to know that memory is still pinned and unregistration needs to be retried; @used remains 1. Host physical addresses are stored in vmalloc'ed array. In order to access these in the real mode (mmu off), there is a real_vmalloc_addr() helper. In-kernel acceleration patchset will move it from KVM to MMU code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Aneesh Kumar K.V	cfcb3d80a2	powerpc/mm: Add trace point for tracking hash pte fault This enables us to understand how many hash fault we are taking when running benchmarks. For ex: -bash-4.2# ./perf stat -e powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30 -P -n 1000 ... Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000': 1,10,04,075 powerpc:hash_fault 1,10,03,429 page-faults 30.865978991 seconds time elapsed NOTE: The impact of the tracepoint was not noticeable when running test. It was within the run-time variance of the test. For ex: without-patch: -------------- Performance counter stats for './a.out 3000 300': 643 page-faults # 0.089 M/sec 7.236562 task-clock (msec) # 0.928 CPUs utilized 2,179,213 stalled-cycles-frontend # 0.00% frontend cycles idle 17,174,367 stalled-cycles-backend # 0.00% backend cycles idle 0 context-switches # 0.000 K/sec 0.007794658 seconds time elapsed And with-patch: --------------- Performance counter stats for './a.out 3000 300': 643 page-faults # 0.089 M/sec 7.233746 task-clock (msec) # 0.921 CPUs utilized 0 context-switches # 0.000 K/sec 0.007854876 seconds time elapsed Performance counter stats for './a.out 3000 300': 643 page-faults # 0.087 M/sec 649 powerpc:hash_fault # 0.087 M/sec 7.430376 task-clock (msec) # 0.938 CPUs utilized 2,347,174 stalled-cycles-frontend # 0.00% frontend cycles idle 17,524,282 stalled-cycles-backend # 0.00% backend cycles idle 0 context-switches # 0.000 K/sec 0.007920284 seconds time elapsed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-10 14:06:29 +10:00
Greg Kroah-Hartman	987aec39a7	Merge 4.1-rc7 into driver-core-next We want the fixes in this branch as well for testing and merge resolution. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-06-08 10:19:40 -07:00
Michael Neuling	ec249dd860	cxl: Move include file cxl.h -> cxl-base.h This moves the current include file from cxl.h -> cxl-base.h. This current include file is used only to pass information between the base driver that needs to be built into the kernel and the cxl module. This is to make way for a new include/misc/cxl.h which will contain just the kernel API for other driver to use Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-03 13:27:19 +10:00
Michael Neuling	85a97da958	powerpc/copro: Fix faulting kernel segments This fixes calculating the key bits (KP and KS) in the SLB VSID for kernel mappings. I'm not CCing this to stable as there are no uses of this currently. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-03 13:27:15 +10:00
Scott Wood	6c0cc62715	powerpc/mm: Use PFN_PHYS() in devmem_is_allowed() This function can run on systems where physical addresses don't fit in unsigned long, so make sure to use the macro that contains the proper cast. Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-06-02 21:37:23 -05:00
Scott Wood	c89ca8ab74	powerpc/e6500: Optimize hugepage TLB misses Some workloads take a lot of TLB misses despite using traditional hugepages. Handle these TLB misses in the asm fastpath rather than going through a bunch of C code. With this patch I measured around a 5x speedup in handling hugepage TLB misses. Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-06-02 21:37:19 -05:00
Ingo Molnar	f407a82586	Merge branch 'linus' into sched/core, to resolve conflict Conflicts: arch/sparc/include/asm/topology_64.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-02 08:05:42 +02:00
Michael Ellerman	09f3f326cd	powerpc/mm: Fix build break with STRICT_MM_TYPECHECKS && DEBUG_PAGEALLOC If both STRICT_MM_TYPECHECKS and DEBUG_PAGEALLOC are enabled, the code in kernel_map_linear_page() is built, and so we fail with: arch/powerpc/mm/hash_utils_64.c:1478:2: error: incompatible type for argument 1 of 'htab_convert_pte_flags' Fix it by using pgprot_val(). Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-02 13:24:48 +10:00
Bartosz Golaszewski	06931e6224	sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask() Rename topology_thread_cpumask() to topology_sibling_cpumask() for more consistency with scheduler code. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Benoit Cousson <bcousson@baylibre.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jean Delvare <jdelvare@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 15:22:15 +02:00
Luis R. Rodriguez	ecc8617053	module: add extra argument for parse_params() callback This adds an extra argument onto parse_params() to be used as a way to make the unused callback a bit more useful and generic by allowing the caller to pass on a data structure of its choice. An example use case is to allow us to easily make module parameters for every module which we will do next. @ parse @ identifier name, args, params, num, level_min, level_max; identifier unknown, param, val, doing; type s16; @@ extern char parse_args(const char name, char args, const struct kernel_param params, unsigned num, s16 level_min, s16 level_max, + void arg, int (unknown)(char param, char val, const char doing + , void arg )); @ parse_mod @ identifier name, args, params, num, level_min, level_max; identifier unknown, param, val, doing; type s16; @@ char parse_args(const char name, char args, const struct kernel_param params, unsigned num, s16 level_min, s16 level_max, + void arg, int (unknown)(char param, char val, const char doing + , void arg )) { ... } @ parse_args_found @ expression R, E1, E2, E3, E4, E5, E6; identifier func; @@ ( R = parse_args(E1, E2, E3, E4, E5, E6, + NULL, func); \| R = parse_args(E1, E2, E3, E4, E5, E6, + NULL, &func); \| R = parse_args(E1, E2, E3, E4, E5, E6, + NULL, NULL); \| parse_args(E1, E2, E3, E4, E5, E6, + NULL, func); \| parse_args(E1, E2, E3, E4, E5, E6, + NULL, &func); \| parse_args(E1, E2, E3, E4, E5, E6, + NULL, NULL); ) @ parse_args_unused depends on parse_args_found @ identifier parse_args_found.func; @@ int func(char param, char val, const char unused + , void arg ) { ... } @ mod_unused depends on parse_args_found @ identifier parse_args_found.func; expression A1, A2, A3; @@ - func(A1, A2, A3); + func(A1, A2, A3, NULL); Generated-by: Coccinelle SmPL Cc: cocci@systeme.lip6.fr Cc: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Felipe Contreras <felipe.contreras@gmail.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Jean Delvare <jdelvare@suse.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jani Nikula <jani.nikula@intel.com> Cc: linux-kernel@vger.kernel.org Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-20 00:25:24 -07:00
David Hildenbrand	70ffdb9393	mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler Introduce faulthandler_disabled() and use it to check for irq context and disabled pagefaults (via pagefault_disable()) in the pagefault handlers. Please note that we keep the in_atomic() checks in place - to detect whether in irq context (in which case preemption is always properly disabled). In contrast, preempt_disable() should never be used to disable pagefaults. With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt counter, and therefore the result of in_atomic() differs. We validate that condition by using might_fault() checks when calling might_sleep(). Therefore, add a comment to faulthandler_disabled(), describing why this is needed. faulthandler_disabled() and pagefault_disable() are defined in linux/uaccess.h, so let's properly add that include to all relevant files. This patch is based on a patch from Thomas Gleixner. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 08:39:15 +02:00
David Hildenbrand	2cb7c9cb42	sched/preempt, mm/kmap: Explicitly disable/enable preemption in kmap_atomic_* The existing code relies on pagefault_disable() implicitly disabling preemption, so that no schedule will happen between kmap_atomic() and kunmap_atomic(). Let's make this explicit, to prepare for pagefault_disable() not touching preemption anymore. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-5-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 08:39:14 +02:00
Aneesh Kumar K.V	7b868e81be	powerpc/mm: Return NULL for not present hugetlb page We need to check whether pte is present in follow_huge_addr() and properly return NULL if mapping is not present. Also use READ_ONCE when dereferencing pte_t address. Without this patch, we may wrongly return a zero pfn page in follow_huge_addr(). Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-05-12 11:04:29 +10:00
Aneesh Kumar K.V	13bd817bb8	powerpc/thp: Serialize pmd clear against a linux page table walk. Serialize against find_linux_pte_or_hugepte() which does lock-less lookup in page tables with local interrupts disabled. For huge pages it casts pmd_t to pte_t. Since the format of pte_t is different from pmd_t we want to prevent transit from pmd pointing to page table to pmd pointing to huge page (and back) while interrupts are disabled. We clear pmd to possibly replace it with page table pointer in different code paths. So make sure we wait for the parallel find_linux_pte_or_hugepage() to finish. Without this patch, a find_linux_pte_or_hugepte() running in parallel to __split_huge_zero_page_pmd() or do_huge_pmd_wp_page_fallback() or zap_huge_pmd() can run into the above issue. With __split_huge_zero_page_pmd() and do_huge_pmd_wp_page_fallback() we clear the hugepage pte before inserting the pmd entry with a regular pgtable address. Such a clear need to wait for the parallel find_linux_pte_or_hugepte() to finish. With zap_huge_pmd(), we can run into issues, with a hugepage pte getting zapped due to a MADV_DONTNEED while other cpu fault it in as small pages. Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-05-12 11:03:29 +10:00
Aneesh Kumar K.V	2e826695d8	powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled This fix the below build error arch/powerpc/mm/hash_utils_64.c: In function ‘flush_hash_hugepage’: arch/powerpc/mm/hash_utils_64.c:1381:1: error: label at end of compound statement tm_abort: ^ make[1]: *** [arch/powerpc/mm/hash_utils_64.o] Error 1 Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-23 17:42:14 +10:00
Aneesh Kumar K.V	7d6e7f7ffa	powerpc/mm/thp: Return pte address if we find trans_splitting. For THP that is marked trans splitting, we return the pte. This require the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-17 11:23:40 +10:00
Aneesh Kumar K.V	691e95fd73	powerpc/mm/thp: Make page table walk safe against thp split/collapse We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-17 11:23:39 +10:00
Michael Ellerman	1cbee462a5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes	2015-04-17 11:22:51 +10:00
Linus Torvalds	d19d5efd8c	powerpc updates for 4.1 - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVL2cxAAoJEFHr6jzI4aWAR8cP/19VTo/CzCE4ffPSx7qR464n F+WFZcbNjIMXu6+B0YLuJZEsuWtKKrCit/MCg3+mSgE4iqvxmtI+HDD0445Buszj UD4E4HMdPrXQ+KUSUDORvRjv/FFUXIa94LSv/0g2UeMsPz/HeZlhMxEu7AkXw9Nf rTxsmRTsOWME85Y/c9ss7XHuWKXT3DJV7fOoK9roSaN3dJAuWTtG3WaKS0nUu0ok 0M81D6ZczoD6ybwh2DUMPD9K6SGxLdQ4OzQwtW6vWzcQIBDfy5Pdeo0iAFhGPvXf T4LLPkv4cF4AwHsAC4rKDPHQNa+oZBoLlScrHClaebAlDiv+XYKNdMogawUObvSh h7avKmQr0Ygp1OvvZAaXLhuDJI9FJJ8lf6AOIeULgHsDR9SyKMjZWxRzPe11uarO Fyi0qj3oJaQu6LjazZraApu8mo+JBtQuD3z3o5GhLxeFtBBF60JXj6zAXJikufnl kk1/BUF10nKUhtKcDX767AMUCtMH3fp5hx8K/z9T5v+pobJB26Wup1bbdT68pNBT NjdKUppV6QTjZvCsA6U2/ECu6E9KeIaFtFSL2IRRoiI0dWBN5/5eYn3RGkO2ZFoL 1NdwKA2XJcchwTPkpSRrUG70sYH0uM2AldNYyaLfjzrQqza7Y6lF699ilxWmCN/H OplzJAE5cQ8Am078veTW =03Yh -----END PGP SIGNATURE----- Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits) powerpc/powermac: Fix build error seen with powermac smp builds powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE powerpc: Remove PPC32 code from pseries specific find_and_init_phbs() powerpc/cell: Fix iommu breakage caused by controller_ops change powerpc/eeh: Fix crash in eeh_add_device_early() on Cell powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails powerpc/pseries: Correct memory hotplug locking powerpc: Fix missing L2 cache size in /sys/devices/system/cpu powerpc: Add ppc64 hard lockup detector support oprofile: Disable oprofile NMI timer on ppc64 powerpc/perf/hv-24x7: Add missing put_cpu_var() powerpc/perf/hv-24x7: Break up single_24x7_request powerpc/perf/hv-24x7: Define update_event_count() powerpc/perf/hv-24x7: Whitespace cleanup powerpc/perf/hv-24x7: Define add_event_to_24x7_request() powerpc/perf/hv-24x7: Rename hv_24x7_event_update powerpc/perf/hv-24x7: Move debug prints to separate function powerpc/perf/hv-24x7: Drop event_24x7_request() powerpc/perf/hv-24x7: Use pr_devel() to log message ... Conflicts: tools/testing/selftests/powerpc/Makefile tools/testing/selftests/powerpc/tm/Makefile	2015-04-16 13:53:32 -05:00
Scott Wood	50c6a665b3	powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range() Commit `dc6c9a35b6` ("mm: account pmd page tables to the process") added a counter that is incremented whenever a PMD is allocated and decremented whenever a PMD is freed. For hugepages on PPC, common code is used to allocated PMDs, but arch-specific code is used to free PMDs. This results in kernel output such as "BUG: non-zero nr_pmds on freeing mm: 1" when using hugepages. Update the PPC hugepage PMD freeing code to decrement the count, just as the above commit did for free_pmd_range(). Fixes: `dc6c9a35b6` ("mm: account pmd page tables to the process") Signed-off-by: Scott Wood <scottwood@freescale.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 4.0.x	2015-04-15 15:24:22 -05:00
Kees Cook	2b68f6caea	mm: expose arch_mmap_rnd when available When an architecture fully supports randomizing the ELF load location, a per-arch mmap_rnd() function is used to find a randomized mmap base. In preparation for randomizing the location of ET_DYN binaries separately from mmap, this renames and exports these functions as arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on architectures that support it (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports a separated ET_DYN ASLR from mmap ASLR without the ARCH_BINFMT_ELF_RANDOMIZE_PIE logic). Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:05 -07:00
Kees Cook	ed6322746a	powerpc: standardize mmap_rnd() usage In preparation for splitting out ET_DYN ASLR, this refactors the use of mmap_rnd() to be used similarly to arm and x86. (Can mmap ASLR be safely enabled in the legacy mmap case here? Other archs use "mm->mmap_base = TASK_UNMAPPED_BASE + random_factor".) Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 16:49:05 -07:00
Michael Ellerman	f691fa1080	powerpc: Replace mem_init_done with slab_is_available() We have a powerpc specific global called mem_init_done which is "set on boot once kmalloc can be called". But that's not quite true. We set it at the bottom of mem_init(), and rely on the fact that mm_init() calls kmem_cache_init() immediately after that, and nothing is running in parallel. So replace it with the generic and 100% correct slab_is_available(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-10 20:02:48 +10:00
Michael Ellerman	4f9c53c8cc	powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabled Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix the 32-bit code also] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-10 20:02:47 +10:00
Michael Ellerman	5dd4e4f6fe	powerpc/mm: Change setbat() to take a pgprot_t rather than flags The callers of setbat() are actually passing a pgprot_t for the flags parameter. This doesn't matter unless STRICT_MM_TYPECHECKS is enabled. So we can turn that on without breaking the build, change setbat() to take a pgprot_t and have it convert it to an unsigned long internally. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-07 17:15:13 +10:00
Michael Ellerman	911083350e	powerpc/mm: Remove duplicate declaration of setbat() This is already declared in mmu_decl.h, so we don't need a second version in the C file. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-04-07 17:15:13 +10:00
Michael Ellerman	df60f57684	Merge branch 'next-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into test Merge miscellaneous bits from benh. Fix a minor conflict with OpalMessageType changing names to opal_msg_type.	2015-03-26 20:04:28 +11:00
Yanjiang Jin	e77553cb21	powerpc/mm: Free string after creating kmem cache kmem_cache_create()->kmem_cache_create_memcg()->kstrdup() allocates new space and copys name's content, so it is safe to free name memory after calling kmem_cache_create(). Else kmemleak will report the below warning: unreferenced object 0xc0000000f9002160 (size 16): comm "swapper/0", pid 0, jiffies 4294892296 (age 1386.640s) hex dump (first 16 bytes): 70 67 74 61 62 6c 65 2d 32 5e 39 00 de ad be ef pgtable-2^9..... backtrace: [<c0000000004e03ec>] .kvasprintf+0x5c/0xa0 [<c0000000004e045c>] .kasprintf+0x2c/0x50 [<c00000000002e36c>] .pgtable_cache_add+0xac/0x100 [<c00000000002e3e4>] .pgtable_cache_init+0x24/0x80 [<c000000000c6c67c>] .start_kernel+0x228/0x4c8 [<c000000000000594>] .start_here_common+0x24/0x90 Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-26 15:23:17 +11:00
Scott Wood	cc83458d3a	powerpc/32: %pF is only for function pointers Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Even on other architectures, refrain from setting a bad example that people copy. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-23 15:14:48 +11:00
Nishanth Aravamudan	3af229f207	powerpc/numa: Reset node_possible_map to only node_online_map Raghu noticed an issue with excessive memory allocation on power with a simple cgroup test, specifically, in mem_cgroup_css_alloc -> for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing up the kmalloc-2048 slab (to the order of 200MB for 400 cgroup directories). The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes possible), which defines node_possible_map, which in turn defines the value of nr_node_ids in setup_nr_node_ids and the iteration of for_each_node. In practice, we never see a system with 256 NUMA nodes, and in fact, we do not support node hotplug on power in the first place, so the nodes that are online when we come up are the nodes that will be present for the lifetime of this kernel. So let's, at least, drop the NUMA possible map down to the online map at runtime. This is similar to what x86 does in its initialization routines. mem_cgroup_css_alloc should also be fixed to only iterate over memory-populated nodes and handle hotplug, but that is a separate change. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-23 13:25:53 +11:00
Greg Kurz	3338a65bad	powerpc/vphn: parsing code rewrite The current VPHN parsing logic has some flaws that this patch aims to fix: 1) when the value 0xffff is read, the value 0xffffffff gets added to the the output list and its element count isn't incremented. This is wrong. According to PAPR+ the domain identifiers are packed into a sequence terminated by the "reserved value of all ones". This means that 0xffff is a stream terminator. 2) the combination of byteswaps and casts make the code hardly readable. Let's parse the stream one 16-bit field at a time instead. 3) it is assumed that the hypercall returns 12 32-bit values packed into 6 64-bit registers. According to PAPR+, the domain identifiers may be streamed as 16-bit values. Let's increase the number of expected numbers to 24. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-18 10:48:59 +11:00
Greg Kurz	4b6cfb2a8c	powerpc/vphn: move VPHN parsing logic to a separate file The goal behind this patch is to be able to write userland tests for the VPHN parsing code. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-18 10:48:59 +11:00
Greg Kurz	b1fc9484aa	powerpc/vphn: move endianness fixing to vphn_unpack_associativity() The first argument to vphn_unpack_associativity() is a const long *, but the parsing code expects __be64 values actually. Let's move the endian fixing down for consistency. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-18 10:48:59 +11:00
Greg Kurz	9d0e6d9db5	powerpc/vphn: clarify the H_HOME_NODE_ASSOCIATIVITY API The number of values returned by the H_HOME_NODE_ASSOCIATIVITY h_call deserves to be explicitly defined, for a better understanding of the code. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-03-18 10:48:59 +11:00
Kyle Moffett	b05ae4ee60	powerpc: Remove duplicate cacheable_memcpy/memzero functions These functions are only used from one place each. If the cacheable_* versions really are more efficient, then those changes should be migrated into the common code instead. NOTE: The old routines are just flat buggy on kernels that support hardware with different cacheline sizes. Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2015-03-17 11:25:50 +11:00
Kirill A. Shutemov	780fc5642f	powerpc: drop _PAGE_FILE and pte_file()-related helpers We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-16 17:56:05 -08:00
Linus Torvalds	c833e17e27	Tighten rules for ACCESS_ONCE This series tightens the rules for ACCESS_ONCE to only work on scalar types. It also contains the necessary fixups as indicated by build bots of linux-next. Now everything is in place to prevent new non-scalar users of ACCESS_ONCE and we can continue to convert code to READ_ONCE/WRITE_ONCE. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJU2H5MAAoJEBF7vIC1phx8Jm4QALPqKOMDSUBCrqJFWJeujtv2 ILxJKsnjrAlt3dxnlVI3q6e5wi896hSce75PcvZ/vs/K3GdgMxOjrakBJGTJ2Qjg 5njW9aGJDDr/SYFX33MLWfqy222TLtpxgSz379UgXjEzB0ymMWbJJ3FnGjVqQJdp RXDutpncRySc/rGHh9UPREIRR5GvimONsWE2zxgXjUzB8vIr2fCGvHTXfIb6RKbQ yaFoihzn0m+eisc5Gy4tQ1qhhnaYyWEGrINjHTjMFTQOWTlH80BZAyQeLdbyj2K5 qloBPS/VhBTr/5TxV5onM+nVhu0LiblVNrdMHVeb7jyST4LeFOCaWK98lB3axSB5 v/2D1YKNb3g1U1x3In/oNGQvs36zGiO1uEdMF1l8ZFXgCvHmATSFSTWBtqUhb5Ew JA3YyqMTG6dpRTMSnmu3/frr4wDqnxlB/ktQC1pf3tDp87mr1ZYEy/dQld+tltjh 9Z5GSdrw0nf91wNI3DJf+26ZDdz5B+EpDnPnOKG8anI1lc/mQneI21/K/xUteFXw UZ1XGPLV2vbv9/a13u44SdjenHvQs1egsGeebMxVPoj6WmDLVmcIqinyS6NawYzn IlDGy/b3bSnXWMBP0ZVBX94KWLxqDDc4a/ayxsmxsP1tPZ+jDXjVDa7E3zskcHxG Uj5ULCPyU087t8Sl76mv =Dj70 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux Pull ACCESS_ONCE() rule tightening from Christian Borntraeger: "Tighten rules for ACCESS_ONCE This series tightens the rules for ACCESS_ONCE to only work on scalar types. It also contains the necessary fixups as indicated by build bots of linux-next. Now everything is in place to prevent new non-scalar users of ACCESS_ONCE and we can continue to convert code to READ_ONCE/WRITE_ONCE" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux: kernel: Fix sparse warning for ACCESS_ONCE next: sh: Fix compile error kernel: tighten rules for ACCESS ONCE mm/gup: Replace ACCESS_ONCE with READ_ONCE x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE ppc/kvm: Replace ACCESS_ONCE with READ_ONCE	2015-02-14 10:54:28 -08:00
Mel Gorman	842915f566	ppc64: add paranoid warnings for unexpected DSISR_PROTFAULT ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if they are triggered. This patch adds warnings just in case they are being accidentally depended upon. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Mel Gorman	8a0516ed8b	mm: convert p[te\|md]_numa users to p[te\|md]_protnone_numa Convert existing users of pte_numa and friends to the new helper. Note that the kernel is broken after this patch is applied until the other page table modifiers are also altered. This patch layout is to make review easier. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Linus Torvalds	59d53737a8	Merge branch 'akpm' (patches from Andrew) Merge second set of updates from Andrew Morton: "More of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() vmstat: Reduce time interval to stat update on idle cpu mm/page_owner.c: remove unnecessary stack_trace field Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files mm: incorporate read-only pages into transparent huge pages vmstat: do not use deferrable delayed work for vmstat_update mm: more aggressive page stealing for UNMOVABLE allocations mm: always steal split buddies in fallback allocations mm: when stealing freepages, also take pages created by splitting buddy page mincore: apply page table walker on do_mincore() mm: /proc/pid/clear_refs: avoid split_huge_page() mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP) mempolicy: apply page table walker on queue_pages_range() arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma() memcg: cleanup preparation for page table walk numa_maps: remove numa_maps->vma numa_maps: fix typo in gather_hugetbl_stats pagemap: use walk->vma instead of calling find_vma() clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk() ...	2015-02-11 18:23:28 -08:00
Linus Torvalds	d3f180ea1a	powerpc updates for 3.20 Including: - Update of all defconfigs - Addition of a bunch of config options to modernise our defconfigs - Some PS3 updates from Geoff - Optimised memcmp for 64 bit from Anton - Fix for kprobes that allows 'perf probe' to work from Naveen - Several cxl updates from Ian & Ryan - Expanded support for the '24x7' PMU from Cody & Sukadev - Freescale updates from Scott: "Highlights include 8xx optimizations, some more work on datapath device tree content, e300 machine check support, t1040 corenet error reporting, and various cleanups and fixes." -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU2/LSAAoJEFHr6jzI4aWATDAQAKPU6v2Mq0sLnGst69waHU/Q vvpIq9hqVeSr6znHhrnazc3iQTLk0acqIdxUl/dT+5ADhi9+FxGD5Ckk+BH1DDve g6mQelSMlVZF9hKonHsbr4iUuTUyZyx2vj2qjdgOaRiv9Xubq6vUFNeolq3AeHxv J33vqRTmowj3VJ52u+V1dmzXQGfUye7DG2jHpjXoBieZsroTvyuYm5GoIPblWFO6 zbYRh6IitALnQRtXfwIManPyWMkJti9JX8PwDkmvacr+V+MXbrksHpIOITMhNlo1 WsVnFMpxuk80XuUfhaKZgISgBSfCqBckvKDn2QwztF2/kBnV6Su5xiOKVgouzM6B myy+maiMZlNJlNjqdMK5v2bqMXICP048zgfMbDN2e1K25jSSlRawt0RngoCQO2EP 7aWmEDAlL3shgzkl68pj1fevQokxC/40C1yExIgAa9C31+bjtMz4Xb1SfN1SSveW 7uWEY/eG9eLsrSE1CeBDvh6B8BRdyuIHgPhux4Tgc/bUtBGFQ29NuXwKh3QCeEy9 9wWrRGx3U69eP06Ey7P5js3jPTQs80bjJewyGaiPQF5XHB89To8Dg8VfXjEV49Dx Pa3OLL5QsQloKfEBiEhQeGfKYImC00pVYAxc0qpmnr9T+25Ri1TLdF1EBAwriSYE 5p9kSW+ZIht0lvzsdPNm =xDU3 -----END PGP SIGNATURE----- Merge tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Update of all defconfigs - Addition of a bunch of config options to modernise our defconfigs - Some PS3 updates from Geoff - Optimised memcmp for 64 bit from Anton - Fix for kprobes that allows 'perf probe' to work from Naveen - Several cxl updates from Ian & Ryan - Expanded support for the '24x7' PMU from Cody & Sukadev - Freescale updates from Scott: "Highlights include 8xx optimizations, some more work on datapath device tree content, e300 machine check support, t1040 corenet error reporting, and various cleanups and fixes" * tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits) cxl: Add missing return statement after handling AFU errror cxl: Fail AFU initialisation if an invalid configuration record is found cxl: Export optional AFU configuration record in sysfs powerpc/mm: Warn on flushing tlb page in kernel context powerpc/powernv: Add OPAL soft-poweroff routine powerpc/perf/hv-24x7: Document sysfs event description entries powerpc/perf/hv-gpci: add the remaining gpci requests powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated powerpc/perf/hv-24x7: parse catalog and populate sysfs with events perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper perf: add PMU_EVENT_ATTR_STRING() helper perf: provide sysfs_show for struct perf_pmu_events_attr powerpc/kernel: Avoid initializing device-tree pointer twice powerpc: Remove old compile time disabled syscall tracing code powerpc/kernel: Make syscall_exit a local label cxl: Fix device_node reference counting powerpc/mm: bail out early when flushing TLB page powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80) perf/powerpc: reset event hw state when adding it to the PMU powerpc/qe: Use strlcpy() ...	2015-02-11 18:15:38 -08:00
Naoya Horiguchi	1757bbd9c5	arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma() We don't have to use mm_walk->private to pass vma to the callback function because of mm_walk->vma. And walk_page_vma() is useful if we walk over a single vma. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:06 -08:00
Naoya Horiguchi	61f77eda9b	mm/hugetlb: reduce arch dependent code around follow_huge_* Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd\|pud)(), if (pmd\|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd\|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:01 -08:00
Arseny Solokha	c2c896bee0	powerpc/mm: Warn on flushing tlb page in kernel context Function __flush_tlb_page() must only be called for user contexts, so put in extra hardening to warn on calling it for kernel context. Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-02-04 13:19:27 +11:00
Arseny Solokha	0dc294f717	powerpc/mm: bail out early when flushing TLB page MMU_NO_CONTEXT is conditionally defined as 0 or (unsigned int)-1. However, in __flush_tlb_page() a corresponding variable is only tested for open coded 0, which can cause NULL pointer dereference if `mm' argument was legitimately passed as such. Bail out early in case the first argument is NULL, thus eliminate confusion between different values of MMU_NO_CONTEXT and avoid disabling and then re-enabling preemption unnecessarily. Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-01-30 18:39:00 -06:00
LEROY Christophe	ce67f5d0a0	powerpc32: Use kmem_cache memory for PGDIR When pages are not 4K, PGDIR table is allocated with kmalloc(). In order to optimise TLB handlers, aligned memory is needed. kmalloc() doesn't provide aligned memory blocks, so lets use a kmem_cache pool instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-01-29 21:59:02 -06:00
LEROY Christophe	debddd95ec	powerpc/8xx: reduce pressure on TLB due to context switches For nohash powerpc, when we run out of contexts, contexts are freed by stealing used contexts in-turn. When a victim has been selected, the associated TLB entries are freed using _tlbil_pid(). Unfortunatly, on the PPC 8xx, _tlbil_pid() does a tlbia, hence flushes ALL TLB entries and not only the one linked to the stolen context. Therefore, as implented today, at each task switch requiring a new context, all entries are flushed. This patch modifies the implementation so that when running out of contexts, all contexts get freed at once, hence dividing the number of calls to tlbia by 16. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-01-29 21:51:06 -06:00
LEROY Christophe	a7b9f671f2	powerpc32: adds handling of _PAGE_RO Some powerpc like the 8xx don't have a RW bit in PTE bits but a RO (Read Only) bit. This patch implements the handling of a _PAGE_RO flag to be used in place of _PAGE_RW Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood@freescale.com: fix whitespace] Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-01-29 20:11:51 -06:00
Emil Medve	238cac16c0	powerpc: Remove duplicate tlbcam_index declarations They seem to be leftovers from '14cf11a powerpc: Merge enough to start building in arch/powerpc' Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-01-29 19:59:03 -06:00
Linus Torvalds	33692f2759	vm: add VM_FAULT_SIGSEGV handling support The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit `fee7e49d45` ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-01-29 10:51:32 -08:00
Michael Ellerman	8aa989b8fb	powerpc: Remove some unused functions Remove slice_set_psize() which is not used. It was added in `3a8247cc2c` "powerpc: Only demote individual slices rather than whole process" but was never used. Remove vsx_assist_exception() which is not used. It was added in `ce48b21007` "powerpc: Add VSX context save/restore, ptrace and signal support" but was never used. Remove generic_mach_cpu_die() which is not used. Its last caller was removed in `375f561a41` "powerpc/powernv: Always go into nap mode when CPU is offline". Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are unused. These were introduced in `c5d56332fd` "[POWERPC] Add general support for mpc7448hpc2 (Taiga) platform" but were never used. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> [mpe: Update changelog with details on when/why they are unused] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-01-28 15:00:24 +11:00
Christian Borntraeger	da1a288d85	ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the ppc/hugetlbfs code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2015-01-19 14:14:19 +01:00
Joonsoo Kim	031bc5743f	mm/debug-pagealloc: make debug-pagealloc boottime configurable Now, we have prepared to avoid using debug-pagealloc in boottime. So introduce new kernel-parameter to disable debug-pagealloc in boottime, and makes related functions to be disabled in this case. Only non-intuitive part is change of guard page functions. Because guard page is effective only if debug-pagealloc is enabled, turning off according to debug-pagealloc is reasonable thing to do. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:48 -08:00
Linus Torvalds	a7cb7bb664	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree update from Jiri Kosina: "Usual stuff: documentation updates, printk() fixes, etc" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) intel_ips: fix a type in error message cpufreq: cpufreq-dt: Move newline to end of error message ps3rom: fix error return code treewide: fix typo in printk and Kconfig ARM: dts: bcm63138: change "interupts" to "interrupts" Replace mentions of "list_struct" to "list_head" kernel: trace: fix printk message scsi: mpt2sas: fix ioctl in comment zbud, zswap: change module author email clocksource: Fix 'clcoksource' typo in comment arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help gpio: msm-v1: make boolean argument more obvious usb: Fix typo in usb-serial-simple.c PCI: Fix comment typo 'COMFIG_PM_OPS' powerpc: Fix comment typo 'CONIFG_8xx' powerpc: Fix comment typos 'CONFiG_ALTIVEC' clk: st: Spelling s/stucture/structure/ isci: Spelling s/stucture/structure/ usb: gadget: zero: Spelling s/infrastucture/infrastructure/ treewide: Fix company name in module descriptions ...	2014-12-12 10:08:06 -08:00
Linus Torvalds	140cd7fb04	powerpc updates for 3.19 Some nice cleanups like removing bootmem, and removal of __get_cpu_var(). There is one patch to mm/gup.c. This is the generic GUP implementation, but is only used by us and arm(64). We have an ack from Steve Capper, and although we didn't get an ack from Andrew he told us to take the patch through the powerpc tree. There's one cxl patch. This is in drivers/misc, but Greg said he was happy for us to manage fixes for it. There is an infrastructure patch to support an IPMI driver for OPAL. That patch also appears in Corey Minyard's IPMI tree, you may see a conflict there. There is also an RTC driver for OPAL. We weren't able to get any response from the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver. The usual batch of Freescale updates from Scott. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUiSTSAAoJEFHr6jzI4aWAirQP/3rIEng0LzLu5kW2zkGylIaM SNDum1vze3mHiTFl+CFcSIGpC1UEULoB49HA+2oE/ExKpIceG6lpL2LP+wNh2FW5 mozjMjS6mZt4w1Fu1D2ZtgQc3O1T1pxkqsnZmPa8gVf5k5d5IQNPY6yB0pgVWwbV gwBKxe4VwPAzJjppE9i9MDhNTJwmHZq0lI8XuoTXOOU/f+4G1WxmjrbyveQ7cRP5 i/sq2cKjxpWA+KDeIXo0GR0DpXR7qMeAvFX5xXY7oKuUJIFDM4kSHfmMYP6qLf5c 2vlsJqHVqfOgQdve41z1ooaPzNtg7ezVo+VqqguSgtSgwy2JUo/uHpnzz3gD1Olo AP5+6xj8LZac0rTPxF4n4Hoyrp7AaaFjEFt1zqT9PWniZW4B41wtia0QORBNUf1S UEmKAC9T3WZJ47mH7WMSadtOPF9E3Yd/zuiPD4udtptCNKPbr6/k1MpJPIW2D4Rn BJ0QZTRd7V0yRofXxZtHxaMxq8pWd/Tip7J/zr/ghz+ulnH8BuFamuhCCLuJlESU +A2PMfuseyTMpH9sMAmmTwSGPDKjaUFWvmFvY/n88NZL7r2LlomNrDWFSSQOIHUP FxjYmjUMpZeexsfyRdgFV/INhYC3o3cso2fRGO45YK6nkxNnjNFEBS6WhQLvNLBu sknd1WjXkuJtoMC15SrQ =jvyT -----END PGP SIGNATURE----- Merge tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: "Some nice cleanups like removing bootmem, and removal of __get_cpu_var(). There is one patch to mm/gup.c. This is the generic GUP implementation, but is only used by us and arm(64). We have an ack from Steve Capper, and although we didn't get an ack from Andrew he told us to take the patch through the powerpc tree. There's one cxl patch. This is in drivers/misc, but Greg said he was happy for us to manage fixes for it. There is an infrastructure patch to support an IPMI driver for OPAL. There is also an RTC driver for OPAL. We weren't able to get any response from the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver. The usual batch of Freescale updates from Scott" * tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (101 commits) powerpc/powernv: Return to cpu offline loop when finished in KVM guest powerpc/book3s: Fix partial invalidation of TLBs in MCE code. powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault powerpc/xmon: Cleanup the breakpoint flags powerpc/xmon: Enable HW instruction breakpoint on POWER8 powerpc/mm/thp: Use tlbiel if possible powerpc/mm/thp: Remove code duplication powerpc/mm/hugetlb: Sanity check gigantic hugepage count powerpc/oprofile: Disable pagefaults during user stack read powerpc/mm: Check for matching hpte without taking hpte lock powerpc: Drop useless warning in eeh_init() powerpc/powernv: Cleanup unused MCE definitions/declarations. powerpc/eeh: Dump PHB diag-data early powerpc/eeh: Recover EEH error on ownership change for BCM5719 powerpc/eeh: Set EEH_PE_RESET on PE reset powerpc/eeh: Refactor eeh_reset_pe() powerpc: Remove more traces of bootmem powerpc/pseries: Initialise nvram_pstore_info's buf_lock cxl: Name interrupts in /proc/interrupt cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning ...	2014-12-11 17:48:14 -08:00
Linus Torvalds	7ef58b32f5	Devicetree changes for v3.19 Lots of activity in the devicetree code for v3.18. Most of it is related to getting all of the overlay support code in place, but there are other important things in there. There are a few trivial merge conflicts. They shouldn't give you any trouble. Highlights: - OF_RECONFIG notifiers for SPI, I2C and Platform devices. Those subsystems can now respond to live changes to the device tree. - CONFIG_OF_OVERLAY method for applying live changes to the device tree - Removal of the of_allnodes list. This used to be used to iterate over all the nodes in the device tree, but it is unnecessary because the same thing can be done by iterating over the list of child pointers. Getting rid of of_allnodes saves some memory and avoids the possibility of of_allnodes being sorted differently from the child lists. - Support for retrieving original DTB blob via sysfs. Needed by kexec. - More unittests - Documentation and minor bug fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUiaTJAAoJEMWQL496c2LNdKkP/1rk20JXzJc948Z3VFZPXkzf TUKXC+Qn0FmVjQhESkx6LxLDrMDTQlQLlWBmFuWRB87Fk5E32FEf5zzW7I9oQPS4 msIqJoYf5T7EPlmJ/85156xjK5ezc0OyoKEizn23mcKrJE4bmXQEbVw99UUFhq4R Oz1a1ZPQQSSaMteKftOoRBiE3bJut3tJ3dfufNjwOuXi5rALJ0DVxuOeU/Hba13d t05qlImwocKXGBDd/B4psBI5fZl4Tf4AmGOD9aU7YHxrLg4jOCbvqies3DQQ0q3D o9YZBnuBw7A3tzJJ3F5KajRnFLazJBOV5BKGo7eYuTzT56mpZW/HF6eS9b1DbP9x 4q71Vd5qhIuU9JsQAStfZ6pdx3FBXRNGpIXXfwzbCSdaePIuOKS17zvA/Iy5bWeA 2TyqgMuKZwnXOXxQesMZJYIw2IEnIyobzh0A1wAnvReyos/nHF/tha/SA/Jutq1s +0gOkMlPW2EdpADmlfLPRSHgSqO8bfCPeNPihn672MS2dAv9H+XRLcoKuSNErhdl 1gYtnR7IK+Sl0KmMC5YoMvXPchkV5YS2qEp1f3p+ZmgcMSWyHHKMtf8VwjNTaSBU e1AshH6HvmYEPt0cnntSMAxbw+N596QjkVp4RbHsLpyj7qeUVVY56/K/aiM7M69P BvJkuewrhsAxyM2X2OsD =ak0A -----END PGP SIGNATURE----- Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux Pull devicetree changes from Grant Likely: "Lots of activity in the devicetree code for v3.18. Most of it is related to getting all of the overlay support code in place, but there are other important things in there. Highlights: - OF_RECONFIG notifiers for SPI, I2C and Platform devices. Those subsystems can now respond to live changes to the device tree. - CONFIG_OF_OVERLAY method for applying live changes to the device tree - Removal of the of_allnodes list. This used to be used to iterate over all the nodes in the device tree, but it is unnecessary because the same thing can be done by iterating over the list of child pointers. Getting rid of of_allnodes saves some memory and avoids the possibility of of_allnodes being sorted differently from the child lists. - Support for retrieving original DTB blob via sysfs. Needed by kexec. - More unittests - Documentation and minor bug fixes" * tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux: (42 commits) of: Delete unnecessary check before calling "of_node_put()" of: Drop ->next pointer from struct device_node spi: Check for spi_of_notifier when CONFIG_OF_DYNAMIC=y of: support passing console options with stdout-path of: add optional options parameter to of_find_node_by_path() of: Add bindings for chosen node, stdout-path of: Remove unneeded and incorrect MODULE_DEVICE_TABLE ARM: dt: fix up PL011 device tree bindings of: base, fix of_property_read_string_helper kernel-doc of: remove select of non-existant OF_DEVICE config symbol spi/of: Add OF notifier handler spi/of: Create new device registration method and accessors i2c/of: Add OF_RECONFIG notifier handler i2c/of: Factor out Devicetree registration code of/overlay: Add overlay unittests of/overlay: Introduce DT overlay support of/reconfig: Add OF_DYNAMIC notifier for platform_bus_type of/reconfig: Always use the same structure for notifiers of/reconfig: Add debug output for OF_RECONFIG notifiers of/reconfig: Add empty stubs for the of_reconfig methods ...	2014-12-11 13:06:58 -08:00
Linus Torvalds	b64bb1d758	arm64 updates for 3.19 Changes include: - Support for alternative instruction patching from Andre - seccomp from Akashi - Some AArch32 instruction emulation, required by the Android folks - Optimisations for exception entry/exit code, cmpxchg, pcpu atomics - mmu_gather range calculations moved into core code - EFI updates from Ard, including long-awaited SMBIOS support - /proc/cpuinfo fixes to align with the format used by arch/arm/ - A few non-critical fixes across the architecture -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJUhbSAAAoJELescNyEwWM07PQH/AolxqOJTTg8TKe2wvRC+DwY R98bcECMwhXvwep1KhTBew7z7NRzXJvVVs+EePSpXWX2+KK2aWN4L50rAb9ow4ty PZ5EFw564g3rUpc7cbqIrM/lasiYWuIWw/BL+wccOm3mWbZfokBB2t0tn/2rVv0K 5tf2VCLLxgiFJPLuYk61uH7Nshvv5uJ6ODwdXjbrH+Mfl6xsaiKv17ZrfP4D/M4o hrLoXxVTuuWj3sy/lBJv8vbTbKbQ6BGl9JQhBZGZHeKOdvX7UnbKH4N5vWLUFZya QYO92AK1xGolu8a9bEfzrmxn0zXeAHgFTnRwtDCekOvy0kTR9MRIqXASXKO3ZEU= =rnFX -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Here's the usual mixed bag of arm64 updates, also including some related EFI changes (Acked by Matt) and the MMU gather range cleanup (Acked by you). Changes include: - support for alternative instruction patching from Andre - seccomp from Akashi - some AArch32 instruction emulation, required by the Android folks - optimisations for exception entry/exit code, cmpxchg, pcpu atomics - mmu_gather range calculations moved into core code - EFI updates from Ard, including long-awaited SMBIOS support - /proc/cpuinfo fixes to align with the format used by arch/arm/ - a few non-critical fixes across the architecture" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits) arm64: remove the unnecessary arm64_swiotlb_init() arm64: add module support for alternatives fixups arm64: perf: Prevent wraparound during overflow arm64/include/asm: Fixed a warning about 'struct pt_regs' arm64: Provide a namespace to NCAPS arm64: bpf: lift restriction on last instruction arm64: Implement support for read-mostly sections arm64: compat: align cacheflush syscall with arch/arm arm64: add seccomp support arm64: add SIGSYS siginfo for compat task arm64: add seccomp syscall for compat task asm-generic: add generic seccomp.h for secure computing mode 1 arm64: ptrace: allow tracer to skip a system call arm64: ptrace: add NT_ARM_SYSTEM_CALL regset arm64: Move some head.text functions to executable section arm64: jump labels: NOP out NOP -> NOP replacement arm64: add support to dump the kernel page tables arm64: Add FIX_HOLE to permanent fixed addresses arm64: alternatives: fix pr_fmt string for consistency arm64: vmlinux.lds.S: don't discard .exit.* sections at link-time ...	2014-12-09 13:12:47 -08:00
Aneesh Kumar K.V	aefa5688c0	powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault upatepp can get called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we could avoid doing tlbie. We could possibly race with a parallel fault filling the TLB. But that should be ok because updatepp is only ever relaxing permissions. We also look at linux pte permission bits when filling hash pte permission bits. We also hold the linux pte busy bits while inserting/updating a hashpte entry, hence a paralle update of linux pte is not possible. On the other hand mprotect involves ptep_modify_prot_start which cause a hpte invalidate and not updatepp. Performance number: We use randbox_access_bench written by Anton. Kernel with THP disabled and smaller hash page table size. 86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp 2.10% random_access_b random_access_bench [.] doit 1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock 1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range 1.18% random_access_b [kernel.kallsyms] [k] .__delay 0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page 0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return 0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm With Fix: 27.54% random_access_b random_access_bench [.] doit 22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return 5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm 3.31% random_access_b [kernel.kallsyms] [k] data_access_common 1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-12-05 16:26:15 +11:00
Michael Ellerman	b5be75d008	Merge remote-tracking branch 'benh/next' into next Merge updates collected & acked by Ben. A few EEH patches from Gavin, some mm updates from Aneesh and a few odds and ends.	2014-12-02 14:19:20 +11:00
Aneesh Kumar K.V	d557b09800	powerpc/mm/thp: Use tlbiel if possible If we know that user address space has never executed on other cpus we could use tlbiel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-12-02 14:10:11 +11:00
Aneesh Kumar K.V	f1581bf14b	powerpc/mm/thp: Remove code duplication Rename invalidate_old_hpte to flush_hash_hugepage and use that in other places. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-12-02 14:10:10 +11:00
James Yang	c4f3eb5fc5	powerpc/mm/hugetlb: Sanity check gigantic hugepage count Limit the number of gigantic hugepages specified by the hugepages= parameter to MAX_NUMBER_GPAGES. Signed-off-by: James Yang <James.Yang@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-12-02 14:10:09 +11:00
Aneesh Kumar K.V	0ec2698fe1	powerpc/mm: Check for matching hpte without taking hpte lock With smaller hash page table config, we would end up in situation where we would be replacing hash page table slot frequently. In such config, we will find the hpte to be not matching, and we can do that check without holding the hpte lock. We need to recheck the hpte again after holding lock. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-12-02 11:03:45 +11:00
Grant Likely	f5242e5a88	of/reconfig: Always use the same structure for notifiers The OF_RECONFIG notifier callback uses a different structure depending on whether it is a node change or a property change. This is silly, and not very safe. Rework the code to use the same data structure regardless of the type of notifier. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com> Cc: <linuxppc-dev@lists.ozlabs.org>	2014-11-24 22:25:03 +00:00
Jiri Kosina	a02001086b	Merge Linus' tree to be be to apply submitted patches to newer code than current trivial.git base	2014-11-20 14:42:02 +01:00
Michael Ellerman	e39f223fc9	powerpc: Remove more traces of bootmem Although we are now selecting NO_BOOTMEM, we still have some traces of bootmem lying around. That is because even with NO_BOOTMEM there is still a shim that converts bootmem calls into memblock calls, but ultimately we want to remove all traces of bootmem. Most of the patch is conversions from alloc_bootmem() to memblock_virt_alloc(). In general a call such as: p = (struct foo )alloc_bootmem(x); Becomes: p = memblock_virt_alloc(x, 0); We don't need the cast because memblock_virt_alloc() returns a void . The alignment value of zero tells memblock to use the default alignment, which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses. We remove a number of NULL checks on the result of memblock_virt_alloc(). That is because memblock_virt_alloc() will panic if it can't allocate, in exactly the same way as alloc_bootmem(), so the NULL checks are and always have been redundant. The memory returned by memblock_virt_alloc() is already zeroed, so we remove several memsets of the result of memblock_virt_alloc(). Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS) to just plain memblock_virt_alloc(). We don't use memblock_alloc_base() because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation to that is pointless, 16XB ought to be enough for anyone. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-19 21:41:51 +11:00
Michael Ellerman	35891d40bf	Merge remote-tracking branch 'scottwood/next' into next Scott says: "Highlights include a bunch of 8xx optimizations, device tree bindings for Freescale BMan, QMan, and FMan datapath components, misc device tree updates, and inbound rio window support."	2014-11-18 17:00:38 +11:00
Will Deacon	fb7332a9fe	mmu_gather: move minimal range calculations into generic code On architectures with hardware broadcasting of TLB invalidation messages , it makes sense to reduce the range of the mmu_gather structure when unmapping page ranges based on the dirty address information passed to tlb_remove_tlb_entry. arm64 already does this by directly manipulating the start/end fields of the gather structure, but this confuses the generic code which does not expect these fields to change and can end up calculating invalid, negative ranges when forcing a flush in zap_pte_range. This patch moves the minimal range calculation out of the arm64 code and into the generic implementation, simplifying zap_pte_range in the process (which no longer needs to care about start/end, since they will point to the appropriate ranges already). With the range being tracked by core code, the need_flush flag is dropped in favour of checking that the end of the range has actually been set. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2014-11-17 10:12:42 +00:00
Aneesh Kumar K.V	b30e759072	powerpc/mm: Switch to generic RCU get_user_pages_fast This patch switch the ppc arch to use the generic RCU based gup implementation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-14 17:24:21 +11:00
Aneesh Kumar K.V	06743521d0	powerpc/mm: Add missing pmd accessors This patch add documentation and missing accessors. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-14 17:24:21 +11:00
Gavin Shan	d1d5304fcb	powerpc/mm: Use PAGE_FACTOR PAGE_FACTOR was defined to reflect the difference between configured page size and fixed 4KB page size. Replace (PAGE_SHIFT - HW_PAGE_SHIFT) with PAGE_FACTOR. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-12 13:54:36 +11:00
Anton Blanchard	21098b9e07	powerpc: Move sparse_init() into initmem_init We did part of sparse initialisation in setup_arch and part in initmem_init. Put them together. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-10 09:59:26 +11:00
Anton Blanchard	68cf0d642f	powerpc: Remove superfluous bootmem includes Lots of places included bootmem.h even when not using bootmem. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-10 09:59:26 +11:00
Anton Blanchard	14ed740957	powerpc: Remove some old bootmem related comments Now bootmem is gone from powerpc we can remove comments mentioning it. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-10 09:59:25 +11:00
Anton Blanchard	10239733ee	powerpc: Remove bootmem allocator At the moment we transition from the memblock alloctor to the bootmem allocator. Gitting rid of the bootmem allocator removes a bunch of complicated code (most of which I owe the dubious honour of being responsible for writing). Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-10 09:59:25 +11:00
LEROY Christophe	c51a6821bd	powerpc/8xx: Invalidate non present TLB as early as possible 8xx sometimes need to load a invalid/non-present TLBs in it DTLB asm handler. These must be invalidated separaly as linux mm doesn't. Commit `5efab4a02c` was invalidating them in arch/powerpc/mm/fault.c. This patch does the invalidation earlier in order to free the TLB as soon as possible. This also has the advantage of removing some 8xx specific code from fault.c Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-11-07 18:10:45 -06:00
Anton Blanchard	16d0f5c4af	powerpc: Remove ppc_md.remove_memory We have an extra level of indirection on memory hot remove which is not matched on memory hot add. Memory hotplug is book3s only, so there is no need for it. This also enables means remove_memory() (ie memory hot unplug) works on powernv. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-05 21:00:46 +11:00
Michael Ellerman	dd521d1eb4	Merge branch 'topic/get-cpu-var' into next	2014-11-05 21:00:31 +11:00
Christoph Lameter	69111bac42	powerpc: Replace __get_cpu_var uses This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int x = &__get_cpu_var(y); Converts to int x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int x = __get_cpu_var(y); Converts to int x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-11-03 12:12:32 +11:00
Fabian Frederick	94966b712c	powerpc: Fix section mismatch warning Add __init to MMU_setup() which uses __initdata boot_command_line. Also MMU_setup() is only called from MMU_init(), which is also __init. Warning appeared since commit `3e47d1474c`. Fixes: `3e47d1474c` ("powerpc: Remove powerpc specific cmd_line") Signed-off-by: Fabian Frederick <fabf@skynet.be> [mpe: Update changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-30 11:22:33 +11:00
Paul Bolle	b823982cd7	powerpc: Fix comment typo 'CONIFG_8xx' Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-10-29 14:42:08 +01:00
Nishanth Aravamudan	2c0a33f986	powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update We received a report of warning in kernel/sched/core.c where the sched group was NULL on an LPAR after a topology update. This seems to occur because after the topology update has moved the CPUs, cpu_to_node is returning the old value still, which ends up breaking the consistency of the NUMA topology in the per-cpu maps. Ensure that we update the per-cpu fields when we re-map CPUs. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-29 09:41:22 +11:00
Nishanth Aravamudan	49f8d8c043	powerpc/numa: use cached value of update->cpu in update_cpu_topology There isn't any need to keep referring to update->cpu, as we've already checked cpu == update->cpu at this point. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-29 09:41:22 +11:00
Ian Munsie	03f5439797	powerpc/mm: Use appropriate ESID mask in copro_calculate_slb() This patch makes copro_calculate_slb() mask the ESID by the correct mask for 1T vs 256M segments. This has no effect by itself as the extra bits were ignored, but it makes debugging the segment table entries easier and means that we can directly compare the ESID values for duplicates without needing to worry about masking in the comparison. This will be used to simplify a comparison in the following patch. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-28 19:52:45 +11:00
Aneesh Kumar K.V	6643773ce1	powerpc/mm: Fix build error with hugetlfs disabled arch/powerpc/mm/slice.c:704:5: error: expected identifier or ‘(’ before numeric constant int is_hugepage_only_range(struct mm_struct mm, unsigned long addr, ^ make[1]: [arch/powerpc/mm/slice.o] Error 1 make: * [arch/powerpc/mm/slice.o] Error 2 This got introduced via `1217d34b53` "powerpc: Ensure global functions include their prototype". We started including linux/hugetlb.h with that patch and now we have #define is_hugepage_only_range(mm, addr, len) 0 with hugetlbfs disabled. Fixes: `1217d34b53` ("powerpc: Ensure global functions include their prototype") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-22 14:03:06 +11:00
Linus Torvalds	dc303408a7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull more powerpc updates from Michael Ellerman: "Here's some more updates for powerpc for 3.18. They are a bit late I know, though must are actually bug fixes. In my defence I nearly cut the top of my finger off last weekend in a gruesome bike maintenance accident, so I spent a good part of the week waiting around for doctors. True story, I can send photos if you like :) Probably the most interesting fix is the sys_call_table one, which enables syscall tracing for powerpc. There's a fix for HMI handling for old firmware, more endian fixes for firmware interfaces, more EEH fixes, Anton fixed our routine that gets the current stack pointer, and a few other misc bits" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (22 commits) powerpc: Only do dynamic DMA zone limits on platforms that need it powerpc: sync pseries_le_defconfig with pseries_defconfig powerpc: Add printk levels to setup_system output powerpc/vphn: NUMA node code expects big-endian powerpc/msi: Use WARN_ON() in msi bitmap selftests powerpc/msi: Fix the msi bitmap alignment tests powerpc/eeh: Block CFG upon frozen Shiner adapter powerpc/eeh: Don't collect logs on PE with blocked config space powerpc/eeh: Block PCI config access upon frozen PE powerpc/pseries: Drop config requests in EEH accessors powerpc/powernv: Drop config requests in EEH accessors powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKED powerpc/eeh: Fix condition for isolated state powerpc/pseries: Make CPU hotplug path endian safe powerpc/pseries: Use dump_stack instead of show_stack powerpc: Rename __get_SP() to current_stack_pointer() powerpc: Reimplement __get_SP() as a function not a define powerpc/numa: Add ability to disable and debug topology updates powerpc/numa: check error return from proc_create powerpc/powernv: Fallback to old HMI handling behavior for old firmware ...	2014-10-21 07:48:56 -07:00
Greg Kurz	5c9fb18994	powerpc/vphn: NUMA node code expects big-endian The associativity domain numbers are obtained from the hypervisor through registers and written into memory by the guest: the packed array passed to vphn_unpack_associativity() is then native-endian, unlike what was assumed in the following commit: commit `b08a2a12e4` Author: Alistair Popple <alistair@popple.id.au> Date: Wed Aug 7 02:01:44 2013 +1000 powerpc: Make NUMA device node code endian safe This issue fills the topology with bogus data and makes it unusable. It may lead to severe performance breakdowns. We should ideally patch the vphn_unpack_associativity() function to do the 64-bit loads, but this requires some more brain storming. In the meantime, let's go for a suboptimal and temporary bug fix: this patch converts each 64-bit value of the packed array to big endian, as expected by the current parsing code in vphn_unpack_associativity(). Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-16 12:43:21 +11:00
Linus Torvalds	faafcba3b5	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave Hansen) - Various sched/idle refinements for better idle handling (Nicolas Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot) - sched/numa updates and optimizations (Rik van Riel) - sysbench speedup (Vincent Guittot) - capacity calculation cleanups/refactoring (Vincent Guittot) - Various cleanups to thread group iteration (Oleg Nesterov) - Double-rq-lock removal optimization and various refactorings (Kirill Tkhai) - various sched/deadline fixes ... and lots of other changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/dl: Use dl_bw_of() under rcu_read_lock_sched() sched/fair: Delete resched_cpu() from idle_balance() sched, time: Fix build error with 64 bit cputime_t on 32 bit systems sched: Improve sysbench performance by fixing spurious active migration sched/x86: Fix up typo in topology detection x86, sched: Add new topology for multi-NUMA-node CPUs sched/rt: Use resched_curr() in task_tick_rt() sched: Use rq->rd in sched_setaffinity() under RCU read lock sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask' sched: Use dl_bw_of() under RCU read lock sched/fair: Remove duplicate code from can_migrate_task() sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW sched: print_rq(): Don't use tasklist_lock sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock() sched: Fix the task-group check in tg_has_rt_tasks() sched/fair: Leverage the idle state info when choosing the "idlest" cpu sched: Let the scheduler see CPU idle states sched/deadline: Fix inter- exclusive cpusets migrations sched/deadline: Clear dl_entity params when setscheduling to different class sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault() ...	2014-10-13 16:23:15 +02:00
Nishanth Aravamudan	2d73bae12b	powerpc/numa: Add ability to disable and debug topology updates We have hit a few customer issues with the topology update code (VPHN and PRRN). It would be nice to be able to debug the notifications coming from the hypervisor in both cases to the LPAR, as well as to disable responding to the notifications at boot-time, to narrow down the source of the problems. Add a basic level of such functionality, similar to the numa= command-line parameter. We already have a toggle in /proc/powerpc/topology_updates that allows run-time enabling/disabling, so the updates can be started at run-time if desired. But the bugs we've run into have occured during boot or very shortly after coming to login, and have resulted in a broken NUMA topology. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-13 18:16:17 +11:00
Nishanth Aravamudan	2d15b9b479	powerpc/numa: check error return from proc_create proc_create can fail, we should check the return value and pass up the failure. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-13 18:15:53 +11:00
Ian Munsie	4c6d9acce1	powerpc/mm: Add hooks for cxl This adds hooks into the core powerpc mm code for cxl. The core powerpc code sometimes uses local tlbie. Unfortunately this won't work with the current cxl driver as it relies on snooping tlbie broadcasts. The cxl hardware can have TLB entries invalidated via MMIO but this is not currently supported by the driver. In future we can make local tlbie smarter so that it invalidates cxl contexts via MMIO when it needs to but for now we have this workaround. This workaround checks for any active cxl contexts and if so, disables local tlbie. This also adds a hook for when SLBs are invalidated. This ensures any corresponding SLBs in cxl are also invalidated at the same time. This is required for segment demotion. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-08 20:15:55 +11:00
Ian Munsie	a1dca3465a	powerpc/mm: Add new hash_page_mm() This adds a new function hash_page_mm() based on the existing hash_page(). This version allows any struct mm to be passed in, rather than assuming current. This is useful for servicing co-processor faults which are not in the context of the current running process. We need to be careful here as the current hash_page() assumes current in a few places. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-08 20:15:44 +11:00
Ian Munsie	8ca7a82f7b	powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize Export mmu_kernel_ssize and mmu_linear_psize. These are needed by the cxl driver which has it's own MMU. To setup the MMU cxl needs access to these. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-08 20:15:42 +11:00
Ian Munsie	be3ebfe821	powerpc/cell: Make spu_flush_all_slbs() generic This moves spu_flush_all_slbs() into a generic call copro_flush_all_slbs(). This will be useful when we add cxl which also needs a similar SLB flush call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-08 20:15:37 +11:00
Ian Munsie	73d16a6e0e	powerpc/cell: Move data segment faulting code out of cell platform __spu_trap_data_seg() currently contains code to determine the VSID and ESID required for a particular EA and mm struct. This code is generically useful for other co-processors. This moves the code of the cell platform so it can be used by other powerpc code. It also adds 1TB segment handling which Cell didn't support. The new function is called copro_calculate_slb(). This also moves the internal struct spu_slb to a generic struct copro_slb which is now used in the Cell and copro code. We use this new struct instead of passing around esid and vsid parameters. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-08 20:14:55 +11:00
Ian Munsie	e83d016975	powerpc/cell: Move spu_handle_mm_fault() out of cell platform Currently spu_handle_mm_fault() is in the cell platform. This code is generically useful for other non-cell co-processors on powerpc. This patch moves this function out of the cell platform into arch/powerpc/mm so that others may use it. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-08 20:14:54 +11:00
Michael Ellerman	75d43b2d0a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git Freescale updates from Scott (27 commits): "Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board support, and PrPMC PCI enumeration."	2014-10-04 08:59:06 +10:00
Anton Blanchard	3e47d1474c	powerpc: Remove powerpc specific cmd_line There is no need for yet another copy of the command line, just use boot_command_line like everyone else. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-02 17:33:55 +10:00
Anton Blanchard	9d57472f61	powerpc: Fill in si_addr_lsb siginfo field Fill in the si_addr_lsb siginfo field so the hwpoison code can pass to userspace the length of memory that has been corrupted. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-02 17:15:17 +10:00
Anton Blanchard	3913fdd7a2	powerpc: Add VM_FAULT_HWPOISON handling to powerpc page fault handler do_page_fault was missing knowledge of HWPOISON, and we would oops if userspace tried to access a poisoned page: kernel BUG at arch/powerpc/mm/fault.c:180! Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-02 17:15:13 +10:00
Anton Blanchard	63af52629a	powerpc: Simplify do_sigbus Exit out early for a kernel fault, avoiding indenting of most of the function. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-02 17:15:09 +10:00
Michael Ellerman	9e34992a62	powerpc/mm: Unindent htab_dt_scan_page_sizes() We can unindent the bulk of htab_dt_scan_page_sizes() by returning early if the property is not found. That is nice in and of itself, but also has the advantage of making it clear that we always return success once we have found the ibm,segment-page-sizes property. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:56 +10:00
Li Zhong	297cf5025b	powerpc: some changes in numa_setup_cpu() this patches changes some error handling logics in numa_setup_cpu(), when cpu node is not found, so: if the cpu is possible, but not present, -1 is kept in numa_cpu_lookup_table, so later, if the cpu is added, we could set correct numa information for it. if the cpu is present, then we set the first online node to numa_cpu_lookup_table instead of 0 ( in case 0 might not be an online node? ) Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:53 +10:00
Li Zhong	bc3c4327c9	powerpc: Only set numa node information for present cpus at boottime As Nish suggested, it makes more sense to init the numa node informatiion for present cpus at boottime, which could also avoid WARN_ON(1) in numa_setup_cpu(). With this change, we also need to change the smp_prepare_cpus() to set up numa information only on present cpus. For those possible, but not present cpus, their numa information will be set up after they are started, as the original code did before commit `2fabf084b6`. Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Tested-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:53 +10:00
Li Zhong	70ad237515	powerpc: Fix warning reported by verify_cpu_node_mapping() With commit `2fabf084b6` ("powerpc: reorder per-cpu NUMA information's initialization"), during boottime, cpu_numa_callback() is called earlier(before their online) for each cpu, and verify_cpu_node_mapping() uses cpu_to_node() to check whether siblings are in the same node. It skips the checking for siblings that are not online yet. So the only check done here is for the bootcpu, which is online at that time. But the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which will be set up in smp_prepare_cpus()). So I saw something like following reported: [ 0.000000] CPU thread siblings 1/2/3 and 0 don't belong to the same node! As we don't actually do the checking during this early stage, so maybe we could directly call numa_setup_cpu() in do_init_bootmem(). Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:52 +10:00
Anton Blanchard	f6026df1a4	powerpc: Move htab_remove_mapping function prototype into header file A recent patch added a function prototype for htab_remove_mapping in c code. Fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:44 +10:00
Anton Blanchard	1217d34b53	powerpc: Ensure global functions include their prototype Fix a number of places where global functions were not including their prototype. This ensures the prototype and the function match. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:42 +10:00
Anton Blanchard	e51df2c170	powerpc: Make a bunch of things static Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:41 +10:00
Anton Blanchard	e1802b065d	powerpc: Move more symbol exports next to function definitions Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-09-25 23:14:38 +10:00
Scott Wood	6db35ad237	powerpc/mm: Use common paging_init() for NUMA Commit `1c98025c6c` "powerpc: Dynamic DMA zone limits" updated how zones are created in paging_init(), but missed the NUMA version of paging_init(). This was noticed via a linker error, since dma_pfn_limit_to_zone() was, like the non-NUMA paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES. It turns out that the NUMA paging_init() was not actually doing anything different from the standard paging_init(), other than a couple debug prints, a couple 32-bit-only ifdef sections, and a call to mark_nonram_nosave(). It's not clear whether mark_nonram_nosave() is inherently wrong to do for NUMA, or just not useful on targets that have NUMA, but for now I'm preserving the existing behavior. Fixes: `1c98025c6c` "powerpc: Dynamic DMA zone limits" Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-09-19 15:01:05 -05:00
Aaron Tomlin	a70857e46d	sched: Add helper for task stack page overrun checking This facility is used in a few places so let's introduce a helper function to improve code readability. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: oleg@redhat.com Cc: riel@redhat.com Cc: prarit@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-19 12:35:23 +02:00
Aaron Tomlin	d4311ff1a8	init/main.c: Give init_task a canary Tasks get their end of stack set to STACK_END_MAGIC with the aim to catch stack overruns. Currently this feature does not apply to init_task. This patch removes this restriction. Note that a similar patch was posted by Prarit Bhargava some time ago but was never merged: http://marc.info/?l=linux-kernel&m=127144305403241&w=2 Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-19 12:35:22 +02:00
Scott Wood	1c98025c6c	powerpc: Dynamic DMA zone limits Platform code can call limit_zone_pfn() to set appropriate limits for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will select a suitable zone based on a device's mask and the pfn limits that platform code has configured. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com>	2014-09-03 17:58:21 -05:00
Aneesh Kumar K.V	9e813308a5	powerpc/thp: Add tracepoints to track hugepage invalidate Add tracepoint to track hugepage invalidate. This help us in debugging difficult to track bugs. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:42 +10:00
Aneesh Kumar K.V	7e467245bf	powerpc/thp: Use ACCESS_ONCE when loading pmdp We would get wrong results in compiler recomputed old_pmd. Avoid that by using ACCESS_ONCE CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:41 +10:00
Aneesh Kumar K.V	969b7b208f	powerpc/thp: Invalidate with vpn in loop As per ISA, for 4k base page size we compare 14..65 bits of VA specified with the entry_VA in tlb. That implies we need to make sure we do a tlbie with all the possible 4k va we used to access the 16MB hugepage. With 64k base page size we compare 14..57 bits of VA. Hence we cannot ignore the lower 24 bits of va while tlbie .We also cannot tlb invalidate a 16MB entry with just one tlbie instruction because we don't track which va was used to instantiate the tlb entry. CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:40 +10:00
Aneesh Kumar K.V	fc04795575	powerpc/thp: Handle combo pages in invalidate If we changed base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle hash page fault for these pages. We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table and we need to invalidate those entries. Use _PAGE_COMBO to determine the page size with which we should invalidate the hash table entries on unmap. CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:39 +10:00
Aneesh Kumar K.V	629149fae4	powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte If we changed base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle hash page fault for these pages. We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table and we need to invalidate those entries. Handle this correctly for 16M pages CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:39 +10:00
Aneesh Kumar K.V	fa1f8ae80f	powerpc/thp: Don't recompute vsid and ssize in loop on invalidate The segment identifier and segment size will remain the same in the loop, So we can compute it outside. We also change the hugepage_invalidate interface so that we can use it the later patch CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:38 +10:00
Aneesh Kumar K.V	b0aa44a3df	powerpc/thp: Add write barrier after updating the valid bit With hugepages, we store the hpte valid information in the pte page whose address is stored in the second half of the PMD. Use a write barrier to make sure clearing pmd busy bit and updating hpte valid info are ordered properly. CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 18:20:37 +10:00
Nishanth Aravamudan	2fabf084b6	powerpc: reorder per-cpu NUMA information's initialization There is an issue currently where NUMA information is used on powerpc (and possibly ia64) before it has been read from the device-tree, which leads to large slab consumption with CONFIG_SLUB and memoryless nodes. NUMA powerpc non-boot CPU's cpu_to_node/cpu_to_mem is only accurate after start_secondary(), similar to ia64, which is invoked via smp_init(). Commit `6ee0578b4d` ("workqueue: mark init_workqueues() as early_initcall()") made init_workqueues() be invoked via do_pre_smp_initcalls(), which is obviously before the secondary processors are online. Additionally, the following commits changed init_workqueues() to use cpu_to_node to determine the node to use for kthread_create_on_node: `bce903809a` ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]") `f3f90ad469` ("workqueue: determine NUMA node of workers accourding to the allowed cpumask") Therefore, when init_workqueues() runs, it sees all CPUs as being on Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to a high number of slab deactivations (http://www.spinics.net/lists/linux-mm/msg67489.html). Fix this by initializing the powerpc-specific CPU<->node/local memory node mapping as early as possible, which on powerpc is do_init_bootmem(). Currently that function initializes the mapping for the boot CPU, but we extend it to setup the mapping for all possible CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the per-cpu values for all possible CPUs. That ensures that before the early_initcalls run (and really as early as possible), the per-cpu NUMA mapping is accurate. While testing memoryless nodes on PowerKVM guests with a fix to the workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a guest topology of: available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 node 1 size: 16336 MB node 1 free: 15329 MB node distances: node 0 1 0: 10 40 1: 40 10 the slab consumption decreases from Slab: 932416 kB SUnreclaim: 902336 kB to Slab: 395264 kB SUnreclaim: 359424 kB And we a corresponding increase in the slab efficiency from slab mem objs slabs used active active ------------------------------------------------------------ kmalloc-16384 337 MB 11.28% 100.00% task_struct 288 MB 9.93% 100.00% to slab mem objs slabs used active active ------------------------------------------------------------ kmalloc-16384 37 MB 100.00% 100.00% task_struct 31 MB 100.00% 100.00% Powerpc didn't support memoryless nodes until recently (`64bb80d87f` "powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and `8c27226119` "powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also helped improve memory consumption with these kind of environments. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 15:14:05 +10:00
Scott Wood	5d61a2172a	powerpc/nohash: Split __early_init_mmu() into boot and secondary __early_init_mmu() does some things that are really only needed by the boot cpu. On FSL booke, This includes calling memblock_enforce_memory_limit(), which is labelled __init. Secondary cpu init code can't be __init as that would break CPU hotplug. While it's probably a bug that memblock_enforce_memory_limit() isn't __init_memblock instead, there's no reason why we should be doing this stuff for secondary cpus in the first place. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-13 15:13:25 +10:00
Laura Abbott	308c09f17d	lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture specific scatterlist.h, make it a proper Kconfig option and use that instead. At same time, remove the header files are are now mostly useless and just include asm-generic/scatterlist.h. [sfr@canb.auug.org.au: powerpc files now need asm/dma.h] Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:26 -07:00
Linus Torvalds	f536b3cae8	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc new goodies for 3.17. The short story: The biggest bit is Michael removing all of pre-POWER4 processor support from the 64-bit kernel. POWER3 and rs64. This gets rid of a ton of old cruft that has been bitrotting in a long while. It was broken for quite a few versions already and nobody noticed. Nobody uses those machines anymore. While at it, he cleaned up a bunch of old dusty cabinets, getting rid of a skeletton or two. Then, we have some base VFIO support for KVM, which allows assigning of PCI devices to KVM guests, support for large 64-bit BARs on "powernv" platforms, support for HMI (Hardware Management Interrupts) on those same platforms, some sparse-vmemmap improvements (for memory hotplug), There is the usual batch of Freescale embedded updates (summary in the merge commit) and fixes here or there, I think that's it for the highlights" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits) powerpc/eeh: Export eeh_iommu_group_to_pe() powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API powerpc: Reduce scariness of interrupt frames in stack traces powerpc: start loop at section start of start in vmemmap_populated() powerpc: implement vmemmap_free() powerpc: implement vmemmap_remove_mapping() for BOOK3S powerpc: implement vmemmap_list_free() powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE powerpc/book3s: Fix endianess issue for HMI handling on napping cpus. powerpc/book3s: handle HMIs for cpus in nap mode. powerpc/powernv: Invoke opal call to handle hmi. powerpc/book3s: Add basic infrastructure to handle HMI in Linux. powerpc/iommu: Fix comments with it_page_shift powerpc/powernv: Handle compound PE in config accessors powerpc/powernv: Handle compound PE for EEH powerpc/powernv: Handle compound PE powerpc/powernv: Split ioda_eeh_get_state() powerpc/powernv: Allow to freeze PE powerpc/powernv: Enable M64 aperatus for PHB3 powerpc/eeh: Aux PE data for error log ...	2014-08-07 08:50:34 -07:00
Wang Nan	f51202de8f	memory-hotplug: ppc: suitable memory should go to ZONE_MOVABLE This patch introduces zone_for_memory() to arch_add_memory() on powerpc to ensure new, higher memory added into ZONE_MOVABLE if movable zone has already setup. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Mel Gorman" <mgorman@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:21 -07:00
Li Zhong	16a05bff12	powerpc: start loop at section start of start in vmemmap_populated() vmemmap_populated() checks whether the [start, start + page_size) has valid pfn numbers, to know whether a vmemmap mapping has been created that includes this range. Some range before end might not be checked by this loop: sec11start......start11..sec11end/sec12start..end....start12..sec12end as the above, for start11(section 11), it checks [sec11start, sec11end), and loop ends as the next start(start12) is bigger than end. However, [sec11end/sec12start, end) is not checked here. So before the loop, adjust the start to be the start of the section, so we don't miss ranges like the above. After we adjust start to be the start of the section, it also means it's aligned with vmemmap as of the sizeof struct page, so we could use page_to_pfn directly in the loop. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 16:34:23 +10:00
Li Zhong	71b0bfe4f1	powerpc: implement vmemmap_free() vmemmap_free() does the opposite of vmemap_populate(). This patch also puts vmemmap_free() and vmemmap_list_free() into CONFIG_MEMMORY_HOTPLUG. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 16:34:19 +10:00
Li Zhong	ed5694a846	powerpc: implement vmemmap_remove_mapping() for BOOK3S This is to be called in vmemmap_free(), leave the implementation on BOOK3E empty as before. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 16:34:15 +10:00
Li Zhong	bd8cb03dbe	powerpc: implement vmemmap_list_free() This patch implements vmemmap_list_free() for vmemmap_free(). The freed entries will be removed from vmemmap_list, and form a freed list, with next as the header. The next position in the last allocated page is kept at the list tail. When allocation, if there are freed entries left, get it from the freed list; if no freed entries left, get it like before from the last allocated pages. With this change, realmode_pfn_to_page() also needs to be changed to walk all the entries in the vmemmap_list, as the virt_addr of the entries might not be stored in order anymore. It helps to reuse the memory when continuous doing memory hot-plug/remove operations, but didn't reclaim the pages already allocated, so the memory usage will only increase, but won't exceed the value for the largest memory configuration. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 16:34:10 +10:00
Scott Wood	7d17622159	powerpc/64e: Add __ref to early_alloc_pgtable() This silences a section mismatch warning. early_alloc_pgtable() is called from map_kernel_page() which cannot be __init, but only when slab_is_available() returns false which can only happen during early boot. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 15:40:49 +10:00
Andrey Utkin	b00fc6ec1f	powerpc/mm/numa: Fix break placement Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81631 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-08-05 15:39:50 +10:00
Benjamin Herrenschmidt	9287b95ec9	Merge remote-tracking branch 'scott/next' into next Scott writes: Highlights include e6500 hardware threading support, an e6500 TLB erratum workaround, corenet error reporting, support for a new board, and some minor fixes.	2014-08-05 14:13:41 +10:00
Scott Wood	48cd9b5d59	powerpc/e6500: Work around erratum A-008139 Erratum A-008139 can cause duplicate TLB entries if an indirect entry is overwritten using tlbwe while the other thread is using it to do a lookup. Work around this by using tlbilx to invalidate prior to overwriting. To avoid the need to save another register to hold MAS1 during the workaround code, TID clearing has been moved from tlb_miss_kernel_e6500 until after the SMT section. Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-07-29 19:26:29 -05:00
Michael Ellerman	0f369103ce	powerpc: Remove power3 from comments There are still a few occurences where it remains, because it helps to explain something that persists. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-07-28 14:10:26 +10:00
Michael Ellerman	c3993f1007	powerpc: Remove CONFIG_POWER3 Now that we have dropped power3 support we can remove CONFIG_POWER3. The usage in pgtable_32.c was already dead code as CONFIG_POWER3 was not selectable on PPC32. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-07-28 14:10:24 +10:00
Michael Ellerman	13b3d13b81	powerpc: Remove MMU_FTR_SLB We now only support cpus that use an SLB, so we don't need an MMU feature to indicate that. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-07-28 14:10:23 +10:00
Michael Ellerman	376af5947c	powerpc: Remove STAB code Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had a Segment Table (STAB). Now that we've dropped support for those cpus, we can remove the STAB support entirely. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-07-28 14:10:22 +10:00
Benjamin Herrenschmidt	5b97259220	Merge branch 'merge' into next	2014-07-11 15:38:12 +10:00
Michael Ellerman	cd68098bce	powerpc: Clean up MMU_FTRS_A2 and MMU_FTR_TYPE_3E In `fb5a515704` "powerpc: Remove platforms/wsp and associated pieces", we removed the last user of MMU_FTRS_A2. So remove it. MMU_FTRS_A2 was the last user of MMU_FTR_TYPE_3E, so remove it also. This leaves some unreachable code in mmu_context_nohash.c, so remove that also. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-07-11 12:55:05 +10:00
Laurentiu Tudor	7c48005036	powerpc/booke64: wrap tlb lock and search in htw miss with FTR_SMT Virtualized environments may expose a e6500 dual-threaded core as two single-threaded e6500 cores. Take advantage of this and get rid of the tlb lock and the trap-causing tlbsx in the htw miss handler by guarding with CPU_FTR_SMT, as it's already being done in the bolted tlb1 miss handler. As seen in the results below, measurements done with lmbench random memory access latency test running under Freescale's Embedded Hypervisor, there is a ~34% improvement. Memory latencies in nanoseconds - smaller is better (WARNING - may not be correct, check graphs) ---------------------------------------------------- Host Mhz L1 $ L2 $ Main mem Rand mem --------- --- ---- ---- -------- -------- smt 1665 1.8020 13.2 83.0 1149.7 nosmt 1665 1.8020 13.2 83.0 758.1 Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com> Cc: Scott Wood <scottwood@freescale.com> [scottwood@freescale.com: commit message tweak] Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-06-20 18:48:34 -05:00
Scott Wood	1cb4ed92f6	powerpc/e6500: hw tablewalk: fix recursive tlb lock on cpu 0 Commit `82d86de25b` "TLB lock recursive" introduced a bug whereby cpu 0 uses the same value for "lock held" as is used to indicate that the lock is free. This means that cpu 1 can acquire the lock whenever it wants, regardless of whether cpu 0 has it locked, which in turn means we can get duplicate TLB entries. Add one to the CPU value to ensure we do not use zero as a "lock held" value. Signed-off-by: Scott Wood <scottwood@freescale.com> Reported-by: Ed Swarthout <ed.swarthout@freescale.com>	2014-06-20 18:48:30 -05:00
Scott Wood	bbd08c72c6	powerpc/e6500: hw tablewalk: clear TID in kernel indirect entries Previously TID was being cleared before the tlbsx, but not after. This can lead to a multiway hit between a TLB entry with TID=0 (previously inserted when PID=0) and a TLB entry with TID!=0 that matches PID. This can theoretically result in undefined behavior, though we probably get lucky due to the details of the overlap. It also results in the inability to use multihit detection to detect other conflicting TLB entries, as well as poorer TLB utilization due to duplicating kernel TLB entries. Rather than try to patch up MAS1 after tlbsx, the entire value is saved/restored as with MAS2. I observed a slight improvement in TLB miss performance with this patch applied. Signed-off-by: Scott Wood <scottwood@freescale.com> Reported-by: Ed Swarthout <ed.swarthout@freescale.com>	2014-06-20 18:48:29 -05:00
Linus Torvalds	c5aec4c76a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "Here is the bulk of the powerpc changes for this merge window. It got a bit delayed in part because I wasn't paying attention, and in part because I discovered I had a core PCI change without a PCI maintainer ack in it. Bjorn eventually agreed it was ok to merge it though we'll probably improve it later and I didn't want to rebase to add his ack. There is going to be a bit more next week, essentially fixes that I still want to sort through and test. The biggest item this time is the support to build the ppc64 LE kernel with our new v2 ABI. We previously supported v2 userspace but the kernel itself was a tougher nut to crack. This is now sorted mostly thanks to Anton and Rusty. We also have a fairly big series from Cedric that add support for 64-bit LE zImage boot wrapper. This was made harder by the fact that traditionally our zImage wrapper was always 32-bit, but our new LE toolchains don't really support 32-bit anymore (it's somewhat there but not really "supported") so we didn't want to rely on it. This meant more churn that just endian fixes. This brings some more LE bits as well, such as the ability to run in LE mode without a hypervisor (ie. under OPAL firmware) by doing the right OPAL call to reinitialize the CPU to take HV interrupts in the right mode and the usual pile of endian fixes. There's another series from Gavin adding EEH improvements (one day we will have a release with less than 20 EEH patches, I promise!). Another highlight is the support for the "Split core" functionality on P8 by Michael. This allows a P8 core to be split into "sub cores" of 4 threads which allows the subcores to run different guests under KVM (the HW still doesn't support a partition per thread). And then the usual misc bits and fixes ..." [ Further delayed by gmail deciding that BenH is a dirty spammer. Google knows. ] * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits) powerpc/powernv: Add missing include to LPC code selftests/powerpc: Test the THP bug we fixed in the previous commit powerpc/mm: Check paca psize is up to date for huge mappings powerpc/powernv: Pass buffer size to OPAL validate flash call powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC() powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC() powerpc/powernv: Set memory_block_size_bytes to 256MB powerpc: Allow ppc_md platform hook to override memory_block_size_bytes powerpc/powernv: Fix endian issues in memory error handling code powerpc/eeh: Skip eeh sysfs when eeh is disabled powerpc: 64bit sendfile is capped at 2GB powerpc/powernv: Provide debugfs access to the LPC bus via OPAL powerpc/serial: Use saner flags when creating legacy ports powerpc: Add cpu family documentation powerpc/xmon: Fix up xmon format strings powerpc/powernv: Add calls to support little endian host powerpc: Document sysfs DSCR interface powerpc: Fix regression of per-CPU DSCR setting powerpc: Split __SYSFS_SPRSETUP macro arch: powerpc/fadump: Cleaning up inconsistent NULL checks ...	2014-06-10 18:54:22 -07:00
Michael Ellerman	09567e7fd4	powerpc/mm: Check paca psize is up to date for huge mappings We have a bug in our hugepage handling which exhibits as an infinite loop of hash faults. If the fault is being taken in the kernel it will typically trigger the softlockup detector, or the RCU stall detector. The bug is as follows: 1. mmap(0xa0000000, ..., MAP_FIXED \| MAP_HUGE_TLB \| MAP_ANONYMOUS ..) 2. Slice code converts the slice psize to 16M. 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area() synchronises the mm->context with the paca->context. So the paca slice mask is updated to include the 16M slice. 3. Either: * mmap() fails because there are no huge pages available. * mmap() succeeds and the mapping is then munmapped. In both cases the slice psize remains at 16M in both the paca & mm. 4. mmap(0xa0000000, ..., MAP_FIXED \| MAP_ANONYMOUS ..) 5. The slice psize is converted back to 64K. Because of the check on line 539 of slice.c we DO NOT update the paca->context. The paca slice mask is now out of sync with the mm slice mask. 6. User/kernel accesses 0xa0000000. 7. The SLB miss handler slb_allocate_realmode() uses the paca slice mask to create an SLB entry and inserts it in the SLB. 18. With the 16M SLB entry in place the hardware does a hash lookup, no entry is found so a data access exception is generated. 19. The data access handler calls do_page_fault() -> handle_mm_fault(). 10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page(). 11. The hardware retries the access, there is still nothing in the hash table so once again a data access exception is generated. 12. hash_page() calls into __hash_page_thp() and inserts a mapping in the hash. Although the THP mapping maps 16M the hashing is done using 64K as the segment page size. 13. hash_page() returns immediately after calling __hash_page_thp(), skipping over the code at line 1125. Resulting in the mismatch between the paca->context and mm->context not being detected. 14. The hardware retries the access, the hash it generates using the 16M SLB entry does NOT match the hash we inserted. 15. We take another data access and go into __hash_page_thp(). 16. We see a valid entry in the hpte_slot_array and so we call updatepp() which succeeds. 17. Goto 14. We could fix this in two ways. The first would be to remove or modify the check on line 539 of slice.c. The second option is to cause the check of paca psize in hash_page() on line 1125 to also be done for THP pages. We prefer the latter, because the check & update of the paca psize is not done until we know it's necessary. It's also done only on the current cpu, so we don't need to IPI all other cpus. Without further rearranging the code, the simplest fix is to pull out the code that checks paca psize and call it in two places. Firstly for THP/hugetlb, and secondly for other mappings as before. Thanks to Dave Jones for trinity, which originally found this bug. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: stable@vger.kernel.org [v3.11+]	2014-06-06 13:54:26 +10:00
Naoya Horiguchi	c177c81e09	hugetlb: restrict hugepage_migration_support() to x86_64 Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:53:51 -07:00
Linus Torvalds	d27050641e	DeviceTree for 3.16: - Another round of clean-up of FDT related code in architecture code. This removes knowledge of internal FDT details from most architectures except powerpc. - Conversion of kernel's custom FDT parsing code to use libfdt. - DT based initialization for generic serial earlycon. The introduction of generic serial earlycon support went in thru tty tree. - Improve the platform device naming for DT probed devices to ensure unique naming and use parent names instead of a global index. - Fix a race condition in of_update_property. - Unify the various linker section OF match tables and fix several function prototype errors. - Update platform_get_irq_byname to work in deferred probe cases. - 2 binding doc updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTjzgyAAoJEMhvYp4jgsXiFsUH/1PMTGo8CyD62VQD5ZKdAoW+ Fq6vCiRQ8assF5i5ZLcW1DqhjtoRaCKYhVbRKa5lj7cZdjlSpacI/qQPrF5Br2Ii bTE3Ff/AQwipQaz/Bj7HqJCgGwfWK8xdfgW0abKsyXMWDN86Bov/zzeu8apmws0x H1XjJRgnc/rzM4m9ny6+lss0iq6YL54SuTYNzHR33+Ywxls69SfHXIhCW0KpZcBl 5U3YUOomt40GfO46sxFA4xApAhypEK4oVq7asyiA2ArTZ/c2Pkc9p5CBqzhDLmlq yioWTwHIISv0q+yMLCuQrVGIsbUDkQyy7RQ15z6U+/e/iGO/M+j3A5yxMc3qOi4= =Onff -----END PGP SIGNATURE----- Merge tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux into next Pull DeviceTree updates from Rob Herring: - Another round of clean-up of FDT related code in architecture code. This removes knowledge of internal FDT details from most architectures except powerpc. - Conversion of kernel's custom FDT parsing code to use libfdt. - DT based initialization for generic serial earlycon. The introduction of generic serial earlycon support went in through the tty tree. - Improve the platform device naming for DT probed devices to ensure unique naming and use parent names instead of a global index. - Fix a race condition in of_update_property. - Unify the various linker section OF match tables and fix several function prototype errors. - Update platform_get_irq_byname to work in deferred probe cases. - 2 binding doc updates * tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (58 commits) of: handle NULL node in next_child iterators of/irq: provide more wrappers for !CONFIG_OF devicetree: bindings: Document micrel vendor prefix dt: bindings: dwc2: fix required value for the phy-names property of_pci_irq: kill useless variable in of_irq_parse_pci() of/irq: do irq resolution in platform_get_irq_byname() of: Add a testcase for of_find_node_by_path() of: Make of_find_node_by_path() handle /aliases of: Create unlocked version of for_each_child_of_node() lib: add glibc style strchrnul() variant of: Handle memory@0 node on PPC32 only pci/of: Remove dead code of: fix race between search and remove in of_update_property() of: Use NULL for pointers of: Stop naming platform_device using dcr address of: Ensure unique names without sacrificing determinism tty/serial: pl011: add DT based earlycon support of/fdt: add FDT serial scanning for earlycon of/fdt: add FDT address translation support serial: earlycon: add DT support ...	2014-06-04 10:02:38 -07:00
Linus Torvalds	b05d59dfce	At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz bAknaZpPiOrW11Et1htY =j5Od -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next Pull KVM updates from Paolo Bonzini: "At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits) KVM: add missing cleanup_srcu_struct KVM: PPC: Book3S PR: Rework SLB switching code KVM: PPC: Book3S PR: Use SLB entry 0 KVM: PPC: Book3S HV: Fix machine check delivery to guest KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs KVM: PPC: Book3S HV: Make sure we don't miss dirty pages KVM: PPC: Book3S HV: Fix dirty map for hugepages KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number KVM: PPC: Book3S: Add ONE_REG register names that were missed KVM: PPC: Add CAP to indicate hcall fixes KVM: PPC: MPIC: Reset IRQ source private members KVM: PPC: Graciously fail broken LE hypercalls PPC: ePAPR: Fix hypercall on LE guest KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler KVM: PPC: BOOK3S: Always use the saved DAR value PPC: KVM: Make NX bit available with magic page KVM: PPC: Disable NX for old magic page using guests KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest ...	2014-06-04 08:47:12 -07:00
Alexander Graf	d8d164a985	KVM: PPC: Book3S PR: Rework SLB switching code On LPAR guest systems Linux enables the shadow SLB to indicate to the hypervisor a number of SLB entries that always have to be available. Today we go through this shadow SLB and disable all ESID's valid bits. However, pHyp doesn't like this approach very much and honors us with fancy machine checks. Fortunately the shadow SLB descriptor also has an entry that indicates the number of valid entries following. During the lifetime of a guest we can just swap that value to 0 and don't have to worry about the SLB restoration magic. While we're touching the code, let's also make it more readable (get rid of rldicl), allow it to deal with a dynamic number of bolted SLB entries and only do shadow SLB swizzling on LPAR systems. Signed-off-by: Alexander Graf <agraf@suse.de>	2014-05-30 14:26:30 +02:00
Scott Wood	e57eeae4e6	powerpc/fsl-booke64: Set vmemmap_psize to 4K The only way Freescale booke chips support mappings larger than 4K is via TLB1. The only way we support (direct) TLB1 entries is via hugetlb, which is not what map_kernel_page() does when given a large page size. Without this, a kernel with CONFIG_SPARSEMEM_VMEMMAP enabled crashes on boot with messages such as: PID hash table entries: 4096 (order: 3, 32768 bytes) Sorting __ex_table... BUG: Bad page state in process swapper pfn:00a2f page:8000040000023a48 count:0 mapcount:0 mapping:0000040000ffce48 index:0x40000ffbe50 page flags: 0x40000ffda40(active\|arch_1\|private\|private_2\|head\|tail\|swapcache\|mappedtodisk\|reclaim\|swapbacked\|unevictable\|mlocked) page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: page flags: 0x311840(active\|private\|private_2\|swapcache\|unevictable\|mlocked) Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00003-g7fa250c #299 Call Trace: [c00000000098ba20] [c000000000008b3c] .show_stack+0x7c/0x1cc (unreliable) [c00000000098baf0] [c00000000060aa50] .dump_stack+0x88/0xb4 [c00000000098bb70] [c0000000000c0468] .bad_page+0x144/0x1a0 [c00000000098bc10] [c0000000000c0628] .free_pages_prepare+0x164/0x17c [c00000000098bcc0] [c0000000000c24cc] .free_hot_cold_page+0x48/0x214 [c00000000098bd60] [c00000000086c318] .free_all_bootmem+0x1fc/0x354 [c00000000098be70] [c00000000085da84] .mem_init+0xac/0xdc [c00000000098bef0] [c0000000008547b0] .start_kernel+0x21c/0x4d4 [c00000000098bf90] [c000000000000448] .start_here_common+0x20/0x58 Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-05-22 18:08:12 -05:00
Rob Herring	eafd370dfe	Merge branch 'dt-bus-name' into for-next	2014-05-13 18:34:35 -05:00
Paolo Bonzini	5367742ad5	Patch queue for 3.15 - 2014-05-12 This request includes a few bug fixes that really shouldn't wait for the next release. It fixes KVM on 32bit PowerPC when built as module. It also fixes the PV KVM acceleration when NX gets honored by the host. Furthermore we fix transactional memory support and numa support on HV KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJTcKFaAAoJECszeR4D/txg7qYP/RX3V32i2zQYH2NpjQrDCwtY Wur+CQrn/VA6xhtTK1rT2zH5rNFLt6ClhtxCMkZFfBdUE4sHi3OTlEdcvXBZjbls JqQ/7lBkUPN8pTpz2NHP9gvH7g6v07EruysRQNa/JZMzlwhpzWk8D7yXakaCPNY/ JZRgVTrfKnhQ8OtXt48Bp4EmEKllbNqi9kNN7dewD2dEb3fAco3Jpk6WoeG+1f0o jv3NmeTsp87KaRpjvDzPb7iCe6PA7GVqvJIQpir3Rpk2Kpx0yj58AfacF+f72GOf CPlJGepiumJCaANhV6dbvtS49vaiiAnSvbqCil2USNl0LIGWQXdSjs5lztEuiMyr tAav0YSVpnIcw0HJxXug/M31VwfRjYCX3hnCCIOd3Xj2jgAqwD+Lo95uUrRGJ9TP 75zKh8E093tOXIC9CyMaiYajpFMUrCSMgnpJ+7fpeHiyigB6yc8juFxahIHsw8q1 NgHggroJm6QNIm8JSY/tG/YET4AT7H4ZetGP8MeeRUg0TpqQXvYpkMGB8YDouaBA XzxjwyTq57BOYgLGExnwW3Jj0kbqVY+ts0aDGQVGrl5YFzooGqrQ61CRmwG5BvI8 sou3l6TJ2ng8qrc7Maw9MHca1QB3mtXD7I26T/QEfQm9NLRTTqJyaxH5J1q9siRI PpHVE5FKnmWPNr8JlxtC =t2S+ -----END PGP SIGNATURE----- Merge tag 'signed-for-3.15' of git://github.com/agraf/linux-2.6 into kvm-master Patch queue for 3.15 - 2014-05-12 This request includes a few bug fixes that really shouldn't wait for the next release. It fixes KVM on 32bit PowerPC when built as module. It also fixes the PV KVM acceleration when NX gets honored by the host. Furthermore we fix transactional memory support and numa support on HV KVM.	2014-05-13 18:15:16 +02:00
Benjamin Herrenschmidt	f6869e7fe6	Merge remote-tracking branch 'anton/abiv2' into next This series adds support for building the powerpc 64-bit LE kernel using the new ABI v2. We already supported running ABI v2 userspace programs but this adds support for building the kernel itself using the new ABI.	2014-05-05 20:57:12 +10:00
Liu Ping Fan	5a4e58bc69	powerpc/mm: use macro PGTABLE_EADDR_SIZE instead of digital In case of extending the eaddr in future, use this macro PGTABLE_EADDR_SIZE to ease the maintenance of the code. Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-05-01 08:26:39 +10:00
Alexander Graf	9048e648bc	powerpc: Use 64k io pages when we never see an HEA When we never get around to seeing an HEA ethernet adapter, there's no point in restricting ourselves to 4k IO page size. This speeds up IO maps when CONFIG_IBMEBUS is disabled. [ Updated the test to also lift the restriction on arch 2.07 (Power 8) which cannot have an HEA -- BenH ] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> foo	2014-05-01 08:25:16 +10:00
Rob Herring	9d0c4dfedd	of/fdt: update of_get_flat_dt_prop in prep for libfdt Make of_get_flat_dt_prop arguments compatible with libfdt fdt_getprop call in preparation to convert FDT code to use libfdt. Make the return value const and the property length ptr type an int. Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Michal Simek <michal.simek@xilinx.com> Tested-by: Grant Likely <grant.likely@linaro.org> Tested-by: Stephen Chivers <schivers@csc.com>	2014-04-30 00:59:15 -05:00
Alexander Graf	b18db0b808	KVM guest: Make pv trampoline code executable Our PV guest patching code assembles chunks of instructions on the fly when it encounters more complicated instructions to hijack. These instructions need to live in a section that we don't mark as non-executable, as otherwise we fault when jumping there. Right now we put it into the .bss section where it automatically gets marked as non-executable. Add a check to the NX setting function to ensure that we leave these particular pages executable. Signed-off-by: Alexander Graf <agraf@suse.de>	2014-04-29 12:36:09 +02:00
Aneesh Kumar K.V	29ef7a3e26	powerpc/mm: Fix tlbie to add AVAL fields for 64K pages The if condition check was based on a draft ISA doc. Remove the same. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-04-28 13:11:24 +10:00
Anton Blanchard	b86206e4c3	powerpc: Fix branch patching code for ABIv2 The MMU hashtable and SLB branch patching code uses function pointers for the update sites. This creates a difference between ABIv1 and ABIv2 because we don't have function descriptors on ABIv2. Get rid of the function pointer and just point at the update sites directly. This works on both ABIs. Signed-off-by: Anton Blanchard <anton@samba.org>	2014-04-23 10:05:22 +10:00
Anton Blanchard	26f9206056	powerpc: Use ppc_function_entry instead of open coding it Replace FUNCTION_TEXT with ppc_function_entry which can handle both ABIv1 and ABIv2. Signed-off-by: Anton Blanchard <anton@samba.org>	2014-04-23 10:05:22 +10:00
Anton Blanchard	b1576fec7f	powerpc: No need to use dot symbols when branching to a function binutils is smart enough to know that a branch to a function descriptor is actually a branch to the functions text address. Alan tells me that binutils has been doing this for 9 years. Signed-off-by: Anton Blanchard <anton@samba.org>	2014-04-23 10:05:16 +10:00
Mike Qiu	12c743eb22	powerpc/mm: fix ".__node_distance" undefined CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h ... Building modules, stage 2. WARNING: 1 bad relocations c0000000013d6a30 R_PPC64_ADDR64 uprobes_fetch_type_table WRAP arch/powerpc/boot/zImage.pseries WRAP arch/powerpc/boot/zImage.epapr MODPOST 1849 modules ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined! make[1]: * [__modpost] Error 1 make: * [modules] Error 2 make: *** Waiting for unfinished jobs.... The reason is symbol "__node_distance" not been exported in powerpc. Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Alistair Popple <alistair@popple.id.au> Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-18 16:40:08 -07:00
Michael Wang	9a0133613e	power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update Since v1: Edited the comment according to Srivatsa's suggestion. During the testing, we encounter below WARN followed by Oops: WARNING: at kernel/sched/core.c:6218 ... NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200 LR [c000000000101358] .build_sched_domains+0xec8/0x1200 PACATMSCRATCH [800000000000f032] Call Trace: [c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200 [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510 [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0 [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30 ... Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0 LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200 PACATMSCRATCH [8000000000029032] Call Trace: [c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0 [c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200 [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510 [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0 [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30 ... This was caused by that 'sd->groups == NULL' after building groups, which was caused by the empty 'sd->span'. The cpu's domain contained nothing because the cpu was assigned to a wrong node, due to the following unfortunate sequence of events: 1. The hypervisor sent a topology update to the guest OS, to notify changes to the cpu-node mapping. However, the update was actually redundant - i.e., the "new" mapping was exactly the same as the old one. 2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting the 'for-loop' in arch_update_cpu_topology(). 3. So we ended up calling stop-machine() with an empty cpumask list, which made stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as the cpu to run the payload (the update_cpu_topology() function). 4. This causes update_cpu_topology() to be run by CPU0. And since 'updates' is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology() finds update->cpu as well as update->new_nid to be 0. In other words, we end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly. Along with the following wrong updating, it causes the sched-domain rebuild code to break and crash the system. Fix this by skipping the topology update in cases where we find that the topology has not actually changed in reality (ie., spurious updates). CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> CC: Andrew Morton <akpm@linux-foundation.org> CC: Robert Jennings <rcj@linux.vnet.ibm.com> CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com> CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> CC: Alistair Popple <alistair@popple.id.au> Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-04-09 12:54:17 +10:00
Aneesh Kumar K.V	1dc954bd2f	powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast() We need to handle numa pte via the slow path Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-04-09 12:53:03 +10:00
Benjamin Herrenschmidt	cd42748535	Merge remote-tracking branch 'scott/next' into next Freescale updates from Scott. Mostly support for critical and machine check exceptions on 64-bit BookE, some new PCI suspend/resume work and misc bits.	2014-03-24 10:26:10 +11:00
Aneesh Kumar K.V	346519a160	powerpc/mm: Make sure a local_irq_disable prevent a parallel THP split We have generic code like the one in get_futex_key that assume that a local_irq_disable prevents a parallel THP split. Support that by adding a dummy smp call function after setting _PAGE_SPLITTING. Code paths like get_user_pages_fast still need to check for _PAGE_SPLITTING after disabling IRQ which indicate that a parallel THP splitting is ongoing. Now if they don't find _PAGE_SPLITTING set, then we can be sure that parallel split will now block in pmdp_splitting flush until we enables IRQ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-03-24 09:48:34 +11:00
Scott Wood	609af38f8f	powerpc/booke64: Critical and machine check exception support Add special state saving for critical and machine check exceptions. Most of this code could be used to handle debug exceptions taken from kernel space, but actually doing so is outside the scope of this patch. The various critical and machine check exceptions now point to their real handlers, rather than hanging the kernel. Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-03-19 19:57:27 -05:00
Scott Wood	a3dc620743	powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers While bolted handlers (including e6500) do not need to deal with a TLB miss recursively causing another TLB miss, nested TLB misses can still happen with crit/mc/debug exceptions -- so we still need to honor SPRG_TLB_EXFRAME. We don't need to spend time modifying it in the TLB miss fastpath, though -- the special level exception will handle that. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: kvm-ppc@vger.kernel.org	2014-03-19 19:57:15 -05:00
Scott Wood	82d86de25b	powerpc/e6500: Make TLB lock recursive Once special level interrupts are supported, we may take nested TLB misses -- so allow the same thread to acquire the lock recursively. The lock will not be effective against the nested TLB miss handler trying to write the same entry as the interrupted TLB miss handler, but that's also a problem on non-threaded CPUs that lack TLB write conditional. This will be addressed in the patch that enables crit/mc support by invalidating the TLB on return from level exceptions. Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-03-19 19:57:13 -05:00
Benjamin Krill	60b962239a	powerpc/book3e: Fix check for linear mapping in TLB miss handler The previous code added wrong TLBs and causes machine check errors if a driver accessed passed the end of the linear mapping instead of a clean page fault. Signed-off-by: Ralph E. Bellofatto <ralphbel@us.ibm.com> Signed-off-by: Benjamin Krill <ben@codiert.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-03-07 15:54:51 +11:00
Nathan Fontenot	9ac8cde938	powerpc/pseries: Use remove_memory() to remove memory The memory remove code for powerpc/pseries should call remove_memory() so that we are holding the hotplug_memory lock during memory remove operations. This patch updates the memory node remove handler to call remove_memory() and adds a ppc_md.remove_memory() entry to handle pseries specific work that is called from arch_remove_memory(). During memory remove in pseries_remove_memblock() we have to stay with removing memory one section at a time. This is needed because of how memory resources are handled. During memory add for pseries (via the probe file in sysfs) we add memory one section at a time which gives us a memory resource for each section. Future patches will aim to address this so will not have to remove memory one section at a time. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-03-07 15:53:13 +11:00
Aneesh Kumar K.V	88247e8d7b	powerpc/mm: Add new "set" flag argument to pte/pmd update function pte_update() is a powerpc-ism used to change the bits of a PTE when the access permission is being restricted (a flush is potentially needed). It uses atomic operations on when needed and handles the hash synchronization on hash based processors. It is currently only used to clear PTE bits and so the current implementation doesn't provide a way to also set PTE bits. The new _PAGE_NUMA bit, when set, is actually restricting access so it must use that function too, so this change adds the ability for pte_update() to also set bits. We will use this later to set the _PAGE_NUMA bit. Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-17 11:19:35 +11:00
Mahesh Salgaonkar	429d2e8342	powerpc: Fix kdump hang issue on p8 with relocation on exception enabled. On p8 systems, with relocation on exception feature enabled we are seeing kdump kernel hang at interrupt vector 0xc4400. The reason is, with this feature enabled, exception are raised with MMU (IR=DR=1) ON with the default offset of 0xc4000. Since exception is raised in virtual mode it requires the vector region to be executable without which it fails to fetch and execute instruction at 0xc*4xxx. For default kernel since kernel is loaded at real 0, the htab mappings sets the entire kernel text region executable. But for relocatable kernel (e.g. kdump case) we only copy interrupt vectors down to real 0 and never marked that region as executable because in p7 and below we always get exception in real mode. This patch fixes this issue by marking htab mapping range as executable that overlaps with the interrupt vector region for relocatable kernel. Thanks to Ben who helped me to debug this issue and find the root cause. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-11 11:24:47 +11:00
Tiejun Chen	94b09d7554	powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var Replace __get_cpu_var safely with get_cpu_var to avoid the following call trace: [ 7253.637591] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: hugemmap01/9048 [ 7253.637601] caller is free_hugepd_range.constprop.25+0x88/0x1a8 [ 7253.637605] CPU: 1 PID: 9048 Comm: hugemmap01 Not tainted 3.10.20-rt14+ #114 [ 7253.637606] Call Trace: [ 7253.637617] [cb049d80] [c0007ea4] show_stack+0x4c/0x168 (unreliable) [ 7253.637624] [cb049dc0] [c031c674] debug_smp_processor_id+0x114/0x134 [ 7253.637628] [cb049de0] [c0016d28] free_hugepd_range.constprop.25+0x88/0x1a8 [ 7253.637632] [cb049e00] [c001711c] hugetlb_free_pgd_range+0x6c/0x168 [ 7253.637639] [cb049e40] [c0117408] free_pgtables+0x12c/0x150 [ 7253.637646] [cb049e70] [c011ce38] unmap_region+0xa0/0x11c [ 7253.637671] [cb049ef0] [c011f03c] do_munmap+0x224/0x3bc [ 7253.637676] [cb049f20] [c011f2e0] vm_munmap+0x38/0x5c [ 7253.637682] [cb049f40] [c000ef88] ret_from_syscall+0x0/0x3c [ 7253.637686] --- Exception: c01 at 0xff16004 Signed-off-by: Tiejun Chen<tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-29 17:02:26 +11:00
jmarchan@redhat.com	19751c07b3	powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space According to Posix, if MAP_FIXED is specified mmap shall set ENOMEM if the requested mapping exceeds the allowed range for address space of the process. The generic code set it right, but the specific powerpc slice_get_unmapped_area() function currently returns -EINVAL in that case. This patch corrects it. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-29 17:02:25 +11:00
Joe Perches	316d718827	powerpc/numa: Fix decimal permissions This should have been octal. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-29 16:58:48 +11:00
Benjamin Herrenschmidt	6e677ef6fb	Merge remote-tracking branch 'scott/next' into next << This contains a fix for a chroma_defconfig build break that was introduced by e6500 tablewalk support, and a device tree binding patch that missed the previous pull request due to some last-minute polishing. >>	2014-01-29 16:55:25 +11:00
Linus Torvalds	1b17366d69	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...	2014-01-27 21:11:26 -08:00
Tang Chen	e7e8de5918	memblock: make memblock_set_node() support different memblock_type [sfr@canb.auug.org.au: fix powerpc build] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:44 -08:00
Scott Wood	9841c79c89	powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E ...and make CONFIG_PPC_FSL_BOOK3E conflict with CONFIG_PPC_64K_PAGES. This fixes a build break with CONFIG_PPC_64K_PAGES on 64-bit book3e, that was introduced by commit `28efc35fe6` ("powerpc/e6500: TLB miss handler with hardware tablewalk support"). Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-17 18:43:14 -06:00
Benjamin Herrenschmidt	fac515db45	Merge remote-tracking branch 'scott/next' into next Freescale updates from Scott: << Highlights include 32-bit booke relocatable support, e6500 hardware tablewalk support, various e500 SPE fixes, some new/revived boards, and e6500 deeper idle and altivec powerdown modes. >>	2014-01-15 14:22:35 +11:00
Geert Uytterhoeven	4f7709248d	powerpc: Make add_system_ram_resources() __init add_system_ram_resources() is a subsys_initcall. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-15 13:58:52 +11:00
Srivatsa S. Bhat	68fb18aacb	powerpc: Add debug checks to catch invalid cpu-to-node mappings There have been some weird bugs in the past where the kernel tried to associate threads of the same core to different NUMA nodes, and things went haywire after that point (as expected). But unfortunately, root-causing such issues have been quite challenging, due to the lack of appropriate debug checks in the kernel. These bugs usually lead to some odd soft-lockups in the scheduler's build-sched-domain code in the CPU hotplug path, which makes it very hard to trace it back to the incorrect cpu-to-node mappings. So add appropriate debug checks to catch such invalid cpu-to-node mappings as early as possible. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-15 13:58:40 +11:00
Srivatsa S. Bhat	d4edc5b6c4	powerpc: Fix the setup of CPU-to-Node mappings during CPU online On POWER platforms, the hypervisor can notify the guest kernel about dynamic changes in the cpu-numa associativity (VPHN topology update). Hence the cpu-to-node mappings that we got from the firmware during boot, may no longer be valid after such updates. This is handled using the arch_update_cpu_topology() hook in the scheduler, and the sched-domains are rebuilt according to the new mappings. But unfortunately, at the moment, CPU hotplug ignores these updated mappings and instead queries the firmware for the cpu-to-numa relationships and uses them during CPU online. So the kernel can end up assigning wrong NUMA nodes to CPUs during subsequent CPU hotplug online operations (after booting). Further, a particularly problematic scenario can result from this bug: On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8) threads per core. The switch to Single-Threaded (ST) mode is performed by offlining all except the first CPU thread in each core. Switching back to SMT mode involves onlining those other threads back, in each core. Now consider this scenario: 1. During boot, the kernel gets the cpu-to-node mappings from the firmware and assigns the CPUs to NUMA nodes appropriately, during CPU online. 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and communicates this update to the kernel. The kernel in turn updates its cpu-to-node associations and rebuilds its sched domains. Everything is fine so far. 3. Now, the user switches the machine from SMT to ST mode (say, by running ppc64_cpu --smt=1). This involves offlining all except 1 thread in each core. 4. The user then tries to switch back from ST to SMT mode (say, by running ppc64_cpu --smt=4), and this involves onlining those threads back. Since CPU hotplug ignores the new mappings, it queries the firmware and tries to associate the newly onlined sibling threads to the old NUMA nodes. This results in sibling threads within the same core getting associated with different NUMA nodes, which is incorrect. The scheduler's build-sched-domains code gets thoroughly confused with this and enters an infinite loop and causes soft-lockups, as explained in detail in commit `3be7db6ab` (powerpc: VPHN topology change updates all siblings). So to fix this, use the numa_cpu_lookup_table to remember the updated cpu-to-node mappings, and use them during CPU hotplug online operations. Further, we also need to ensure that all threads in a core are assigned to a common NUMA node, irrespective of whether all those threads were online during the topology update. To achieve this, we take care not to use cpu_sibling_mask() since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread() and set up the mappings manually using the 'threads_per_core' value for that particular platform. This helps us ensure that we don't hit this bug with any combination of CPU hotplug and SMT mode switching. Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-15 13:58:37 +11:00
Paul Gortmaker	c141611fb1	powerpc: Delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. The one instance where we add an include for init.h covers off a case where that file was implicitly getting it from another header which itself didn't need it. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-01-15 13:46:44 +11:00
Scott Wood	1149e8a73f	powerpc/booke-64: fix tlbsrx. path in bolted tlb handler It was branching to the cleanup part of the non-bolted handler, which would have been bad if there were any chips with tlbsrx. that use the bolted handler. Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-10 17:32:02 -06:00
Scott Wood	bbead78c06	powerpc/fsl-book3e-64: Use paca for hugetlb TLB1 entry selection This keeps usage coordinated for hugetlb and indirect entries, which should make entry selection more predictable and probably improve overall performance when mixing the two. Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:20 -06:00
Scott Wood	28efc35fe6	powerpc/e6500: TLB miss handler with hardware tablewalk support There are a few things that make the existing hw tablewalk handlers unsuitable for e6500: - Indirect entries go in TLB1 (though the resulting direct entries go in TLB0). - It has threads, but no "tlbsrx." -- so we need a spinlock and a normal "tlbsx". Because we need this lock, hardware tablewalk is mandatory on e6500 unless we want to add spinlock+tlbsx to the normal bolted TLB miss handler. - TLB1 has no HES (nor next-victim hint) so we need software round robin (TODO: integrate this round robin data with hugetlb/KVM) - The existing tablewalk handlers map half of a page table at a time, because IBM hardware has a fixed 1MiB indirect page size. e6500 has variable size indirect entries, with a minimum of 2MiB. So we can't do the half-page indirect mapping, and even if we could it would be less efficient than mapping the full page. - Like on e5500, the linear mapping is bolted, so we don't need the overhead of supporting nested tlb misses. Note that hardware tablewalk does not work in rev1 of e6500. We do not expect to support e6500 rev1 in mainline Linux. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com>	2014-01-09 17:52:19 -06:00
Scott Wood	47ce8af420	powerpc: add barrier after writing kernel PTE There is no barrier between something like ioremap() writing to a PTE, and returning the value to a caller that may then store the pointer in a place that is visible to other CPUs. Such callers generally don't perform barriers of their own. Even if callers of ioremap() and similar things did use barriers, the most logical choise would be smp_wmb(), which is not architecturally sufficient when BookE hardware tablewalk is used. A full sync is specified by the architecture. For userspace mappings, OTOH, we generally already have an lwsync due to locking, and if we occasionally take a spurious fault due to not having a full sync with hardware tablewalk, it will not be fatal because we will retry rather than oops. Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:19 -06:00
Kevin Hao	0be7d969b0	powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M When booting above the 64M for a secondary cpu, we also face the same issue as the boot cpu that the PAGE_OFFSET map two different physical address for the init tlb and the final map. So we have to use switch_to_as1/restore_to_as0 between the conversion of these two maps. When restoring to as0 for a secondary cpu, we only need to return to the caller. So add a new parameter for function restore_to_as0 for this purpose. Use LOAD_REG_ADDR_PIC to get the address of variables which may be used before we set the final map in cams for the secondary cpu. Move the setting of cams a bit earlier in order to avoid the unnecessary using of LOAD_REG_ADDR_PIC. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:18 -06:00
Kevin Hao	7d2471f9fa	powerpc/fsl_booke: make sure PAGE_OFFSET map to memstart_addr for relocatable kernel This is always true for a non-relocatable kernel. Otherwise the kernel would get stuck. But for a relocatable kernel, it seems a little complicated. When booting a relocatable kernel, we just align the kernel start addr to 64M and map the PAGE_OFFSET from there. The relocation will base on this virtual address. But if this address is not the same as the memstart_addr, we will have to change the map of PAGE_OFFSET to the real memstart_addr and do another relocation again. Signed-off-by: Kevin Hao <haokexin@gmail.com> [scottwood@freescale.com: make offset long and non-negative in simple case] Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:17 -06:00
Kevin Hao	813125d833	powerpc/fsl_booke: introduce map_mem_in_cams_addr Introduce this function so we can set both the physical and virtual address for the map in cams. This will be used by the relocation code. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:17 -06:00
Kevin Hao	78a235efdc	powerpc/fsl_booke: set the tlb entry for the kernel address in AS1 We use the tlb1 entries to map low mem to the kernel space. In the current code, it assumes that the first tlb entry would cover the kernel image. But this is not true for some special cases, such as when we run a relocatable kernel above the 64M or set CONFIG_KERNEL_START above 64M. So we choose to switch to address space 1 before setting these tlb entries. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:16 -06:00
Kevin Hao	dd189692d4	powerpc: enable the relocatable support for the fsl booke 32bit kernel This is based on the codes in the head_44x.S. The difference is that the init tlb size we used is 64M. With this patch we can only load the kernel at address between memstart_addr ~ memstart_addr + 64M. We will fix this restriction in the following patches. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:16 -06:00
Kevin Hao	7c732cba3d	powerpc/fsl_booke: protect the access to MAS7 The e500v1 doesn't implement the MAS7, so we should avoid to access this register on that implementations. In the current kernel, the access to MAS7 are protected by either CONFIG_PHYS_64BIT or MMU_FTR_BIG_PHYS. Since some code are executed before the code patching, we have to use CONFIG_PHYS_64BIT in these cases. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>	2014-01-09 17:52:14 -06:00
Aneesh Kumar K.V	8937ba48dc	powerpc/mm: Only check for _PAGE_PRESENT in set_pte/pmd functions We want to make sure we don't use these function when updating a pte or pmd entry that have a valid hpte entry, because these functions don't invalidate them. So limit the check to _PAGE_PRESENT bit. Numafault core changes use these functions for updating _PAGE_NUMA bits. That should be ok because when _PAGE_NUMA is set we can be sure that hpte entries are not present. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-12-09 11:40:29 +11:00
Aneesh Kumar K.V	c8c06f5a0d	powerpc/mm: Free up _PAGE_COHERENCE for numa fault use later Set memory coherence always on hash64 config. If a platform cannot have memory coherence always set they can infer that from _PAGE_NO_CACHE and _PAGE_WRITETHRU like in lpar. So we dont' really need a separate bit for tracking _PAGE_COHERENCE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-12-09 11:40:28 +11:00
Kevin Hao	1e8341ae0c	powerpc: Move the patch_exception to a common place So that it can be used by other codes. No function change. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-12-02 14:06:54 +11:00
Scott Wood	d742aa152f	powerpc/booke: Only check for hugetlb in flush if vma != NULL And in flush_hugetlb_page(), don't check whether vma is NULL after we've already dereferenced it. This was found by Dan using static analysis as described here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-November/113161.html We currently get away with this because the callers that currently pass NULL for vma seem to be 32-bit-only (e.g. highmem, and CONFIG_DEBUG_PGALLOC in pgtable_32.c) Hugetlb is currently 64-bit only, so we never saw a NULL vma here. Signed-off-by: Scott Wood <scottwood@freescale.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>	2013-11-22 16:57:29 -06:00
Anton Blanchard	5a049f1490	powerpc: ppc64 address space capped at 32TB, mmap randomisation disabled Commit `fba2369e6c` (mm: use vm_unmapped_area() on powerpc architecture) has a bug in slice_scan_available() where we compare an unsigned long (high_slices) against a shifted int. As a result, comparisons against the top 32 bits of high_slices (representing the top 32TB) always returns 0 and the top of our mmap region is clamped at 32TB This also breaks mmap randomisation since the randomised address is always up near the top of the address space and it gets clamped down to 32TB. Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michel Lespinasse <walken@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-11-21 10:33:41 +11:00
Heiko Carstens	95f715b08f	powerpc: Fix __get_user_pages_fast() irq handling __get_user_pages_fast() may be called with interrupts disabled (see e.g. get_futex_key() in kernel/futex.c) and therefore should use local_irq_save() and local_irq_restore() instead of local_irq_disable()/enable(). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> CC: <stable@vger.kernel.org> [v3.12] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-11-21 10:33:37 +11:00
Kirill A. Shutemov	4f804943f9	powerpc: handle pgtable_page_ctor() fail Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-15 09:32:18 +09:00
Xishi Qiu	6408068ee6	mm: use pgdat_end_pfn() to simplify the code in arch Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:03 +09:00
Naoya Horiguchi	c69ded84a9	mm: remove obsolete comments about page table lock The callers of free_pgd_range() and hugetlb_free_pgd_range() don't hold page table locks. The comments seems to be obsolete, so let's remove them. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:03 +09:00
Xishi Qiu	c1ce4b375f	mm/arch: use __free_reserved_page() to simplify the code Use __free_reserved_page() to simplify the code in arch. It used split_page() in consistent_alloc()/__dma_alloc_coherent()/dma_alloc_coherent(), so page->_count == 1, and we can free it safely. __free_reserved_page() ClearPageReserved() init_page_count() // it won't change the value __free_page() Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:03 +09:00
Robert Jennings	ec32dd663d	powerpc: Fix warnings for arch/powerpc/mm/numa.c Simple fixes for sparse warnings in this file. Resolves: arch/powerpc/mm/numa.c:198:24: warning: Using plain integer as NULL pointer arch/powerpc/mm/numa.c:1157:5: warning: symbol 'hot_add_node_scn_to_nid' was not declared. Should it be static? arch/powerpc/mm/numa.c:1238:28: warning: Using plain integer as NULL pointer arch/powerpc/mm/numa.c:1538:6: warning: symbol 'topology_schedule_update' was not declared. Should it be static? Signed-off-by: Robert C Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-10-30 16:00:24 +11:00
LEROY Christophe	5f2da0f1e8	powerpc/8xx: Fixing memory init issue with CONFIG_PIN_TLB Activating CONFIG_PIN_TLB allows access to the 24 first Mbytes of memory at bootup instead of 8. It is needed for "big" kernels for instance when activating CONFIG_LOCKDEP_SUPPORT. This needs to be taken into account in init_32 too, otherwise memory allocation soon fails after startup. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Scott Wood <scottwood@freescale.com>	2013-10-28 21:11:22 -05:00
LEROY Christophe	79df1b374b	powerpc/8xx: Revert commit `e0908085fc` The commit `e0908085fc` ("powerpc/8xx: Fix regression introduced by cache coherency rewrite") is not needed anymore. The issue was because dcbst wrongly sets the store bit when causing a DTLB error, but this is now fixed by commit `0a2ab51ffb` ("powerpc/8xx: Fixup DAR from buggy dcbX instructions.") which handles the buggy dcbx instructions on data page faults on the 8xx. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood@freescale.com: fix commit message] Signed-off-by: Scott Wood <scottwood@freescale.com>	2013-10-28 21:11:18 -05:00
Benjamin Herrenschmidt	3ad26e5c44	Merge branch 'for-kvm' into next Topic branch for commits that the KVM tree might want to pull in separately. Hand merged a few files due to conflicts with the LE stuff Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-10-11 18:23:53 +11:00
Alexey Kardashevskiy	8e0861fa3c	powerpc: Prepare to support kernel handling of IOMMU map/unmap The current VFIO-on-POWER implementation supports only user mode driven mapping, i.e. QEMU is sending requests to map/unmap pages. However this approach is really slow, so we want to move that to KVM. Since H_PUT_TCE can be extremely performance sensitive (especially with network adapters where each packet needs to be mapped/unmapped) we chose to implement that as a "fast" hypercall directly in "real mode" (processor still in the guest context but MMU off). To be able to do that, we need to provide some facilities to access the struct page count within that real mode environment as things like the sparsemem vmemmap mappings aren't accessible. This adds an API function realmode_pfn_to_page() to get page struct when MMU is off. This adds to MM a new function put_page_unless_one() which drops a page if counter is bigger than 1. It is going to be used when MMU is off (for example, real mode on PPC64) and we want to make sure that page release will not happen in real mode as it may crash the kernel in a horrible way. CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported. Cc: linux-mm@kvack.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-10-11 17:24:39 +11:00
Anton Blanchard	12f04f2be8	powerpc: Book 3S MMU little endian support Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-10-11 16:48:26 +11:00
Nathan Fontenot	f7e3334a6b	powerpc: Fix memory hotplug with sparse vmemmap Previous commit 46723bfa540... introduced a new config option HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc when sparse vmemmap is not defined. This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to register_page_bootmem_info_node. Without this we get a BUG_ON for memory hot remove in put_page_bootmem(). This also adds a stub for register_page_bootmem_memmap to allow ppc to build with sparse vmemmap defined. Leaving this as a stub is fine since the same vmemmap addresses are also handled in vmemmap_populate and as such are properly mapped. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.9+]	2013-10-03 17:21:38 +10:00
Johannes Weiner	759496ba64	arch: mm: pass userspace fault flag to generic fault handler Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-12 15:38:01 -07:00
Naoya Horiguchi	83467efbdb	mm: migrate: check movability of hugepage in unmap_and_move_huge_page() Currently hugepage migration works well only for pmd-based hugepages (mainly due to lack of testing,) so we had better not enable migration of other levels of hugepages until we are ready for it. Some users of hugepage migration (mbind, move_pages, and migrate_pages) do page table walk and check pud/pmd_huge() there, so they are safe. But the other users (softoffline and memory hotremove) don't do this, so without this patch they can try to migrate unexpected types of hugepages. To prevent this, we introduce hugepage_migration_support() as an architecture dependent check of whether hugepage are implemented on a pmd basis or not. And on some architecture multiple sizes of hugepages are available, so hugepage_migration_support() also checks hugepage size. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-11 15:57:49 -07:00
Aneesh Kumar K.V	69e044dd75	powerpc: Fix possible deadlock on page fault stack_grow_into/14082 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<c000000000206d28>] .might_fault+0x78/0xe0 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c0000000007ffd8c>] .do_page_fault+0x24c/0x910 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&mm->mmap_sem); lock(&mm->mmap_sem); * DEADLOCK * May be due to missing lock nesting notation 1 lock held by stack_grow_into/14082: #0: (&mm->mmap_sem){++++++}, at: [<c0000000007ffd8c>] .do_page_fault+0x24c/0x910 stack backtrace: CPU: 21 PID: 14082 Comm: stack_grow_into Not tainted 3.10.0-10.el7.ppc64.debug #1 Call Trace: [c0000003d396b850] [c000000000016e7c] .show_stack+0x7c/0x1f0 (unreliable) [c0000003d396b920] [c000000000813fc8] .dump_stack+0x28/0x3c [c0000003d396b990] [c000000000124b90] .__lock_acquire+0x1640/0x1800 [c0000003d396bab0] [c00000000012570c] .lock_acquire+0xac/0x250 [c0000003d396bb80] [c000000000206d54] .might_fault+0xa4/0xe0 [c0000003d396bbf0] [c0000000007ffe2c] .do_page_fault+0x2ec/0x910 [c0000003d396be30] [c0000000000092e8] handle_page_fault+0x10/0x30 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-09-11 11:39:37 +10:00
Linus Torvalds	39eda2aba6	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "Here's the powerpc batch for this merge window. Some of the highlights are: - A bunch of endian fixes ! We don't have full LE support yet in that release but this contains a lot of fixes all over arch/powerpc to use the proper accessors, call the firmware with the right endian mode, etc... - A few updates to our "powernv" platform (non-virtualized, the one to run KVM on), among other, support for bridging the P8 LPC bus for UARTs, support and some EEH fixes. - Some mpc51xx clock API cleanups in preparation for a clock API overhaul - A pile of cleanups of our old math emulation code, including better support for using it to emulate optional FP instructions on embedded chips that otherwise have a HW FPU. - Some infrastructure in selftest, for powerpc now, but could be generalized, initially used by some tests for our perf instruction counting code. - A pile of fixes for hotplug on pseries (that was seriously bitrotting) - The usual slew of freescale embedded updates, new boards, 64-bit hiberation support, e6500 core PMU support, etc..." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits) powerpc: Correct FSCR bit definitions powerpc/xmon: Fix printing of set of CPUs in xmon powerpc/pseries: Move lparcfg.c to platforms/pseries powerpc/powernv: Return secondary CPUs to firmware on kexec powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32 powerpc: Cleanup handling of the DSCR bit in the FSCR register powerpc/pseries: Child nodes are not detached by dlpar_detach_node powerpc/pseries: Add mising of_node_put in delete_dt_node powerpc/pseries: Make dlpar_configure_connector parent node aware powerpc/pseries: Do all node initialization in dlpar_parse_cc_node powerpc/pseries: Fix parsing of initial node path in update_dt_node powerpc/pseries: Pack update_props_workarea to map correctly to rtas buffer header powerpc/pseries: Fix over writing of rtas return code in update_dt_node powerpc/pseries: Fix creation of loop in device node property list powerpc: Skip emulating & leave interrupts off for kernel program checks powerpc: Add more exception trampolines for hypervisor exceptions powerpc: Fix location and rename exception trampolines powerpc: Add more trap names to xmon powerpc/pseries: Add a warning in the case of cross-cpu VPA registration powerpc: Update the 00-Index in Documentation/powerpc ...	2013-09-06 10:49:42 -07:00
Linus Torvalds	2e515bf096	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...	2013-09-06 09:36:28 -07:00
Nathan Fontenot	f748edafac	powerpc/mm: Mark Memory Resources as busy Memory I/O resources need to be marked as busy or else we cannot remove them when doing memory hot remove. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-27 14:35:11 +10:00
Paul Bolle	b615f4b3f9	ppc: init_32: Fix error typo "CONFIG_START_KERNEL" Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-08-20 13:07:31 +02:00
Anton Blanchard	7ffcf8ec26	powerpc: Fix little endian lppaca, slb_shadow and dtl_entry The lppaca, slb_shadow and dtl_entry hypervisor structures are big endian, so we have to byte swap them in little endian builds. LE KVM hosts will also need to be fixed but for now add an #error to remind us. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-14 15:33:35 +10:00
Alistair Popple	b08a2a12e4	powerpc: Make NUMA device node code endian safe The device tree is big endian so make sure we byteswap on little endian. We assume any pHyp calls also return big endian results in memory. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-14 15:33:33 +10:00
Paul Mackerras	1f7bf02876	powerpc: Implement __get_user_pages_fast() Other architectures have a __get_user_pages_fast(), in addition to the regular get_user_pages_fast(), which doesn't call get_user_pages() on failure, and thus doesn't attempt to fault pages in or COW them. The generic KVM code uses __get_user_pages_fast() to detect whether a page for which we have only requested read access is actually writable. This provides an implementation of __get_user_pages_fast() by splitting the existing get_user_pages_fast() in two. With this, the generic KVM code will get the right answer instead of always considering such pages non-writable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-14 14:57:14 +10:00
Anton Blanchard	f13c13a005	powerpc: Stop using non-architected shared_proc field in lppaca Although the shared_proc field in the lppaca works today, it is not architected. A shared processor partition will always have a non zero yield_count so use that instead. Create a wrapper so users don't have to know about the details. In order for older kernels to continue to work on KVM we need to set the shared_proc bit. While here, remove the ugly bitfield. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-14 11:50:26 +10:00
Anton Blanchard	b0d436c739	powerpc: Fix a number of sparse warnings Address some of the trivial sparse warnings in arch/powerpc. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-14 11:50:24 +10:00
Robert Jennings	3be7db6ab4	powerpc: VPHN topology change updates all siblings When an associativity level change is found for one thread, the siblings threads need to be updated as well. This is done today for PRRN in stage_topology_update() but is missing for VPHN in update_cpu_associativity_changes_mask(). This patch will correctly update all thread siblings during a topology change. Without this patch a topology update can result in a CPU in init_sched_groups_power() getting stuck indefinitely in a loop. This loop is built in build_sched_groups(). As a result of the thread moving to a node separate from its siblings the struct sched_group will have its next pointer set to point to itself rather than the sched_group struct of the next thread. This happens because we have a domain without the SD_OVERLAP flag, which is correct, and a topology that doesn't conform with reality (threads on the same core assigned to different numa nodes). When this list is traversed by init_sched_groups_power() it will reach the thread's sched_group structure and loop indefinitely; the cpu will be stuck at this point. The bug was exposed when VPHN was enabled in commit `b7abef0` (v3.9). Cc: <stable@vger.kernel.org> [v3.9+] Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-01 13:11:47 +10:00
Aneesh Kumar K.V	de640959b6	powerpc/mm: Use the correct SLB(LLP) encoding in tlbie instruction The sllp value is stored in mmu_psize_defs in such a way that we can easily OR the value to get the operand for slbmte instruction. ie, the L and LP bits are not contiguous. Decode the bits and use them correctly in tlbie. regression is introduced by `1f6aaaccb1` "powerpc: Update tlbie/tlbiel as per ISA doc" Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-24 14:42:24 +10:00
Aneesh Kumar K.V	83383b73ad	powerpc/mm: Fix fallthrough bug in hpte_decode We should not fallthrough different case statements in hpte_decode. Add break statement to break out of the switch. The regression is introduced by `dcda287a9b` "powerpc/mm: Simplify hpte_decode" Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-24 14:42:21 +10:00
Michel Lespinasse	98d1e64f95	mm: remove free_area_cache Since all architectures have been converted to use vm_unmapped_area(), there is no remaining use for the free_area_cache. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-10 18:11:34 -07:00
Linus Torvalds	65b97fb730	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...	2013-07-04 10:29:23 -07:00
Jiang Liu	602ddc70ec	mm/PPC: prepare for killing free_all_bootmem_node() Prepare for killing free_all_bootmem_node() by using free_all_bootmem(). Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Graf <agraf@suse.de> Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:39 -07:00
Jiang Liu	369a9d8523	mm/ppc: prepare for removing num_physpages and simplify mem_init() Prepare for removing num_physpages and simplify mem_init(). Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:37 -07:00
Jiang Liu	0c98853473	mm: concentrate modification of totalram_pages into the mm core Concentrate code to modify totalram_pages into the mm core, so the arch memory initialized code doesn't need to take care of it. With these changes applied, only following functions from mm core modify global variable totalram_pages: free_bootmem_late(), free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count(). With this patch applied, it will be much more easier for us to keep totalram_pages and zone->managed_pages in consistence. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:33 -07:00
Jiang Liu	dbe67df4ba	mm: enhance free_reserved_area() to support poisoning memory with zero Address more review comments from last round of code review. 1) Enhance free_reserved_area() to support poisoning freed memory with pattern '0'. This could be used to get rid of poison_init_mem() on ARM64. 2) A previous patch has disabled memory poison for initmem on s390 by mistake, so restore to the original behavior. 3) Remove redundant PAGE_ALIGN() when calling free_reserved_area(). Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:32 -07:00
Jiang Liu	11199692d8	mm: change signature of free_reserved_area() to fix building warnings Change signature of free_reserved_area() according to Russell King's suggestion to fix following build warnings: arch/arm/mm/init.c: In function 'mem_init': arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default] free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL); ^ In file included from include/linux/mman.h:4:0, from arch/arm/mm/init.c:15: include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void ' extern unsigned long free_reserved_area(unsigned long start, unsigned long end, mm/page_alloc.c: In function 'free_reserved_area': >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default] In file included from arch/mips/include/asm/page.h:49:0, from include/linux/mmzone.h:20, from include/linux/gfp.h:4, from include/linux/mm.h:8, from mm/page_alloc.c:18: arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void ' but argument is of type 'long unsigned int' mm/page_alloc.c: In function 'free_area_init_nodes': mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds] Also address some minor code review comments. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:32 -07:00
Wanpeng Li	2415cf12e0	mm/hugetlb: use already existing interface huge_page_shift Use the already existing interface huge_page_shift instead of h->order + PAGE_SHIFT. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:32 -07:00
Benjamin Herrenschmidt	24a72acac1	Linux 3.10 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9 9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10 U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ= =Ob6Z -----END PGP SIGNATURE----- Merge tag 'v3.10' into next Merge 3.10 in order to get some of the last minute powerpc changes, resolve conflicts and add additional fixes on top of them.	2013-07-01 17:57:25 +10:00
Nathan Fontenot	dd023217e1	powerpc/numa: Do not update sysfs cpu registration from invalid context The topology update code that updates the cpu node registration in sysfs should not be called while in stop_machine(). The register/unregister calls take a lock and may sleep. This patch moves these calls outside of the call to stop_machine(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-01 11:49:34 +10:00
Paul Gortmaker	061d19f279	powerpc: Delete __cpuinit usage from all users The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit `5e427ec2d0` ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-01 11:10:36 +10:00
Greg Kroah-Hartman	b5aef682e0	Merge 3.10-rc7 into driver-core-next We want the firmware merge fixes, and other bits, in here now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-06-24 15:14:43 -07:00
Aneesh Kumar K.V	1a5272866f	powerpc: Optimize hugepage invalidate Hugepage invalidate involves invalidating multiple hpte entries. Optimize the operation using H_BULK_REMOVE on lpar platforms. On native, reduce the number of tlb flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:58 +10:00
Aneesh Kumar K.V	437d496457	powerpc/THP: Enable THP on PPC64 We enable only if the we support 16MB page size. Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:58 +10:00
Aneesh Kumar K.V	d8e355a20f	powerpc: split hugepage when using subpage protection We find all the overlapping vma and mark them such that we don't allocate hugepage in that range. Also we split existing huge page so that the normal page hash can be invalidated and new page faulted in with new protection bits. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:57 +10:00
Aneesh Kumar K.V	a00e7bea0d	powerpc: disable assert_pte_locked for collapse_huge_page With THP we set pmd to none, before we do pte_clear. Hence we can't walk page table to get the pte lock ptr and verify whether it is locked. THP do take pte lock before calling pte_clear. So we don't change the locking rules here. It is that we can't use page table walking to check whether pte locks are held with THP. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:57 +10:00
Aneesh Kumar K.V	7888b4ddb4	powerpc: Prevent gcc to re-read the pagetables GCC is very likely to read the pagetables just once and cache them in the local stack or in a register, but it is can also decide to re-read the pagetables. The problem is that the pagetable in those places can change from under gcc. With THP/hugetlbfs the pmd (and pgd for hugetlbfs giga pages) can change under gup_fast. The pages won't be freed untill we finish gup fast because we have irq disabled and we free these pages via rcu callback. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:56 +10:00
Aneesh Kumar K.V	0ac52dd766	powerpc: Make linux pagetable walk safe with THP enabled We need to have irqs disabled to handle all the possible parallel update for linux page table without holding locks. Events that we are intersted in while walking page tables are 1) Page fault 2) umap 3) THP split 4) THP collapse A) local_irq_disabled: ------------------------ 1) page fault: A none to valid transition via page fault is not an issue because we would either see a none or valid. If it is none, we would error out the page table walk. We may need to use on stack values when checking for type of page table elements, because if we do if (!is_hugepd()) { if (!pmd_none() { if (pmd_bad() { We could take that bad condition because the pmd got converted to a hugepd after the !is_hugepd check via a hugetlb fault. The right way would be to check for pmd_none higher up or use on stack value. 2) A valid to none conversion via unmap: We can safely walk the upper level table, because we don't remove the the page table entries until rcu grace period. So even if we followed a wrong pointer we still have the pointer valid till the grace period. A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent pte_clear we take _PAGE_BUSY. 3) THP split: A valid transparent hugepage is converted to nomal page. Before we split we do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING So when walking page table we need to check for pmd_trans_splitting and handle that. The pte returned should also need to be checked for _PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save the value of PTE on stack and check for the flag in the local pte value. If we don't have the value set we can safely operate on the local pte value and we atomicaly set _PAGE_BUSY. 4) THP collapse: A normal page gets converted to hugepage. In the collapse path, we mark the pmd none early (pmdp_clear_flush). With irq disabled, if we are aleady walking page table we would see the pmd_none and won't continue. If we see a valid PMD, we should still check for _PAGE_PRESENT before setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:56 +10:00
Aneesh Kumar K.V	6d492ecc64	powerpc/THP: Add code to handle HPTE faults for hugepages The deposted PTE page in the second half of the PMD table is used to track the state on hash PTEs. After updating the HPTE, we mark the coresponding slot in the deposted PTE page valid. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:56 +10:00
Aneesh Kumar K.V	c367714ce8	powerpc: Update gup_pmd_range to handle transparent hugepages Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:55 +10:00
Aneesh Kumar K.V	12bc9f6fc1	powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V	ac52ae4721	powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V	29409997f8	powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code We will use this in the later patch for handling THP pages Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:54 +10:00
Aneesh Kumar K.V	074c2eae3e	powerpc/THP: Implement transparent hugepages for ppc64 We now have pmd entries covering 16MB range and the PMD table double its original size. We use the second half of the PMD table to deposit the pgtable (PTE page). The depoisted PTE page is further used to track the HPTE information. The information include [ secondary group \| 3 bit hidx \| valid ]. We use one byte per each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need 4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk the PTE page and invalidate all valid HPTEs. This patch implements necessary arch specific functions for THP support and also hugepage invalidate logic. These PMD related functions are intentionally kept similar to their PTE counter-part. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:53 +10:00
Aneesh Kumar K.V	f940f52898	powerpc/THP: Double the PMD table size for THP THP code does PTE page allocation along with large page request and deposit them for later use. This is to ensure that we won't have any failures when we split hugepages to regular pages. On powerpc we want to use the deposited PTE page for storing hash pte slot and secondary bit information for the HPTEs. We use the second half of the pmd table to save the deposted PTE page. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:53 +10:00
Aneesh Kumar K.V	db3d853490	powerpc/mm: handle hugepage size correctly when invalidating hpte entries If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch `0608d69246` "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-21 16:01:52 +10:00
Daniel Walker	d5d8ec895c	powerpc/mm: Make mmap_64.c compile on 32bit powerpc There appears to be no good reason to keep this as 64bit only. It works on 32bit also, and has checks so that it can work correctly with 32bit binaries on 64bit hardware which is why I think this works. I tested this on qemu using the virtex-ml507 machine type. Before, /bin2 # ./test & cat /proc/${!}/maps 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test 48000000-48020000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so 48021000-48023000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfd03000-bfd24000 rw-p 00000000 00:00 0 [stack] /bin2 # ./test & cat /proc/${!}/maps 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 0fe6e000-0ffd8000 r-xp 00000000 00:01 214 /lib/libc-2.11.3.so 0ffd8000-0ffe8000 ---p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffe8000-0ffed000 rw-p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffed000-0fff0000 rw-p 00000000 00:00 0 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test 48000000-48020000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so 48020000-48021000 rw-p 00000000 00:00 0 48021000-48023000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bf98a000-bf9ab000 rw-p 00000000 00:00 0 [stack] /bin2 # ./test & cat /proc/${!}/maps 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 0fe6e000-0ffd8000 r-xp 00000000 00:01 214 /lib/libc-2.11.3.so 0ffd8000-0ffe8000 ---p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffe8000-0ffed000 rw-p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffed000-0fff0000 rw-p 00000000 00:00 0 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test 48000000-48020000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so 48020000-48021000 rw-p 00000000 00:00 0 48021000-48023000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfa54000-bfa75000 rw-p 00000000 00:00 0 [stack] After, bash-4.1# ./test & cat /proc/${!}/maps [7] 803 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test b7eb0000-b7ed0000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so b7ed1000-b7ed3000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfbc0000-bfbe1000 rw-p 00000000 00:00 0 [stack] bash-4.1# ./test & cat /proc/${!}/maps [8] 805 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test b7b03000-b7b23000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so b7b24000-b7b26000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfc27000-bfc48000 rw-p 00000000 00:00 0 [stack] bash-4.1# ./test & cat /proc/${!}/maps [9] 807 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test b7f37000-b7f57000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so b7f58000-b7f5a000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bff96000-bffb7000 rw-p 00000000 00:00 0 [stack] Signed-off-by: Daniel Walker <dwalker@fifo90.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-20 16:55:11 +10:00
Scott Wood	39a421ff0b	powerpc/mm/nohash: Ignore NULL stale_map entries This happens with threads that are offline due to CPU hotplug (including threads that were never "plugged in" to begin with because SMT is disabled). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-20 16:55:10 +10:00
Aneesh Kumar K.V	8bbd9f04b7	powerpc: Fix bad pmd error with book3E config Book3E uses the hugepd at PMD level and don't encode pte directly at the pmd level. So it will find the lower bits of pmd set and the pmd_bad check throws error. Infact the current code will never take the free_hugepd_range call at all because it will clear the pmd if it find a hugepd pointer. Fix this by clearing bad pmd only if it is not a hugepd pointer. This is regression introduced by `e2b3d202d1` "powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format" Reported-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-20 15:25:21 +10:00
Greg Kroah-Hartman	bb07b00be7	Merge 3.10-rc6 into driver-core-next We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-06-17 16:57:20 -07:00
Stephen Rothwell	40b313608a	Finally eradicate CONFIG_HOTPLUG Ever since commit `45f035ab9b` ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it. Cc: Russell King <linux@arm.linux.org.uk> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-06-03 14:20:18 -07:00
Aneesh Kumar K.V	0608d69246	powerpc/mm: Always invalidate tlb on hpte invalidate and update If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. This was a regression introduced by `b1022fbd29` Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-01 08:29:26 +10:00
Li Zhong	ba12eedee3	powerpc: Exception hooks for context tracking subsystem This is the exception hooks for context tracking subsystem, including data access, program check, single step, instruction breakpoint, machine check, alignment, fp unavailable, altivec assist, unknown exception, whose handlers might use RCU. This patch corresponds to [PATCH] x86: Exception hooks for userspace RCU extended QS commit `6ba3c97a38` But after the exception handling moved to generic code, and some changes in following two commits: `56dd9470d7` context_tracking: Move exception handling to generic code `6c1e0256fa` context_tracking: Restore correct previous context state on exception exit it is able for exception hooks to use the generic code above instead of a redundant arch implementation. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-05-14 16:00:19 +10:00
Aneesh Kumar K.V	83d5e64b7e	powerpc: Fix build errors STRICT_MM_TYPECHECKS Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-05-14 14:36:20 +10:00
Michael Neuling	c2fd22df89	powerpc/tm: Fix null pointer deference in flush_hash_page Make sure that current->thread.reg exists before we deference it in flush_hash_page. Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: John J Miller <millerjo@us.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-05-06 09:25:36 +10:00
Linus Torvalds	5a148af669	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc update from Benjamin Herrenschmidt: "The main highlights this time around are: - A pile of addition POWER8 bits and nits, such as updated performance counter support (Michael Ellerman), new branch history buffer support (Anshuman Khandual), base support for the new PCI host bridge when not using the hypervisor (Gavin Shan) and other random related bits and fixes from various contributors. - Some rework of our page table format by Aneesh Kumar which fixes a thing or two and paves the way for THP support. THP itself will not make it this time around however. - More Freescale updates, including Altivec support on the new e6500 cores, new PCI controller support, and a pile of new boards support and updates. - The usual batch of trivial cleanups & fixes" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) powerpc: Fix build error for book3e powerpc: Context switch the new EBB SPRs powerpc: Turn on the EBB H/FSCR bits powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S powerpc: Setup BHRB instructions facility in HFSCR for POWER8 powerpc: Fix interrupt range check on debug exception powerpc: Update tlbie/tlbiel as per ISA doc powerpc: Print page size info during boot powerpc: print both base and actual page size on hash failure powerpc: Fix hpte_decode to use the correct decoding for page sizes powerpc: Decode the pte-lp-encoding bits correctly. powerpc: Use encode avpn where we need only avpn values powerpc: Reduce PTE table memory wastage powerpc: Move the pte free routines from common header powerpc: Reduce the PTE_INDEX_SIZE powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format powerpc: New hugepage directory format powerpc: Don't truncate pgd_index wrongly powerpc: Don't hard code the size of pte page powerpc: Save DAR and DSISR in pt_regs on MCE ...	2013-05-02 10:16:16 -07:00
Aneesh Kumar K.V	1f6aaaccb1	powerpc: Update tlbie/tlbiel as per ISA doc Encode the actual page correctly in tlbie/tlbiel. This make sure we handle multiple page size segment correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:29 +10:00
Aneesh Kumar K.V	3dc4feca4b	powerpc: Print page size info during boot This gives hint about different base and actual page size combination supported by the platform. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:25 +10:00
Aneesh Kumar K.V	d8139ebf85	powerpc: print both base and actual page size on hash failure Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:22 +10:00
Aneesh Kumar K.V	7e74c3921a	powerpc: Fix hpte_decode to use the correct decoding for page sizes As per ISA doc, we encode base and actual page size in the LP bits of PTE. The number of bit used to encode the page sizes depend on actual page size. ISA doc lists this as PTE LP actual page size rrrr rrrz >=8KB rrrr rrzz >=16KB rrrr rzzz >=32KB rrrr zzzz >=64KB rrrz zzzz >=128KB rrzz zzzz >=256KB rzzz zzzz >=512KB zzzz zzzz >=1MB ISA doc also says "The values of the “z” bits used to specify each size, along with all possible values of “r” bits in the LP field, must result in LP values distinct from other LP values for other sizes." based on the above update hpte_decode to use the correct decoding for LP bits. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:18 +10:00
Aneesh Kumar K.V	b1022fbd29	powerpc: Decode the pte-lp-encoding bits correctly. We look at both the segment base page size and actual page size and store the pte-lp-encodings in an array per base page size. We also update all relevant functions to take actual page size argument so that we can use the correct PTE LP encoding in HPTE. This should also get the basic Multiple Page Size per Segment (MPSS) support. This is needed to enable THP on ppc64. [Fixed PR KVM build --BenH] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:14 +10:00
Aneesh Kumar K.V	74f227b228	powerpc: Use encode avpn where we need only avpn values In all these cases we are doing something similar to HPTE_V_COMPARE(hpte_v, want_v) which ignores the HPTE_V_LARGE bit With MPSS support we would need actual page size to set HPTE_V_LARGE bit and that won't be available in most of these cases. Since we are ignoring HPTE_V_LARGE bit, use the avpn value instead. There should not be any change in behaviour after this patch. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:11 +10:00
Aneesh Kumar K.V	5c1f6ee9a3	powerpc: Reduce PTE table memory wastage We allocate one page for the last level of linux page table. With THP and large page size of 16MB, that would mean we are wasting large part of that page. To map 16MB area, we only need a PTE space of 2K with 64K page size. This patch reduce the space wastage by sharing the page allocated for the last level of linux page table with multiple pmd entries. We call these smaller chunks PTE page fragments and allocated page, PTE page. In order to support systems which doesn't have 64K HPTE support, we also add another 2K to PTE page fragment. The second half of the PTE fragments is used for storing slot and secondary bit information of an HPTE. With this we now have a 4K PTE fragment. We use a simple approach to share the PTE page. On allocation, we bump the PTE page refcount to 16 and share the PTE page with the next 16 pte alloc request. This should help in the node locality of the PTE page fragment, assuming that the immediate pte alloc request will mostly come from the same NUMA node. We don't try to reuse the freed PTE page fragment. Hence we could be waisting some space. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:07 +10:00
Aneesh Kumar K.V	e2b3d202d1	powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation. With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level. That means with 32 bit process we cannot allocate normal pages at all, because we cover the entire address space with one pgd entry. Fix this by switching to a new page table format for hugepages. With the new page table format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead we encode the PTE information directly at the directory level. This forces 16MB hugepage at PMD level. This will also make the page take walk much simpler later when we add the THP support. With the new table format we have 4 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (3) leaf pte for huge page, bottom two bits != 00 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 15:59:56 +10:00
Aneesh Kumar K.V	cf9427b85e	powerpc: New hugepage directory format Change the hugepage directory format so that we can have leaf ptes directly at page directory avoiding the allocation of hugepage directory. With the new table format we have 3 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Instead of storing shift value in hugepd pointer we use mmu_psize_def index so that we can fit all the supported hugepage size in 4 bits Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 15:59:53 +10:00
Aneesh Kumar K.V	cc3665a60a	powerpc: Don't hard code the size of pte page USE PTRS_PER_PTE to indicate the size of pte page. To support THP, later patches will be changing PTRS_PER_PTE value. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 15:59:46 +10:00
Nathan Fontenot	601abdc3b4	powerpc/pseries: Correct builds break when CONFIG_SMP not defined Correct build failure for powerpc/pseries builds with CONFIG_SMP not defined. The function cpu_sibling_mask has no meaning (or definition) when CONFIG_SMP is not defined. Additionally, the updating of NUMA affinity for a CPU in a UP system doesn't really make sense. This patch ifdef's out the code making the affinity updates for PRRN events to fix the following build break. arch/powerpc/mm/numa.c: In function ‘stage_topology_update’: arch/powerpc/mm/numa.c:1535: error: implicit declaration of function ‘cpu_sibling_mask’ arch/powerpc/mm/numa.c:1535: warning: passing argument 3 of ‘cpumask_or’ makes pointer from integer without a cast make[1]: *** [arch/powerpc/mm/numa.o] Error 1 Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 15:59:35 +10:00
Stephen Rothwell	9f3a90e89d	powerpc: Fix build failure after merge of the cgroup tree After merging the cgroup tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology': arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration] arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror] arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration] Caused by commit `30c05350c3` ("powerpc/pseries: Use stop machine to update cpu maps") from the powerpc tree interacting with (probably) commit `ff794dea52` ("cpuset: remove include of cgroup.h from cpuset.h") from the cgroup tree. Removing includes from header files is fraught with danger ... The former should have added an include of linux/slab.h to arch/powerpc/mm/numa.c. I have added the following merge fix patch for today (but it should be applied to the powerpc tree ASAP). From: Stephen Rothwell <sfr@canb.auug.org.au> Date: Mon, 29 Apr 2013 14:01:44 +1000 Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including slab.h fixes these build errors: arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology': arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration] arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror] arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 15:59:24 +10:00
Linus Torvalds	191a712090	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Fixes and a lot of cleanups. Locking cleanup is finally complete. cgroup_mutex is no longer exposed to individual controlelrs which used to cause nasty deadlock issues. Li fixed and cleaned up quite a bit including long standing ones like racy cgroup_path(). - device cgroup now supports proper hierarchy thanks to Aristeu. - perf_event cgroup now supports proper hierarchy. - A new mount option "__DEVEL__sane_behavior" is added. As indicated by the name, this option is to be used for development only at this point and generates a warning message when used. Unfortunately, cgroup interface currently has too many brekages and inconsistencies to implement a consistent and unified hierarchy on top. The new flag is used to collect the behavior changes which are necessary to implement consistent unified hierarchy. It's likely that this flag won't be used verbatim when it becomes ready but will be enabled implicitly along with unified hierarchy. The option currently disables some of broken behaviors in cgroup core and also .use_hierarchy switch in memcg (will be routed through -mm), which can be used to make very unusual hierarchy where nesting is partially honored. It will also be used to implement hierarchy support for blk-throttle which would be impossible otherwise without introducing a full separate set of control knobs. This is essentially versioning of interface which isn't very nice but at this point I can't see any other options which would allow keeping the interface the same while moving towards hierarchy behavior which is at least somewhat sane. The planned unified hierarchy is likely to require some level of adaptation from userland anyway, so I think it'd be best to take the chance and update the interface such that it's supportable in the long term. Maintaining the existing interface does complicate cgroup core but shouldn't put too much strain on individual controllers and I think it'd be manageable for the foreseeable future. Maybe we'll be able to drop it in a decade. Fix up conflicts (including a semantic one adding a new #include to ppc that was uncovered by header the file changes) as per Tejun. * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits) cpuset: fix compile warning when CONFIG_SMP=n cpuset: fix cpu hotplug vs rebuild_sched_domains() race cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn() cgroup: restore the call to eventfd->poll() cgroup: fix use-after-free when umounting cgroupfs cgroup: fix broken file xattrs devcg: remove parent_cgroup. memcg: force use_hierarchy if sane_behavior cgroup: remove cgrp->top_cgroup cgroup: introduce sane_behavior mount option move cgroupfs_root to include/linux/cgroup.h cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix cgroup: make cgroup_path() not print double slashes Revert "cgroup: remove bind() method from cgroup_subsys." perf: make perf_event cgroup hierarchical cgroup: implement cgroup_is_descendant() cgroup: make sure parent won't be destroyed before its children cgroup: remove bind() method from cgroup_subsys. devcg: remove broken_hierarchy tag cgroup: remove cgroup_lock_is_held() ...	2013-04-29 19:14:20 -07:00
Benjamin Herrenschmidt	bc23100a0d	Merge remote-tracking branch 'kumar/next' into next From Kumar Gala: << Add support for T4 and B4 SoC families from Freescale, e6500 altivec support, some various board fixes and other minor cleanups. >>	2013-04-30 11:10:09 +10:00
Michel Lespinasse	fba2369e6c	mm: use vm_unmapped_area() on powerpc architecture Update the powerpc slice_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Tested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 11:05:17 +10:00
Michel Lespinasse	34d07177b8	mm: remove free_area_cache use in powerpc architecture As all other architectures have been converted to use vm_unmapped_area(), we are about to retire the free_area_cache. This change simply removes the use of that cache in slice_get_unmapped_area(), which will most certainly have a performance cost. Next one will convert that function to use the vm_unmapped_area() infrastructure and regain the performance. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 11:05:10 +10:00
Cody P Schafer	f9d531b8dc	powerpc/mm/numa: use setup_nr_node_ids() instead of opencoding. [sfr@canb.auug.org.au: add missing semicolon] Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:36 -07:00
Johannes Weiner	0aad818b2d	sparse-vmemmap: specify vmemmap population range in bytes The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: David S. Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Jiang Liu	7db6f78cfe	mm/PPC: use free_highmem_page() to free highmem pages into buddy system Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Alexander Graf <agraf@suse.de> Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:32 -07:00
Jiang Liu	5d585e5c48	mm/ppc: use common help functions to free reserved pages Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:30 -07:00
Nathan Fontenot	e04fa61214	powerpc/pseries: Add /proc interface to control topology updates There are instances in which we do not want topology updates to occur. In order to allow this a /proc interface (/proc/powerpc/topology_updates) is introduced so that topology updates can be enabled and disabled. This patch also adds a prrn_is_enabled() call so that PRRN events are handled in the kernel only if topology updating is enabled. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-26 16:08:26 +10:00
Jesse Larrew	b7abef045f	powerpc/pseries: RE-enable Virtual Processor Home Node updating The new PRRN firmware feature provides a more convenient and event-driven interface than VPHN for notifying Linux of changes to the NUMA affinity of platform resources. However, for practical reasons, it may not be feasible for some customers to update to the latest firmware. For these customers, the VPHN feature supported on previous firmware versions may still be the best option. The VPHN feature was previously disabled due to races with the load balancing code when accessing the NUMA cpu maps, but the new stop_machine() approach protects the NUMA cpu maps from these concurrent accesses. It should be safe to re-enable this feature now. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-26 16:08:25 +10:00
Jesse Larrew	176bbf1424	powerpc/pseries: Update NUMA VDSO information when updating CPU maps The following patch adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3: Commit `18ad51dd34` ("powerpc: Add VDSO version of getcpu") adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3. This patch ensures that this information is also updated when the NUMA affinity of a cpu changes. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-26 16:08:24 +10:00
Nathan Fontenot	30c05350c3	powerpc/pseries: Use stop machine to update cpu maps The new PRRN firmware feature allows CPU and memory resources to be transparently reassigned across NUMA boundaries. When this happens, the kernel must update the node maps to reflect the new affinity information. Although the NUMA maps can be protected by locking primitives during the update itself, this is insufficient to prevent concurrent accesses to these structures. Since cpumask_of_node() hands out a pointer to these structures, they can still be modified outside of the lock. Furthermore, tracking down each usage of these pointers and adding locks would be quite invasive and difficult to maintain. The approach used is to make a list of affected cpus and call stop_machine to have the update routine run on each of the affected cpus allowing them to update themselves. Each cpu finds itself in the list of cpus and makes the appropriate updates. We need to have each cpu do this for themselves to handle calls to vdso_getcpu_init() added in a subsequent patch. Situations like these are best handled using stop_machine(). Since the NUMA affinity updates are exceptionally rare events, this approach has the benefit of not adding any overhead while accessing the NUMA maps during normal operation. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-26 16:08:24 +10:00
Jesse Larrew	5d88aa85c0	powerpc/pseries: Update CPU maps when device tree is updated Platform events such as partition migration or the new PRRN firmware feature can cause the NUMA characteristics of a CPU to change, and these changes will be reflected in the device tree nodes for the affected CPUs. This patch registers a handler for Open Firmware device tree updates and reconfigures the CPU and node maps whenever the associativity changes. Currently, this is accomplished by marking the affected CPUs in the cpu_associativity_changes_mask and allowing arch_update_cpu_topology() to retrieve the new associativity information using hcall_vphn(). Protecting the NUMA cpu maps from concurrent access during an update operation will be addressed in a subsequent patch in this series. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-26 16:08:23 +10:00
Nathan Fontenot	8002b0c54b	powerpc/pseries: Update numa.c to use updated firmware_has_feature() Update the numa code to use the updated firmware_has_feature() when checking for type 1 affinity. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-26 16:08:22 +10:00
Li Zhong	016af59f0f	powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page() This patch fixes the following oops, which could be trigged by build the kernel with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC. hpte_insert() might return -1, indicating that the bucket (primary here) is full. We are not necessarily reporting a BUG in this case. Instead, we could try repeatedly (try secondary, remove and try again) until we find a slot. [ 543.075675] ------------[ cut here ]------------ [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239! [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1] [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 543.075741] Modules linked in: binfmt_misc ehea [ 543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594 [ 543.075771] REGS: c0000000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222) [ 543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 22224482 XER: 00000000 [ 543.075816] SOFTE: 0 [ 543.075823] CFAR: c00000000004c200 [ 543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1 GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594 GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854 GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001 GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000 GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001 GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950 GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230 [ 543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8 [ 543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 [ 543.076025] Call Trace: [ 543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable) [ 543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc [ 543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c [ 543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4 [ 543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8 [ 543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0 [ 543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4 [ 543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30 [ 543.076155] Instruction dump: [ 543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008 [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000 [ 543.076224] ---[ end trace bd5807e8d6ae186b ]--- Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 16:00:00 +10:00
Li Zhong	b170bd3de6	powerpc: Split the code trying to insert hpte repeatedly as an helper function Move the logic trying to insert hpte in __hash_page_huge() to an helper function, so it could also be used by others. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 15:59:59 +10:00
Li Zhong	2c3c0693d9	powerpc: Move the setting of rflags out of loop in __hash_page_huge It seems that new_pte and rflags don't get changed in the repeating loop, so move their assignment out of the loop. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 15:59:59 +10:00
Stephen Rothwell	55671f3cc2	powerpc: fix annotation of fake_numa_create_new_node() This function has always been marked as __cpuinit, but is only called from functions marked as __init and references an __initdata variable. So change its annotation to __init. Fixes this build warning: WARNING: arch/powerpc/mm/built-in.o(.cpuinit.text+0x86): Section mismatch in reference from the function .fake_numa_create_new_node() to the variable .init.data:cmdline The function __cpuinit .fake_numa_create_new_node() references a variable __initdata cmdline. If cmdline is only used by .fake_numa_create_new_node then annotate cmdline with a matching annotation. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 15:59:56 +10:00
Vaidyanathan Srinivasan	7122beeee7	powerpc: fix numa distance for form0 device tree The following commit breaks numa distance setup for old powerpc systems that use form0 encoding in device tree. commit `41eab6f88f` powerpc/numa: Use form 1 affinity to setup node distance Device tree node /rtas/ibm,associativity-reference-points would index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or form1 encoding detected by ibm,architecture-vec-5 property. All modern systems use form1 and current kernel code is correct. However, on older systems with form0 encoding, the numa distance will get hard coded as LOCAL_DISTANCE for all nodes. This causes task scheduling anomaly since scheduler will skip building numa level domain (topmost domain with all cpus) if all numa distances are same. (value of 'level' in sched_init_numa() will remain 0) Prior to the above commit: ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE) Restoring compatible behavior with this patch for old powerpc systems with device tree where numa distance are encoded as form0. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 15:59:56 +10:00
Paul Bolle	aac3d0c875	powerpc: Fix typo "CONFIG_ICSWX_PID" Untested. As this typo was introduced in v3.3, with commit `9d67028090` ("powerpc: Split ICSWX ACOP and PID processing"), which actually added PPC_ICSWX_PID, this surely needs testing. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 13:03:54 +10:00
Valentina Manea	8040bda31a	powerpc: place EXPORT_SYMBOL macro right after declaration This fixes the following checkpatch.pl warnings: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL(kmap_prot); WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL(kmap_pte); Signed-off-by: Valentina Manea <valentina.manea.m@gmail.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 13:03:49 +10:00
Aneesh Kumar K.V	af81d7878c	powerpc: Rename USER_ESID_BITS* to ESID_BITS* Now we use ESID_BITS of kernel address to build proto vsid. So rename USER_ESIT_BITS to ESID_BITS Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]	2013-03-17 12:45:44 +11:00
Aneesh Kumar K.V	c60ac5693c	powerpc: Update kernel VSID range This patch change the kernel VSID range so that we limit VSID_BITS to 37. This enables us to support 64TB with 65 bit VA (37+28). Without this patch we have boot hangs on platforms that only support 65 bit VA. With this patch we now have proto vsid generated as below: We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated from mmu context id and effective segment id of the address. For user processes max context id is limited to ((1ul << 19) - 5) for kernel space, we use the top 4 context ids to map address as below 0x7fffc - [ 0xc000000000000000 - 0xc0003fffffffffff ] 0x7fffd - [ 0xd000000000000000 - 0xd0003fffffffffff ] 0x7fffe - [ 0xe000000000000000 - 0xe0003fffffffffff ] 0x7ffff - [ 0xf000000000000000 - 0xf0003fffffffffff ] Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]	2013-03-17 12:39:06 +11:00
Benjamin Herrenschmidt	13938117a5	powerpc: Fix STAB initialization Commit `f5339277eb` accidentally removed more than just iSeries bits and took out the call to stab_initialize() thus breaking support for POWER3 processors. Put it back. (Yes, nobody noticed until now ...) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.4+]	2013-03-13 10:06:22 +11:00
Kumar Gala	1b29187315	powerpc/fsl-booke: Support detection of page sizes on e6500 The e6500 core used on T4240 and B4860 SoCs from FSL implements MMUv2 of the Power Book-E Architecture. However there are some minor differences between it and other Book-E implementations. Add support to parse SPRN_TLB1PS for the variable page sizes supported. In the future this should be expanded for more page sizes supported on e6500 as well as other MMU features. This patch is based on code from Scott Wood. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2013-03-05 17:10:27 -06:00
Linus Torvalds	5ce1a70e2f	Merge branch 'akpm' (more incoming from Andrew) Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...	2013-02-23 17:50:35 -08:00
Tang Chen	0197518cd3	memory-hotplug: remove memmap of sparse-vmemmap Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu	46723bfa54	memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Wen Congyang	24d335ca36	memory-hotplug: introduce new arch_remove_memory() for removing page table For removing memory, we need to remove page tables. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Linus Torvalds	9d3cae26ac	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "So from the depth of frozen Minnesota, here's the powerpc pull request for 3.9. It has a few interesting highlights, in addition to the usual bunch of bug fixes, minor updates, embedded device tree updates and new boards: - Hand tuned asm implementation of SHA1 (by Paulus & Michael Ellerman) - Support for Doorbell interrupts on Power8 (kind of fast thread-thread IPIs) by Ian Munsie - Long overdue cleanup of the way we handle relocation of our open firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard - Support for saving/restoring & context switching the PPR (Processor Priority Register) on server processors that support it. This allows the kernel to preserve thread priorities established by userspace. By Haren Myneni. - DAWR (new watchpoint facility) support on Power8 by Michael Neuling - Ability to change the DSCR (Data Stream Control Register) which controls cache prefetching on a running process via ptrace by Alexey Kardashevskiy - Support for context switching the TAR register on Power8 (new branch target register meant to be used by some new specific userspace perf event interrupt facility which is yet to be enabled) by Ian Munsie. - Improve preservation of the CFAR register (which captures the origin of a branch) on various exception conditions by Paulus. - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where it belongs by Philippe De Muyter - Support for Transactional Memory on Power8 by Michael Neuling (based on original work by Matt Evans). For those curious about the feature, the patch contains a pretty good description." (See commit `db8ff90702`: "powerpc: Documentation for transactional memory on powerpc" for the mentioned description added to the file Documentation/powerpc/transactional_memory.txt) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits) powerpc/kexec: Disable hard IRQ before kexec powerpc/85xx: l2sram - Add compatible string for BSC9131 platform powerpc/85xx: bsc9131 - Correct typo in SDHC device node powerpc/e500/qemu-e500: enable coreint powerpc/mpic: allow coreint to be determined by MPIC version powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct powerpc/85xx: Board support for ppa8548 powerpc/fsl: remove extraneous DIU platform functions arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test powerpc: Documentation for transactional memory on powerpc powerpc: Add transactional memory to pseries and ppc64 defconfigs powerpc: Add config option for transactional memory powerpc: Add transactional memory to POWER8 cpu features powerpc: Add new transactional memory state to the signal context powerpc: Hook in new transactional memory code powerpc: Routines for FP/VSX/VMX unavailable during a transaction powerpc: Add transactional memory unavaliable execption handler powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes powerpc: Add FP/VSX and VMX register load functions for transactional memory powerpc: Add helper functions for transactional memory context switching ...	2013-02-23 17:09:55 -08:00
Michael Neuling	bc2a9408fa	powerpc: Hook in new transactional memory code This hooks the new transactional memory code into context switching, FP/VMX/VMX unavailable and exception return. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-02-15 17:02:23 +11:00
Aneesh Kumar K.V	eda8eebdd1	powerpc/mm: Fix hash computation function The ASM version of hash computation function was truncating the upper bit. Make the ASM version similar to hpt_hash function. Remove masking vsid bits. Without this patch, we observed hang during bootup due to not satisfying page fault request correctly. The fault handler used wrong hash values to update the HPTE. Hence we kept looping with page fault. hash_page(ea=000001003e260008, access=203, trap=300 ip=3fff91787134 dsisr 42000000 The computed value of hash 000000000f22f390 update: avpnv=4003e46054003e00, hash=000000000722f390, f=80000006, psize: 2 ... BenH: The over-masking has been there for ever but only hurts with the new 64T support introduced in 3.7 Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.7]	2013-02-04 15:15:08 +11:00
Cody P Schafer	4e8309baed	powerpc/mm: Eliminate unneeded for_each_memblock The only persistent change made by this loop is calling memblock_set_node() once for each memblock, which is not useful (and has no effect) as memblock_set_node() is not called with any memblock-specific parameters. Subsistute a single memblock_set_node(). Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-01-29 11:34:25 +11:00
Michael Neuling	9422de3e95	powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers This is a rewrite so that we don't assume we are using the DABR throughout the code. We now use the arch_hw_breakpoint to store the breakpoint in a generic manner in the thread_struct, rather than storing the raw DABR value. The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from userspace. We keep this functionality, so that future changes (like the POWER8 DAWR), will still fake the DABR to userspace. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-01-10 17:01:44 +11:00
Anton Blanchard	1fbe9cf259	powerpc: Build kernel with -mcmodel=medium Finally remove the two level TOC and build with -mcmodel=medium. Unfortunately we can't build modules with -mcmodel=medium due to the tricks the kernel module loader plays with percpu data: # -mcmodel=medium breaks modules because it uses 32bit offsets from # the TOC pointer to create pointers where possible. Pointers into the # percpu data area are created by this method. # # The kernel module loader relocates the percpu data section from the # original location (starting with 0xd...) to somewhere in the base # kernel percpu data space (starting with 0xc...). We need a full # 64bit relocation for this to work, hence -mcmodel=large. On older kernels we fall back to the two level TOC (-mminimal-toc) Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-01-10 17:00:31 +11:00
Greg Kroah-Hartman	cad5cef62a	POWERPC: drivers: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:04 -08:00
Linus Torvalds	16e024f30c	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc update from Benjamin Herrenschmidt: "The main highlight is probably some base POWER8 support. There's more to come such as transactional memory support but that will wait for the next one. Overall it's pretty quiet, or rather I've been pretty poor at picking things up from patchwork and reviewing them this time around and Kumar no better on the FSL side it seems..." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits) powerpc+of: Rename and fix OF reconfig notifier error inject module powerpc: mpc5200: Add a3m071 board support powerpc/512x: don't compile any platform DIU code if the DIU is not enabled powerpc/mpc52xx: use module_platform_driver macro powerpc+of: Export of_reconfig_notifier_[register,unregister] powerpc/dma/raidengine: add raidengine device powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct powerpc/mpc85xx: Change spin table to cached memory powerpc/fsl-pci: Add PCI controller ATMU PM support powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS] powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers powerpc: Disable relocation on exceptions when kexecing powerpc: Enable relocation on during exceptions at boot powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function powerpc: Add wrappers to enable/disable relocation on exceptions powerpc: Add set_mode hcall powerpc: Setup relocation on exceptions for bare metal systems powerpc: Move initial mfspr LPCR out of __init_LPCR powerpc: Add relocation on exception vector handlers ...	2012-12-18 09:58:09 -08:00
Linus Torvalds	f6e858a00a	Merge branch 'akpm' (Andrew's patch-bomb) Merge misc VM changes from Andrew Morton: "The rest of most-of-MM. The other MM bits await a slab merge. This patch includes the addition of a huge zero_page. Not a performance boost but it an save large amounts of physical memory in some situations. Also a bunch of Fujitsu engineers are working on memory hotplug. Which, as it turns out, was badly broken. About half of their patches are included here; the remainder are 3.8 material." However, this merge disables CONFIG_MOVABLE_NODE, which was totally broken. We don't add new features with "default y", nor do we add Kconfig questions that are incomprehensible to most people without any help text. Does the feature even make sense without compaction or memory hotplug? * akpm: (54 commits) mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic() mm/memory.c: remove unused code from do_wp_page() asm-generic, mm: pgtable: consolidate zero page helpers mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage hwpoison, hugetlbfs: fix RSS-counter warning hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage mm: protect against concurrent vma expansion memcg: do not check for mm in __mem_cgroup_count_vm_event tmpfs: support SEEK_DATA and SEEK_HOLE (reprise) mm: provide more accurate estimation of pages occupied by memmap fs/buffer.c: remove redundant initialization in alloc_page_buffers() fs/buffer.c: do not inline exported function writeback: fix a typo in comment mm: introduce new field "managed_pages" to struct zone mm, oom: remove statically defined arch functions of same name mm, oom: remove redundant sleep in pagefault oom handler mm, oom: cleanup pagefault oom handler memory_hotplug: allow online/offline memory to result movable node numa: add CONFIG_MOVABLE_NODE for movable-dedicated node mm, memcg: avoid unnecessary function call when memcg is disabled ...	2012-12-13 13:11:15 -08:00
David Rientjes	c2d23f919b	mm, oom: remove statically defined arch functions of same name out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-12 17:38:34 -08:00
Adam Buchbinder	48fc7f7e78	Fix misspellings of "whether" in comments. "Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-11-19 14:31:35 +01:00
Benjamin Herrenschmidt	de1bb03af7	Merge branch 'dt' into next	2012-11-15 15:02:44 +11:00
Tony Breeds	1afc149def	powerpc/47x: Use the new ppc-opcode infrastructure Don't use 47x only #defines for TLBIVAX or ICBT, supply and use helpers in ppc-opcode.h This fixes a compile breakage. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-11-15 12:59:24 +11:00
Nathan Fontenot	f594972083	powerpc+of: Move of_drconf_cell struct definition to asm/prom.h This patch moves the definition of the of_drconf_cell struct to asm/prom.h to make it available for all powerpc/pseries code. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-11-15 09:43:55 +11:00
Shaohua Li	45cac65b0f	readahead: fault retry breaks mmap file read random detection .fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:47 +09:00
Linus Torvalds	5f3d2f2e1a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Some highlights in addition to the usual batch of fixes: - 64TB address space support for 64-bit processes by Aneesh Kumar - Gavin Shan did a major cleanup & re-organization of our EEH support code (IBM fancy PCI error handling & recovery infrastructure) which paves the way for supporting different platform backends, along with some rework of the PCIe code for the PowerNV platform in order to remove home made resource allocations and instead use the generic code (which is possible after some small improvements to it done by Gavin). - Uprobes support by Ananth N Mavinakayanahalli - A pile of embedded updates from Freescale folks, including new SoC and board supports, more KVM stuff including preparing for 64-bit BookE KVM support, ePAPR 1.1 updates, etc..." Fixup trivial conflicts in drivers/scsi/ipr.c * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits) powerpc/iommu: Fix multiple issues with IOMMU pools code powerpc: Fix VMX fix for memcpy case driver/mtd:IFC NAND:Initialise internal SRAM before any write powerpc/fsl-pci: use 'Header Type' to identify PCIE mode powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get powerpc: Remove tlb batching hack for nighthawk powerpc: Set paca->data_offset = 0 for boot cpu powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled powerpc/mpc85xx: Update interrupt handling for IFC controller powerpc/85xx: Enable USB support in p1023rds_defconfig powerpc/smp: Do not disable IPI interrupts during suspend powerpc/eeh: Fix crash on converting OF node to edev powerpc/eeh: Lock module while handling EEH event powerpc/kprobe: Don't emulate store when kprobe stwu r1 powerpc/kprobe: Complete kprobe and migrate exception frame powerpc/kprobe: Introduce a new thread flag powerpc: Remove unused __get_user64() and __put_user64() powerpc/eeh: Global mutex to protect PE tree powerpc/eeh: Remove EEH PE for normal PCI hotplug ...	2012-10-06 03:16:12 +09:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Michael Ellerman	8e166991c0	powerpc: Remove tlb batching hack for nighthawk In hpte_init_native() we call tlb_batching_enabled() to decide if we should setup ppc_md.flush_hash_range. tlb_batching_enabled() checks the _unflattened_ device tree, to see if we are running on a nighthawk. Since commit `a223535` ("dont allow pSeries_probe to succeed without initialising MMU", Dec 2006), hpte_init_native() has been called from pSeries_probe() - at which point we have not yet unflattened the device tree. This means tlb_batching_enabled() will always return true, so the hack has effectively been disabled since Dec 2006. Ergo, I think we can drop it. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-27 12:51:06 +10:00
Eric W. Biederman	9e184e0aa3	userns: On ppc convert current_uid from a kuid before printing. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-09-21 03:13:28 -07:00
Benjamin Herrenschmidt	caa1d631fc	Merge remote-tracking branch 'kumar/next' into next	2012-09-18 16:04:33 +10:00
Aneesh Kumar K.V	78f1dbde9f	powerpc/mm: Make some of the PGTABLE_RANGE dependency explicit slice array size and slice mask size depend on PGTABLE_RANGE. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:53 +10:00
Aneesh Kumar K.V	f033d659c3	powerpc/mm: Update VSID allocation documentation This update the proto-VSID and VSID scramble related information to be more generic by using names instead of current values. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:53 +10:00
Aneesh Kumar K.V	048ee0993e	powerpc/mm: Add 64TB support Increase max addressable range to 64TB. This is not tested on real hardware yet. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:51 +10:00
Aneesh Kumar K.V	735cafc32b	powerpc/mm: Use 32bit array for slb cache With larger vsid we need to track more bits of ESID in slb cache for slb invalidate. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:51 +10:00
Aneesh Kumar K.V	ac8dc2823a	powerpc/mm: Use the required number of VSID bits in slbmte ASM_VSID_SCRAMBLE can leave non-zero bits in the high 28 bits of the result for 256MB segment (40 bits for 1T segment). Properly mask them before using the values in slbmte Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:50 +10:00
Aneesh Kumar K.V	7aa0727f33	powerpc/mm: Increase the slice range to 64TB This patch makes the high psizes mask as an unsigned char array so that we can have more than 16TB. Currently we support upto 64TB Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:50 +10:00
Aneesh Kumar K.V	5524a27d39	powerpc/mm: Convert virtual address to vpn This patch convert different functions to take virtual page number instead of virtual address. Virtual page number is virtual address shifted right by VPN_SHIFT (12) bits. This enable us to have an address range of upto 76 bits. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:49 +10:00
Aneesh Kumar K.V	dcda287a9b	powerpc/mm: Simplify hpte_decode This patch simplify hpte_decode for easy switching of virtual address to virtual page number in the later patch Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:49 +10:00
Aneesh Kumar K.V	f038392363	powerpc/mm: Replace open coded CONTEXT_BITS value To clarify the meaning for future readers, replace the open coded 19 with CONTEXT_BITS Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:48 +10:00
Jia Hongtao	688ba1dbee	powerpc/swiotlb: Enable at early stage and disable if not necessary Remove the dependency on PCI initialization for SWIOTLB initialization. So that PCI can be initialized at proper time. SWIOTLB is partly determined by PCI inbound/outbound map which is assigned in PCI initialization. But swiotlb_init() should be done at the stage of mem_init() which is much earlier than PCI initialization. So we reserve the memory for SWIOTLB first and free it if not necessary. All boards are converted to fit this change. Signed-off-by: Jia Hongtao <B38951@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Acked-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2012-09-12 14:57:09 -05:00
Joe MacDonald	6b5e7229bb	powerpc/mm: Match variable types to API sys_subpage_prot() takes an unsigned long for 'addr' then does some stuff with it and the result is stored in a signed int, i, which is eventually used as the size parameter in a copy_from_user call. Update 'i' to be an unsigned long as well and since 'nw' is used in a size_t context which, depending on whether this is 32- or 64-bit may be unsigned int or unsigned long, switch that to a size_t and always be right. Finally, since we're in the neighbourhood, make the same changes to subpage_prot_clear(). Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Joe MacDonald <joe.macdonald@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-10 14:37:31 +10:00
Suzuki Poulose	a84fcd4687	powerpc: Change memory_limit from phys_addr_t to unsigned long long There are some device-tree nodes, whose values are of type phys_addr_t. The phys_addr_t is variable sized based on the CONFIG_PHSY_T_64BIT. Change these to a fixed unsigned long long for consistency. This patch does the change only for memory_limit. The following is a list of such variables which need the change: 1) kernel_end, crashk_size - in arch/powerpc/kernel/machine_kexec.c 2) (struct resource *)crashk_res.start - We could export a local static variable from machine_kexec.c. Changing the above values might break the kexec-tools. So, I will fix kexec-tools first to handle the different sized values and then change the above. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-07 11:44:30 +10:00
Benjamin Herrenschmidt	fff34b3412	Merge branch 'merge' into next Brings in various bug fixes from 3.6-rcX	2012-09-07 09:48:59 +10:00
Jesse Larrew	79c5fcebfe	powerpc/vphn: Fix arch_update_cpu_topology() return value arch_update_cpu_topology() should only return 1 when the topology has actually changed, and should return 0 otherwise. This patch fixes a potential bug where rebuild_sched_domains() would reinitialize the sched domains even when the topology hasn't changed. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 16:05:19 +10:00
Mihai Caraman	8b64a9dfb0	powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests. Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest SPRG4-7 registers will be clobbered. For bolted TLB miss exception handlers, which is the version currently supported by KVM, use SPRN_SPRG_GEN_SCRATCH aka SPRG0 instead of SPRN_SPRG_TLB_SCRATCH aka SPRG6. Keep using TLB PACA slots to fit in one 64-byte cache line. For critical exception handlers use SPRG3 instead of SPRG7. Provide a routine to store and restore user-visible SPRGs. This will be subsequently used to restore VDSO information in SPRG3. Add EX_R13 to paca slots to free up SPRG3 and change the critical exception epilog to use it. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:35:52 +10:00
Mihai Caraman	fecff0f724	powerpc/booke64: Add DO_KVM kernel hooks Hook DO_KVM macro into 64-bit booke for KVM integration. Extend interrupt handlers' parameter list with interrupt vector numbers to accomodate the macro. Only the bolted version of tlb miss handers is addressed now. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:35:46 +10:00
Ananth N Mavinakayanahalli	41ab5266c3	powerpc: Add trap_nr to thread_struct Add thread_struct.trap_nr and use it to store the last exception the thread experienced. In this patch, we populate the field at various places where we force_sig_info() to the process. This is also used in uprobes to determine if the probed instruction caused an exception. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:19:36 +10:00
Michael Ellerman	beacc6da86	powerpc: Remove all includes of <asm/abs_addr.h> It's empty now, apart from other includes. Fixup a few files that were getting things via this header. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:19:33 +10:00
Michael Ellerman	d88dc13a24	powerpc/mm: Remove uses of abs_to_virt() and virt_to_abs() These days they are just __va() and __pa() respectively. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:19:31 +10:00
Michael Ellerman	70267a7f47	powerpc/mm: Replace abs_to_virt() with __va() abs_to_virt() is just a wrapper around __va(), call __va() directly. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:18:43 +10:00
Alexander Graf	249ba1ee0f	KVM: PPC: Add cache flush on page map When we map a page that wasn't icache cleared before, do so when first mapping it in KVM using the same information bits as the Linux mapping logic. That way we are 100% sure that any page we map does not have stale entries in the icache. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-08-16 14:14:53 +02:00
Stuart Yoder	9778b696a0	powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-11 14:18:22 +10:00
Michael Neuling	962cffbd8a	powerpc: Enforce usage of RA 0-R31 where possible Some macros use RA where when RA=R0 the values is 0, so make this the enforced mnemonic in the macro. Idea suggested by Andreas Schwab. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:18:35 +10:00
Michael Neuling	e55174e911	powerpc: Fixes for instructions not using correct register naming These macros are using integers where they could be using logical names since they take registers. We are going to enforce this soon, so fix these up now. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:18:16 +10:00
Michael Neuling	44ce6a5ee7	powerpc: Merge STK_REG/PARAM/FRAMESIZE Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different places. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:18:03 +10:00
Michael Neuling	c75df6f96c	powerpc: Fix usage of register macros getting ready for %r0 change Anything that uses a constructed instruction (ie. from ppc-opcode.h), need to use the new R0 macro, as %r0 is not going to work. Also convert usages of macros where we are just determining an offset (usually for a load/store), like: std r14,STK_REG(r14)(r1) Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since it's just calculating an offset. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:17:55 +10:00
Benjamin Herrenschmidt	50bba07d6a	Merge branch 'merge' into next We want to bring in the latest IRQ fixes	2012-07-10 19:16:43 +10:00
Benjamin Herrenschmidt	aa709f3bc9	powerpc/numa: Avoid stupid uninitialized warning from gcc Newer gcc are being a bit blind here (it's pretty obvious we don't reach the code path using the array if we haven't initialized the pointer) but none of that is performance critical so let's just silence it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:16:23 +10:00
Gavin Shan	5b958a7eef	powerpc/numa: Fix OF node refcounting bug The form affinity for NUMA is set to 1 if the firmware supports OPAL. Otherwise, we have to retrieve that from OF node "/chosen". For the latter case, OF node "/chosen" reference count was never decreased. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:44 +10:00
Michael Neuling	82b2521d25	powerpc: Fix uninitialised error in numa.c chroma_defconfig currently gives me this with gcc 4.6: arch/powerpc/mm/numa.c:638:13: error: 'dm' may be used uninitialized in this function [-Werror=uninitialized] It's a bogus warning/error since of_get_drconf_memory() only writes it anyway. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: <stable@kernel.org> [v3.3+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-06-29 14:35:35 +10:00
Anton Vorontsov	73863ab028	powerpc: use clear_tasks_mm_cpumask() Current CPU hotplug code has some task->mm handling issues: 1. Working with task->mm w/o getting mm or grabing the task lock is dangerous as ->mm might disappear (exit_mm() assigns NULL under task_lock(), so tasklist lock is not enough). We can't use get_task_mm()/mmput() pair as mmput() might sleep, so we must take the task lock while handle its mm. 2. Checking for process->mm is not enough because process' main thread may exit or detach its mm via use_mm(), but other threads may still have a valid mm. To fix this we would need to use find_lock_task_mm(), which would walk up all threads and returns an appropriate task (with task lock held). clear_tasks_mm_cpumask() has all the issues fixed, so let's use it. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-31 17:49:29 -07:00
Paul Gortmaker	89528127fa	powerpc: fix compile fail in hugetlb cmdline parsing Commit `9fb48c744b` "params: add 3rd arg to option handler callback signature" added an extra arg to the function, but didn't catch all the use cases needing it, causing this compile fail in mpc85xx_defconfig: arch/powerpc/mm/hugetlbpage.c:316:4: error: passing argument 7 of 'parse_args' from incompatible pointer type [-Werror] include/linux/moduleparam.h:317:12: note: expected 'int ()(char , char , const char )' but argument is of type 'int ()(char , char *)' This function has no need to printk out the "doing" value, so just add the arg as an "unused". Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jim Cromie <jim.cromie@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Becky Bruce <beckyb@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-07 16:51:19 -07:00
Linus Torvalds	0195c00244	Disintegrate and delete asm/system.h -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+ /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD NypQthI85pc= =G9mT -----END PGP SIGNATURE----- Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
Linus Torvalds	2e7580b0e7	Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...	2012-03-28 14:35:31 -07:00
David Howells	ae3a197e3d	Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org	2012-03-28 18:30:02 +01:00
Pawel Moll	026cee0086	params: <level>_initcall-like kernel parameters This patch adds a set of macros that can be used to declare kernel parameters to be parsed _before_ initcalls at a chosen level are executed. We rename the now-unused "flags" field of struct kernel_param as the level. It's signed, for when we use this for early params as well, in future. Linker macro collating init calls had to be modified in order to add additional symbols between levels that are later used by the init code to split the calls into blocks. Signed-off-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-03-26 12:50:51 +10:30
Linus Torvalds	5375871d43	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc merge from Benjamin Herrenschmidt: "Here's the powerpc batch for this merge window. It is going to be a bit more nasty than usual as in touching things outside of arch/powerpc mostly due to the big iSeriesectomy :-) We finally got rid of the bugger (legacy iSeries support) which was a PITA to maintain and that nobody really used anymore. Here are some of the highlights: - Legacy iSeries is gone. Thanks Stephen ! There's still some bits and pieces remaining if you do a grep -ir series arch/powerpc but they are harmless and will be removed in the next few weeks hopefully. - The 'fadump' functionality (Firmware Assisted Dump) replaces the previous (equivalent) "pHyp assisted dump"... it's a rewrite of a mechanism to get the hypervisor to do crash dumps on pSeries, the new implementation hopefully being much more reliable. Thanks Mahesh Salgaonkar. - The "EEH" code (pSeries PCI error handling & recovery) got a big spring cleaning, motivated by the need to be able to implement a new backend for it on top of some new different type of firwmare. The work isn't complete yet, but a good chunk of the cleanups is there. Note that this adds a field to struct device_node which is not very nice and which Grant objects to. I will have a patch soon that moves that to a powerpc private data structure (hopefully before rc1) and we'll improve things further later on (hopefully getting rid of the need for that pointer completely). Thanks Gavin Shan. - I dug into our exception & interrupt handling code to improve the way we do lazy interrupt handling (and make it work properly with "edge" triggered interrupt sources), and while at it found & fixed a wagon of issues in those areas, including adding support for page fault retry & fatal signals on page faults. - Your usual random batch of small fixes & updates, including a bunch of new embedded boards, both Freescale and APM based ones, etc..." I fixed up some conflicts with the generalized irq-domain changes from Grant Likely, hopefully correctly. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits) powerpc/ps3: Do not adjust the wrapper load address powerpc: Remove the rest of the legacy iSeries include files powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces init: Remove CONFIG_PPC_ISERIES powerpc: Remove FW_FEATURE ISERIES from arch code tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable powerpc/spufs: Fix double unlocks powerpc/5200: convert mpc5200 to use of_platform_populate() powerpc/mpc5200: add options to mpc5200_defconfig powerpc/mpc52xx: add a4m072 board support powerpc/mpc5200: update mpc5200_defconfig to fit for charon board Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board MAINTAINERS: Update PowerPC 4xx tree powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board powerpc: document the FSL MPIC message register binding powerpc: add support for MPIC message register API powerpc/fsl: Added aliased MSIIR register address to MSI node in dts powerpc/85xx: mpc8548cds - add 36-bit dts ...	2012-03-21 18:55:10 -07:00
Stephen Rothwell	f5339277eb	powerpc: Remove FW_FEATURE ISERIES from arch code This is no longer selectable, so just remove all the dependent code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-21 11:16:11 +11:00
Cong Wang	2480b20892	powerpc: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:14 +08:00
Kumar Gala	f0b8b3417d	powerpc/fsl-booke: Fixup calc_cam_sz to support MMU v2 The registers that describe size supported by TLB are different on MMU v2 as well as we support power of two page sizes. For now we continue to assume that FSL variable size array supports all page sizes up to the maximum one reported in TLB1PS. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2012-03-15 12:12:19 -05:00
Benjamin Herrenschmidt	9be72573a8	powerpc: Add support for page fault retry and fatal signals Other architectures such as x86 and ARM have been growing new support for features like retrying page faults after dropping the mm semaphore to break contention, or being able to return from a stuck page fault when a SIGKILL is pending. This refactors our implementation of do_page_fault() to move the error handling out of line in a way similar to x86 and adds support for those two features. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-09 10:55:12 +11:00
Benjamin Herrenschmidt	a546498f3b	powerpc: Call do_page_fault() with interrupts off We currently turn interrupts back to their previous state before calling do_page_fault(). This can be annoying when debugging as a bad fault will potentially have lost some processor state before getting into the debugger. We also end up calling some generic code with interrupts enabled such as notify_page_fault() with interrupts enabled, which could be unexpected. This changes our code to behave more like other architectures, and make the assembly entry code call into do_page_faults() with interrupts disabled. They are conditionally re-enabled from within do_page_fault() in the same spot x86 does it. While there, add the might_sleep() test in the case of a successful trylock of the mmap semaphore, again like x86. Also fix a bug in the existing assembly where r12 (_MSR) could get clobbered by C calls (the DTL accounting in the exception common macro and DISABLE_INTS) in some cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2. Add the r12 clobber fix	2012-03-09 10:55:08 +11:00
Benjamin Herrenschmidt	4f8cf36f48	powerpc: Remove legacy iSeries bits from assembly files This removes the various bits of assembly in the kernel entry, exception handling and SLB management code that were specific to running under the legacy iSeries hypervisor which is no longer supported. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-09 10:54:59 +11:00
Joe Perches	a2234b4bae	powerpc: Use vsprintf extention %pf with builtin_return_address Emit the function name not the address when possible. builtin_return_address() gives an address. When building a kernel with CONFIG_KALLSYMS, emit the actual function name not the address. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-07 17:06:09 +11:00
Jimi Xenidis	de801de139	powerpc/icswx: Fix race condition with IPI setting ACOP There is a race where a thread causes a coprocessor type to be valid in its own ACOP _and_ in the current context, but it does not propagate to the ACOP register of other threads in time for them to use it. The original code tries to solve this by sending an IPI to all threads on the system, which is heavy handed, but unfortunately still provides a window where the icswx is issued by other threads and the ACOP is not up to date. This patch detects that the ACOP DSI fault was a "false positive" and syncs the ACOP and causes the icswx to be replayed. Signed-off-by: Jimi Xenidis <jimix@pobox.com> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-07 17:06:09 +11:00
Paul Mackerras	342d3db763	KVM: PPC: Implement MMU notifiers for Book3S HV guests This adds the infrastructure to enable us to page out pages underneath a Book3S HV guest, on processors that support virtualized partition memory, that is, POWER7. Instead of pinning all the guest's pages, we now look in the host userspace Linux page tables to find the mapping for a given guest page. Then, if the userspace Linux PTE gets invalidated, kvm_unmap_hva() gets called for that address, and we replace all the guest HPTEs that refer to that page with absent HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit set, which will cause an HDSI when the guest tries to access them. Finally, the page fault handler is extended to reinstantiate the guest HPTE when the guest tries to access a page which has been paged out. Since we can't intercept the guest DSI and ISI interrupts on PPC970, we still have to pin all the guest pages on PPC970. We have a new flag, kvm->arch.using_mmu_notifiers, that indicates whether we can page guest pages out. If it is not set, the MMU notifier callbacks do nothing and everything operates as before. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-03-05 14:52:38 +02:00
Mahesh Salgaonkar	3ccc00a7e0	fadump: Register for firmware assisted dump. On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote: > On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote: > > If I have read the code correctly, we are going to get this printk on > non-pSeries machines or on older pSeries machines, even if the user > has not put the fadump=on option on the kernel command line. The > printk will be annoying since there is no actual error condition. It > seems to me that the condition for the printk should include > fw_dump.fadump_enabled. In other words you should probably add > > if (!fw_dump.fadump_enabled) > return 0; > > at the beginning of the function. Hi Paul, Thanks for pointing it out. Please find the updated patch below. The existing patches above this (4/10 through 10/10) cleanly applies on this update. Thanks, -Mahesh. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-02-23 10:50:01 +11:00
Wanlong Gao	9512938b88	cpumask: update setup_node_to_cpumask_map() comments node_to_cpumask() has been replaced by cpumask_of_node(), and wholly removed since commit `29c337a0` ("cpumask: remove obsolete node_to_cpumask now everyone uses cpumask_of_node"). So update the comments for setup_node_to_cpumask_map(). Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:11 -08:00
Linus Torvalds	98793265b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)	2012-01-08 13:21:22 -08:00
Linus Torvalds	7affca3537	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h	2012-01-07 12:03:30 -08:00
Linus Torvalds	e4e88f31bc	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits) powerpc: fix compile error with 85xx/p1010rdb.c powerpc: fix compile error with 85xx/p1023_rds.c powerpc/fsl: add MSI support for the Freescale hypervisor arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree powerpc/fsl: Add support for Integrated Flash Controller powerpc/fsl: update compatiable on fsl 16550 uart nodes powerpc/85xx: fix PCI and localbus properties in p1022ds.dts powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig powerpc/fsl: Update defconfigs to enable some standard FSL HW features powerpc: Add TBI PHY node to first MDIO bus sbc834x: put full compat string in board match check powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit offb: Fix setting of the pseudo-palette for >8bpp offb: Add palette hack for qemu "standard vga" framebuffer offb: Fix bug in calculating requested vram size powerpc/boot: Change the WARN to INFO for boot wrapper overlap message powerpc/44x: Fix build error on currituck platform powerpc/boot: Change the load address for the wrapper to fit the kernel powerpc/44x: Enable CRASH_DUMP for 440x ... Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to the additional sparse-checking code for cputime_t.	2012-01-06 17:58:22 -08:00
Greg Kroah-Hartman	ff4b8a57f0	Merge branch 'driver-core-next' into Linux 3.2 This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 11:42:52 -08:00
Kay Sievers	8a25a2fd12	cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-12-21 14:29:42 -08:00
Suzuki Poulose	368ff8f14d	powerpc: Define virtual-physical translations for RELOCATABLE We find the runtime address of _stext and relocate ourselves based on the following calculation. virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) + MODULO(_stext.run,KERNEL_TLB_PIN_SIZE) relocate() is called with the Effective Virtual Base Address (as shown below) \| Phys. Addr\| Virt. Addr \| Page \|------------------------\| Boundary \| \| \| \| \| \| \| \| \| Kernel Load \|___________\|_ __ _ _ _ _\|<- Effective Addr(_stext)\| \| ^ \|Virt. Base Addr \| \| \| \| \| \| \| \| \| \|reloc_offset\| \| \| \| \| \| \| \| \| \| \|______v_____\|<-(KERNELBASE)%TLB_SIZE \| \| \| \| \| \| \| \| \| Page \|-----------\|------------\| Boundary \| \| \| On BookE, we need __va() & __pa() early in the boot process to access the device tree. Currently this has been defined as : #define __va(x) ((void )(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + KERNELBASE) where: PHYSICAL_START is kernstart_addr - a variable updated at runtime. KERNELBASE is the compile time Virtual base address of kernel. This won't work for us, as kernstart_addr is dynamic and will yield different results for __va()/__pa() for same mapping. e.g., Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as PAGE_OFFSET). In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000 = 0xbc100000 , which is wrong. it should be : 0xc0000000 + 0x100000 = 0xc0100000 On platforms which support AMP, like PPC_47x (based on 44x), the kernel could be loaded at highmem. Hence we cannot always depend on the compile time constants for mapping. Here are the possible solutions: 1) Update kernstart_addr(PHSYICAL_START) to match the Physical address of compile time KERNELBASE value, instead of the actual Physical_Address(_stext). The disadvantage is that we may break other users of PHYSICAL_START. They could be replaced with __pa(_stext). 2) Redefine __va() & __pa() with relocation offset #ifdef CONFIG_RELOCATABLE_PPC32 #define __va(x) ((void )(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + (KERNELBASE + RELOC_OFFSET))) #define __pa(x) ((unsigned long)(x) + PHYSICAL_START - (KERNELBASE + RELOC_OFFSET)) #endif where, RELOC_OFFSET could be a) A variable, say relocation_offset (like kernstart_addr), updated at boot time. This impacts performance, as we have to load an additional variable from memory. OR b) #define RELOC_OFFSET ((PHYSICAL_START & PPC_PIN_SIZE_OFFSET_MASK) - \ (KERNELBASE & PPC_PIN_SIZE_OFFSET_MASK)) This introduces more calculations for doing the translation. 3) Redefine __va() & __pa() with a new variable i.e, #define __va(x) ((void )(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET)) where VIRT_PHYS_OFFSET : #ifdef CONFIG_RELOCATABLE_PPC32 #define VIRT_PHYS_OFFSET virt_phys_offset #else #define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START) #endif / CONFIG_RELOCATABLE_PPC32 */ where virt_phy_offset is updated at runtime to : Effective KERNELBASE - kernstart_addr. Taking our example, above: virt_phys_offset = effective_kernelstart_vaddr - kernstart_addr = 0xc0400000 - 0x400000 = 0xc0000000 and __va(0x100000) = 0xc0000000 + 0x100000 = 0xc0100000 which is what we want. I have implemented (3) in the following patch which has same cost of operation as the existing one. I have tested the patches on 440x platforms only. However this should work fine for PPC_47x also, as we only depend on the runtime address and the current TLB XLAT entry for the startup code, which is available in r25. I don't have access to a 47x board yet. So, it would be great if somebody could test this on 47x. Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org> Signed-off-by: Josh Boyer <jwboyer@gmail.com>	2011-12-20 10:21:34 -05:00
Suzuki Poulose	0f890c8d20	powerpc: Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookE The current implementation of CONFIG_RELOCATABLE in BookE is based on mapping the page aligned kernel load address to KERNELBASE. This approach however is not enough for platforms, where the TLB page size is large (e.g, 256M on 44x). So we are renaming the RELOCATABLE used currently in BookE to DYNAMIC_MEMSTART to reflect the actual method. The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the dynamic relocations will be introduced in the later in the patch series. This change would allow the use of the old method of RELOCATABLE for platforms which can afford to enforce the page alignment (platforms with smaller TLB size). Changes since v3: * Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is either a RELOCATABLE or DYNAMIC_MEMSTART(Suggested by: Josh Boyer) Suggested-by: Scott Wood <scottwood@freescale.com> Tested-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org> Signed-off-by: Josh Boyer <jwboyer@gmail.com>	2011-12-20 10:20:19 -05:00
David Rientjes	2011b1d0d3	powerpc/mm: Fix section mismatch for read_n_cells read_n_cells() cannot be marked as .devinit.text since it is referenced from two functions that are not in that section: of_get_lmb_size() and hot_add_drconf_scn_to_nid(). Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-19 14:41:18 +11:00
David Rientjes	28e86bdbc9	powerpc/mm: Fix section mismatch for mark_reserved_regions_for_nid mark_reserved_regions_for_nid() is only called from do_init_bootmem(), which is in .init.text, so it must be in the same section to avoid a section mismatch warning. Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-19 14:41:16 +11:00
Benjamin Herrenschmidt	1e7342e778	Merge remote-tracking branch 'jwb/next' into next Conflicts: arch/powerpc/platforms/40x/ppc40x_simple.c	2011-12-16 11:24:25 +11:00
Christoph Egger	ca899859f1	powerpc/44x: Removing dead CONFIG_PPC47x CONFIG_PPC47x doesn't exist in Kconfig and no 476 processor calls this function ppc44x_pin_tlb() as it has it's own ppc47x_pin_tlb(). This code is probably an artifact of the original 476 code that shouldn't have made it upstream. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Josh Boyer <jwboyer@gmail.com>	2011-12-09 07:48:56 -05:00
Tejun Heo	1d7cfe18ec	powerpc: Use HAVE_MEMBLOCK_NODE_MAP powerpc doesn't access early_node_map[] directly and enabling HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is enough. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:08 -08:00
Tejun Heo	1aadc0560f	memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users The only function of memblock_analyze() is now allowing resize of memblock region arrays. Rename it to memblock_allow_resize() and update its users. * The following users remain the same other than renaming. arm/mm/init.c::arm_memblock_init() microblaze/kernel/prom.c::early_init_devtree() powerpc/kernel/prom.c::early_init_devtree() openrisc/kernel/prom.c::early_init_devtree() sh/mm/init.c::paging_init() sparc/mm/init_64.c::paging_init() unicore32/mm/init.c::uc32_memblock_init() * In the following users, analyze was used to update total size which is no longer necessary. powerpc/kernel/machine_kexec.c::reserve_crashkernel() powerpc/kernel/prom.c::early_init_devtree() powerpc/mm/init_32.c::MMU_init() powerpc/mm/tlb_nohash.c::__early_init_mmu() powerpc/platforms/ps3/mm.c::ps3_mm_add_memory() powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups() sh/kernel/machine_kexec.c::reserve_crashkernel() * x86/kernel/e820.c::memblock_x86_fill() was directly setting memblock_can_resize before populating memblock and calling analyze afterwards. Call memblock_allow_resize() before start populating. memblock_can_resize is now static inside memblock.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:08 -08:00
Tejun Heo	6fbef13c4f	powerpc: Cleanup memblock usage * early_init_devtree(): Total memory size is aligned to PAGE_SIZE; however, alignment isn't enforced if memory_limit is explicitly specified. Simplify the logic and always apply PAGE_SIZE alignment. * MMU_init(): memblock regions is truncated by directly modifying memblock.memory.cnt. This is incomplete (reserved array is not truncated) and unnecessarily low level hindering further memblock improvments. Use memblock_enforce_memory_limit() instead. * wii_memory_fixups(): Unnecessarily low level direct manipulation of memblock regions. The same result can be achieved using properly abstracted operations. Reimplement using memblock API. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>	2011-12-08 10:22:07 -08:00
sukadev@linux.vnet.ibm.com	8a3e3d31d1	powerpc: Punch a hole in /dev/mem for librtas With CONFIG_STRICT_DEVMEM=y, user space cannot read any part of /dev/mem. Since this breaks librtas, punch a hole in /dev/mem to allow access to the rmo_buffer that librtas needs. Anton Blanchard reported the problem and helped with the fix. A quick test for this patch: # cat /proc/rtas/rmo_buffer 000000000f190000 10000 # python -c "print 0x000000000f190000 / 0x10000" 3865 # dd if=/dev/mem of=/tmp/foo count=1 bs=64k skip=3865 1+0 records in 1+0 records out 65536 bytes (66 kB) copied, 0.000205235 s, 319 MB/s # dd if=/dev/mem of=/tmp/foo dd: reading `/dev/mem': Operation not permitted 0+0 records in 0+0 records out 0 bytes (0 B) copied, 0.00022519 s, 0.0 kB/s Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-08 14:22:52 +11:00
Becky Bruce	d93e4d7d72	powerpc/book3e: Change hugetlb preload to take vma argument This avoids an extra find_vma() and is less error-prone. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:24 +11:00
Becky Bruce	a6146888be	powerpc: Add gpages reservation code for 64-bit FSL BOOKE For 64-bit FSL_BOOKE implementations, gigantic pages need to be reserved at boot time by the memblock code based on the command line. This adds the call that handles the reservation, and fixes some code comments. It also removes the previous pr_err when reserve_hugetlb_gpages is called on a system without hugetlb enabled - the way the code is structured, the call is unconditional and the resulting error message spurious and confusing. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:23 +11:00
Becky Bruce	d1b9b12811	powerpc: Add hugepage support to 64-bit tablewalk code for FSL_BOOK3E Before hugetlb, at each level of the table, we test for !0 to determine if we have a valid table entry. With hugetlb, this compare becomes: < 0 is a normal entry 0 is an invalid entry > 0 is huge This works because the hugepage code pulls the top bit off the entry (which for non-huge entries always has the top bit set) as an indicator that we have a hugepage. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:22 +11:00
Becky Bruce	27609a42ee	powerpc: Whitespace/comment changes to tlb_low_64e.S I happened to comment this code while I was digging through it; we might as well commit that. I also made some whitespace changes - the existing code had a lot of unnecessary newlines that I found annoying when I was working on my tiny laptop. No functional changes. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:22 +11:00
Becky Bruce	881fde1db5	powerpc: hugetlb: modify include usage for FSL BookE code The original 32-bit hugetlb implementation used PPC64 vs PPC32 to determine which code path to take. However, the final hugetlb implementation for 64-bit FSL ended up shared with the FSL 32-bit code so the actual check needs to be FSL_BOOK3E vs everything else. This patch changes the include protections to reflect this. There are also a couple of related comment fixes. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:22 +11:00
Becky Bruce	a1cd541988	powerpc: Update hugetlb huge_pte_alloc and tablewalk code for FSL BOOKE This updates the hugetlb page table code to handle 64-bit FSL_BOOKE. The previous 32-bit work counted on the inner levels of the page table collapsing. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:22 +11:00
Becky Bruce	8c1674de2b	powerpc: Fix booke hugetlb preload code for PPC_MM_SLICES and 64-bit This patch does 2 things: It corrects the code that determines the size to write into MAS1 for the PPC_MM_SLICES case (this originally came from David Gibson and I had incorrectly altered it), and it changes the methodolody used to calculate the size for !PPC_MM_SLICES to work for 64-bit as well as 32-bit. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:21 +11:00
Becky Bruce	7651295944	powerpc: Only define HAVE_ARCH_HUGETLB_UNMAPPED_AREA if PPC_MM_SLICES If we don't have slices, we should be able to use the generic hugetlb_get_unmapped_area() code Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-07 16:26:21 +11:00
Justin P. Mattock	42b2aa86c6	treewide: Fix typos in various parts of the kernel, and fix some comments. The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-12-02 14:57:31 +01:00
Tejun Heo	d4bbf7e775	Merge branch 'master' into x86/memblock Conflicts & resolutions: * arch/x86/xen/setup.c `dc91c728fd` "xen: allow extra memory to be in multiple regions" `24aa07882b` "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c `166e9278a3` "x86/ia64: intel-iommu: move to drivers/iommu/" `5dfe8660a3` "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig `6661672053` "memblock: add NO_BOOTMEM config symbol" `c378ddd53f` "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c `d1f0ece6cd` "mm/memblock.c: small function definition fixes" `ed7b56a799` "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-11-28 09:46:22 -08:00
Dan McGee	fa8cbaaf5a	powerpc+sparc64/mm: Remove hack in mmap randomize layout Since commit `8a0a9bd4db`, this comment in mmap_rnd() does not hold true as the value returned by get_random_int() will in fact be different every single call. Remove the comment and simplify the code back to its original desired form. This reverts commit `a5adc91a4b` which is no longer necessary and also fixes the sparc code that copied this same adjustment. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-28 11:42:09 +11:00
sukadev@linux.vnet.ibm.com	1d54cf2b97	powerpc: Implement CONFIG_STRICT_DEVMEM As described in the help text in the patch, this token restricts general access to /dev/mem as a way of increasing the security. Specifically, access to exclusive IOMEM and kernel RAM is denied unless CONFIG_STRICT_DEVMEM is set to 'n'. Implement the 'devmem_is_allowed()' interface for Powerpc. It will be called from range_is_allowed() when userpsace attempts to access /dev/mem. This patch is based on an earlier patch from Steve Best and with input from Paul Mackerras and Scott Wood. [BenH] Fixed a typo or two and removed the generic change which should be submitted as a separate patch Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-28 11:42:08 +11:00
Jimi Xenidis	c3dcf53a3f	powerpc/icswx: Simple ACOP fault handler This patch adds a fault handler that responds to illegal Coprocessor types. Currently all CTs are treated and illegal. There are two ways to report the fault back to the application. If the application used the record form ("icswx.") then the architected "reject" is emulated. If the application did not used the record form ("icswx") then it is selectable by config whether the failure is silent (as architected) or a SIGILL is generated. In all cases pr_warn() is used to log the bad CT. Signed-off-by: Jimi Xenidis <jimix@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-25 14:11:28 +11:00
Jimi Xenidis	9d67028090	powerpc: Split ICSWX ACOP and PID processing Some processors, like embedded, that already have a PID register that is managed by the system. This patch separates the ACOP and PID processing into separate files so that the ACOP code can be shared. Signed-off-by: Jimi Xenidis <jimix@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-25 14:11:27 +11:00
Kumar Gala	13020be8be	powerpc: Fix compiliation with hugetlbfs enabled arch/powerpc/mm/hugetlbpage.c: In function 'reserve_hugetlb_gpages': arch/powerpc/mm/hugetlbpage.c:312:2: error: implicit declaration of function 'parse_args' Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-25 10:05:59 +11:00
Dipankar Sarma	1c8ee73395	powerpc/numa: NUMA topology support for PowerNV This patch adds support for numa topology on powernv platforms running OPAL formware. It checks for the type of platform at run time and sets the affinity form correctly so that NUMA topology can be discovered correctly. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-08 14:51:46 +11:00
Anton Blanchard	c40dd2f766	powerpc: Add System RAM to /proc/iomem We've resisted adding System RAM to /proc/iomem because it is the wrong place for it. Unfortunately we continue to find tools that rely on this behaviour so give up and add it in. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-08 14:51:46 +11:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	1197ab2942	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits) powerpc/p3060qds: Add support for P3060QDS board powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX powerpc/85xx: Make kexec to interate over online cpus powerpc/fsl_booke: Fix comment in head_fsl_booke.S powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver powerpc/86xx: Correct Gianfar support for GE boards powerpc/cpm: Clear muram before it is in use. drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager powerpc/fsl_msi: add support for "msi-address-64" property powerpc/85xx: Setup secondary cores PIR with hard SMP id powerpc/fsl-booke: Fix settlbcam for 64-bit powerpc/85xx: Adding DCSR node to dtsi device trees powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards powerpc/85xx: fix PHYS_64BIT selection for P1022DS powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map powerpc: respect mem= setting for early memory limit setup powerpc: Update corenet64_smp_defconfig powerpc: Update mpc85xx/corenet 32-bit defconfigs ... Fix up trivial conflicts in: - arch/powerpc/configs/40x/hcu4_defconfig removed stale file, edited elsewhere - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c: added opal and gelic drivers vs added ePAPR driver - drivers/tty/serial/8250.c moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support	2011-11-06 17:12:03 -08:00
Andrea Arcangeli	b35a35b556	thp: share get_huge_page_tail() This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:58 -07:00
Andrea Arcangeli	cf592bf768	powerpc: gup_huge_pmd() return 0 if pte changes powerpc didn't return 0 in that case, if it's rolling back the *nr pointer it should also return zero to avoid adding pages to the array at the wrong offset. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	3526741f09	powerpc: gup_hugepte() support THP based tail recounting Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	8596468487	powerpc: gup_hugepte() avoid freeing the head page too many times We only taken "refs" pins on the head page not "*nr" pins. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	405e44f2e3	powerpc: get_hugepte() don't put_page() the wrong page "page" may have changed to point to the next hugepage after the loop completed, The references have been taken on the head page, so the put_page must happen there too. This is a longstanding issue pre-thp inclusion. It's totally unclear how these page_cache_add_speculative and pte_val(pte) != pte_val(ptep) checks are necessary across all the powerpc gup_fast code, when x86 doesn't need any of that: there's no way the page can be freed with irq disabled so we're guaranteed the atomic_inc will happen on a page with page_count > 0 (so not needing the speculative check). The pte check is also meaningless on x86: no need to rollback on x86 if the pte changed, because the pte can still change a CPU tick after the check succeeded and it won't be rolled back in that case. The important thing is we got a reference on a valid page that was mapped there a CPU tick ago. So not knowing the soft tlb refill code of ppc64 in great detail I'm not removing the "speculative" page_count increase and the pte checks across all the code, but unless there's a strong reason for it they should be later cleaned up too. If a pte can change from huge to non-huge (like it could happen with THP) passing a pte_t ptep to gup_hugepte() would also require to repeat the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs only so I'm not altering that. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	2839bdc1bf	powerpc: remove superfluous PageTail checks on the pte gup_fast This part of gup_fast doesn't seem capable of handling hugetlbfs ptes, those should be handled by gup_hugepd only, so these checks are superfluous. Plus if this wasn't a noop, it would have oopsed because, the insistence of using the speculative refcounting would trigger a VM_BUG_ON if a tail page was encountered in the page_cache_get_speculative(). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	70b50f94f1	mm: thp: tail page refcounting fix Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Paul Gortmaker	4b16f8e2d6	powerpc: various straight conversions from module.h --> export.h All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:44 -04:00
Paul Gortmaker	9308794884	powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE Fix failures in powerpc associated with the previously allowed implicit module.h presence that now lead to things like this: arch/powerpc/mm/mmu_context_hash32.c:76:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' arch/powerpc/mm/tlb_hash32.c:48:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' arch/powerpc/kernel/pci_32.c:51:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' arch/powerpc/kernel/iomap.c:36:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' arch/powerpc/platforms/44x/canyonlands.c:126:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' arch/powerpc/kvm/44x.c:168:59: error: 'THIS_MODULE' undeclared (first use in this function) [with several contibutions from Stephen Rothwell <sfr@canb.auug.org.au>] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:38 -04:00
Paul Gortmaker	66b15db69c	powerpc: add export.h to files making use of EXPORT_SYMBOL With module.h being implicitly everywhere via device.h, the absence of explicitly including something for EXPORT_SYMBOL went unnoticed. Since we are heading to fix things up and clean module.h from the device.h file, we need to explicitly include these files now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:37 -04:00
Becky Bruce	4559424a0c	powerpc/fsl-booke: Fix settlbcam for 64-bit Currently, it does a cntlzd on the size and then subtracts it from 21.... this doesn't take into account the varying size of a "long". Just use __ilog instead (and subtract the 10 we have to subtract to get to the tsize encoding). Also correct the comment about page sizes supported. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-10-12 23:39:10 -05:00
Kumar Gala	1dc91c3eb3	powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map On FSL Book-E devices we support multiple large TLB sizes and so we can get into situations in which the initial 1G TLB size is too big and we're asked for a size that is not mappable by a single entry (like 512M). The single entry is important because when we bring up secondary cores they need to ensure any data structure they need to access (eg PACA or stack) is always mapped. So we really need to determine what size will actually be mapped by the first TLB entry to ensure we limit early memory references to that region. We refactor the map_mem_in_cams() code to provider a helper function that we can utilize to determine the size of the first TLB entry while taking into account size and alignment constraints. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-10-11 23:30:41 -05:00
Paul Mackerras	25c29f9e32	powerpc: Fix hugetlb with CONFIG_PPC_MM_SLICES=y Commit `41151e77a4` ("powerpc: Hugetlb for BookE") added some #ifdef CONFIG_MM_SLICES conditionals to hugetlb_get_unmapped_area() and vma_mmu_pagesize(). Unfortunately this is not the correct config symbol; it should be CONFIG_PPC_MM_SLICES. The result is that attempting to use hugetlbfs on 64-bit Power server processors results in an infinite stack recursion between get_unmapped_area() and hugetlb_get_unmapped_area(). This fixes it by changing the #ifdef to use CONFIG_PPC_MM_SLICES in those functions and also in book3e_hugetlb_preload(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-23 10:21:33 +10:00
Anton Blanchard	8bdafa39a4	powerpc: Fix deadlock in icswx code The icswx code introduced an A-B B-A deadlock: CPU0 CPU1 ---- ---- lock(&anon_vma->mutex); lock(&mm->mmap_sem); lock(&anon_vma->mutex); lock(&mm->mmap_sem); Instead of using the mmap_sem to keep mm_users constant, take the page table spinlock. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	a11940978b	powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe If we echo an address the hypervisor doesn't like to /sys/devices/system/memory/probe we oops the box: # echo 0x10000000000 > /sys/devices/system/memory/probe kernel BUG at arch/powerpc/mm/hash_utils_64.c:541! The backtrace is: create_section_mapping arch_add_memory add_memory memory_probe_store sysdev_class_store sysfs_write_file vfs_write SyS_write In create_section_mapping we BUG if htab_bolt_mapping returned an error. A better approach is to return an error which will propagate back to userspace. Rerunning the test with this patch applied: # echo 0x10000000000 > /sys/devices/system/memory/probe -bash: echo: write error: Invalid argument Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	dfbe93a222	powerpc: Coding style cleanups While converting code to use for_each_node_by_type I noticed a number of coding style issues. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	94db7c5e14	powerpc: Use for_each_node_by_type instead of open coding it Use for_each_node_by_type instead of open coding it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	6083184269	powerpc/numa: Remove double of_node_put in hot_add_node_scn_to_nid During memory hotplug testing, I got the following warning: ERROR: Bad of_node_put() on /memory@0 of_node_release kref_put of_node_put of_find_node_by_type hot_add_node_scn_to_nid hot_add_scn_to_nid memory_add_physaddr_to_nid ... of_find_node_by_type() loop does the of_node_put for us so we only need the handle the case where we terminate the loop early. As suggested by Stephen Rothwell we can do the of_node_put unconditionally outside of the loop since of_node_put handles a NULL argument fine. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:22 +10:00
Becky Bruce	41151e77a4	powerpc: Hugetlb for BookE Enable hugepages on Freescale BookE processors. This allows the kernel to use huge TLB entries to map pages, which can greatly reduce the number of TLB misses and the amount of TLB thrashing experienced by applications with large memory footprints. Care should be taken when using this on FSL processors, as the number of large TLB entries supported by the core is low (16-64) on current processors. The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g. Page sizes larger than the max zone size are called "gigantic" pages and must be allocated on the command line (and cannot be deallocated). This is currently only fully implemented for Freescale 32-bit BookE processors, but there is some infrastructure in the code for 64-bit BooKE. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 09:19:40 +10:00
Linus Torvalds	184475029a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits) drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c powerpc/85xx: fix mpic configuration in CAMP mode powerpc: Copy back TIF flags on return from softirq stack powerpc/64: Make server perfmon only built on ppc64 server devices powerpc/pseries: Fix hvc_vio.c build due to recent changes powerpc: Exporting boot_cpuid_phys powerpc: Add CFAR to oops output hvc_console: Add kdb support powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon powerpc/irq: Quieten irq mapping printks powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs powerpc: Add mpt2sas driver to pseries and ppc64 defconfig powerpc: Disable IRQs off tracer in ppc64 defconfig powerpc: Sync pseries and ppc64 defconfigs powerpc/pseries/hvconsole: Fix dropped console output hvc_console: Improve tty/console put_chars handling powerpc/kdump: Fix timeout in crash_kexec_wait_realmode powerpc/mm: Fix output of total_ram. powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards powerpc: Correct annotations of pmu registration functions ... Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and drivers/cpufreq	2011-07-25 22:59:39 -07:00
Linus Torvalds	5fabc487c9	Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) KVM: IOMMU: Disable device assignment without interrupt remapping KVM: MMU: trace mmio page fault KVM: MMU: mmio page fault support KVM: MMU: reorganize struct kvm_shadow_walk_iterator KVM: MMU: lockless walking shadow page table KVM: MMU: do not need atomicly to set/clear spte KVM: MMU: introduce the rules to modify shadow page table KVM: MMU: abstract some functions to handle fault pfn KVM: MMU: filter out the mmio pfn from the fault pfn KVM: MMU: remove bypass_guest_pf KVM: MMU: split kvm_mmu_free_page KVM: MMU: count used shadow pages on prepareing path KVM: MMU: rename 'pt_write' to 'emulate' KVM: MMU: cleanup for FNAME(fetch) KVM: MMU: optimize to handle dirty bit KVM: MMU: cache mmio info on page fault path KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code KVM: MMU: do not update slot bitmap if spte is nonpresent KVM: MMU: fix walking shadow page table KVM guest: KVM Steal time registration ...	2011-07-24 09:07:03 -07:00
Linus Torvalds	4d4abdcb1d	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits) perf: Remove the nmi parameter from the oprofile_perf backend x86, perf: Make copy_from_user_nmi() a library function perf: Remove perf_event_attr::type check x86, perf: P4 PMU - Fix typos in comments and style cleanup perf tools: Make test use the preset debugfs path perf tools: Add automated tests for events parsing perf tools: De-opt the parse_events function perf script: Fix display of IP address for non-callchain path perf tools: Fix endian conversion reading event attr from file header perf tools: Add missing 'node' alias to the hw_cache[] array perf probe: Support adding probes on offline kernel modules perf probe: Add probed module in front of function perf probe: Introduce debuginfo to encapsulate dwarf information perf-probe: Move dwarf library routines to dwarf-aux.{c, h} perf probe: Remove redundant dwarf functions perf probe: Move strtailcmp to string.c perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END tracing/kprobe: Update symbol reference when loading module tracing/kprobes: Support module init function probing kprobes: Return -ENOENT if probe point doesn't exist ...	2011-07-22 16:44:39 -07:00
Benjamin Herrenschmidt	4b575f3e8a	Merge remote-tracking branch 'jwb/next' into next	2011-07-22 13:16:41 +10:00
Tony Breeds	f7ba2991e9	powerpc/mm: Fix output of total_ram. On 32bit platforms that support >= 4GB memory total_ram was truncated. This creates a confusing printk: Top of RAM: 0x100000000, Total RAM: 0x0 Fix that: Top of RAM: 0x100000000, Total RAM: 0x100000000 Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:04 +10:00
Tejun Heo	5dfe8660a3	bootmem: Replace work_with_active_regions() with for_each_mem_pfn_range() Callback based iteration is cumbersome and much less useful than for_each_*() iterator. This patch implements for_each_mem_pfn_range() which replaces work_with_active_regions(). All the current users of work_with_active_regions() are converted. This simplifies walking over early_node_map and will allow converting internal logics in page_alloc to use iterator instead of walking early_node_map directly, which in turn will enable moving node information to memblock. powerpc change is only compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714074610.GD3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:45:29 -07:00
Dave Kleikamp	9661534d6a	powerpc/47x: allow kernel to be loaded in higher physical memory The 44x code (which is shared by 47x) assumes the available physical memory begins at 0x00000000. This is not necessarily the case in an AMP environment. Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be loaded into a higher memory range. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-07-12 10:34:24 -04:00
Dave Kleikamp	91b191c71e	powerpc/44x: don't use tlbivax on AMP systems Since other OS's may be running on the other cores don't use tlbivax Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-07-12 09:21:55 -04:00
Paul Mackerras	9e368f2915	KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:59 +03:00
Paul Mackerras	969391c58a	powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to indicate that we have a usable hypervisor mode, and another to indicate that the processor conforms to PowerISA version 2.06. We also add another bit to indicate that the processor conforms to ISA version 2.01 and set that for PPC970 and derivatives. Some PPC970 chips (specifically those in Apple machines) have a hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode is not useful in the sense that there is no way to run any code in supervisor mode (HV=0 PR=0). On these processors, the LPES0 and LPES1 bits in HID4 are always 0, and we use that as a way of detecting that hypervisor mode is not useful. Where we have a feature section in assembly code around code that only applies on POWER7 in hypervisor mode, we use a construct like END_FTR_SECTION_IFSET(CPU_FTR_HVMODE \| CPU_FTR_ARCH_206) The definition of END_FTR_SECTION_IFSET is such that the code will be enabled (not overwritten with nops) only if all bits in the provided mask are set. Note that the CPU feature check in __tlbie() only needs to check the ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called if we are running bare-metal, i.e. in hypervisor mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:58 +03:00
Becky Bruce	3160b09796	powerpc: Create next_tlbcam_idx percpu variable for FSL_BOOKE This is used to round-robin TLBCAM entries. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-07-08 00:21:34 -05:00
Peter Zijlstra	a8b0ca17b8	perf: Remove the nmi parameter from the swevent and overflow interface The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:06:35 +02:00
Dave Carroll	a9c0f41b3a	powerpc: Add printk companion for ppc_md.progress This patch adds a printk companion to replace the udbg progress function when initmem is freed. Suggested-by: Milton Miller <miltonm@bga.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-30 15:28:05 +10:00
Dave Carroll	2773fcc8c4	powerpc: Move free_initmem to common code The free_initmem function is basically duplicated in mm/init_32, and init_64, and is moved to the common 32/64-bit mm/mem.c. All other sections except init were removed in v2.6.15 by `6c45ab992e` (powerpc: Remove section free() and linker script bits), and therefore the bulk of the executed code is identical. This patch also removes updating ppc_md.progress to NULL in the powermac late_initcall. Suggested-by: Milton Miller <miltonm@bga.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-30 15:28:05 +10:00
Benjamin Herrenschmidt	6da49a2925	Merge remote branch 'origin/master' into next	2011-06-30 15:23:59 +10:00
Becky Bruce	3d41e0f6d9	powerpc: mem_init should call memblock_is_reserved with phys_addr_t This has been broken for a while but hasn't been an issue until now because nobody was reserving regions at high addresses. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-29 17:48:18 +10:00
Scott Wood	f67f4ef5fc	powerpc/book3e-64: use a separate TLB handler when linear map is bolted On MMUs such as FSL where we can guarantee the entire linear mapping is bolted, we don't need to worry about linear TLB misses. If on top of that we do a full table walk, we get rid of all recursive TLB faults, and can dispense with some state saving. This gains a few percent on TLB-miss-heavy workloads, and around 50% on a benchmark that had a high rate of virtual page table faults under the normal handler. While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and EX_TLB_SRR1 as they're not used. [BenH: Fixed build with 64K pages (wsp config)] Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-29 17:47:48 +10:00
Christian Dietrich	76462232c2	arch/powerpc: use printk_ratelimited instead of printk_ratelimit Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited. Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-29 15:31:01 +10:00
Kumar Gala	32d206eb56	powerpc/book3e: Clarify HW table walk enable/disable message Before if we didn't support or enable HW table walk we'd get a messaage like: MMU: Book3E Page Tables Disabled Which is a bit misleading. Now it will say: MMU: Book3E HW tablewalk not supported Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-17 16:19:51 +10:00
Benjamin Herrenschmidt	307cfe7153	powerpc: Force page alignment for initrd reserved memory When using 64K pages with a separate cpio rootfs, U-Boot will align the rootfs on a 4K page boundary. When the memory is reserved, and subsequent early memblock_alloc is called, it will allocate memory between the 64K page alignment and reserved memory. When the reserved memory is subsequently freed, it is done so by pages, causing the early memblock_alloc requests to be re-used, which in my case, caused the device-tree to be clobbered. This patch forces the reserved memory for initrd to be kernel page aligned, and will move the device tree if it overlaps with the range extension of initrd. This patch will also consolidate the identical function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c, and adds the same range extension when freeing initrd. free_initrd_mem() is also moved to the __init section. Many thanks to Milton Miller for his input on this patch. [BenH: Fixed build without CONFIG_BLK_DEV_INITRD] Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-09 16:52:38 +10:00
Peter Zijlstra	2672391169	mm, powerpc: move the RCU page-table freeing into generic code In case other architectures require RCU freed page-tables to implement gup_fast() and software filled hashes and similar things, provide the means to do so by moving the logic into generic code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Requested-by: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:16 -07:00
Peter Zijlstra	d6bf29b44d	powerpc: mmu_gather rework Fix up powerpc to the new mmu_gather stuff. PPC has an extra batching queue to RCU free the actual pagetable allocations, use the ARCH extentions for that for now. For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the hardware hash-table, keep using per-cpu arrays but flush on context switch and use a TLF bit to track the lazy_mmu state. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:13 -07:00
Anton Blanchard	40f1ce7fb7	powerpc: Remove ioremap_flags We have a confusing number of ioremap functions. Make things just a bit simpler by merging ioremap_flags and ioremap_prot. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 14:30:43 +10:00
Anton Blanchard	be135f4089	powerpc: Add ioremap_wc Add ioremap_wc so drivers can request write combining on kernel mappings. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 14:30:42 +10:00
Stephen Rothwell	79af2187fa	powerpc: Fix compile with icwsx support Due to a collision between NO_CONTEXT->MMU_NO_CONTEXT change and Anton's patch. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-06 13:18:34 +10:00
KOSAKI Motohiro	104699c0ab	powerpc: Convert old cpumask API into new one Adapt new API. Almost change is trivial. Most important change is the below line because we plan to change task->cpus_allowed implementation. - ctx->cpus_allowed = current->cpus_allowed; Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-04 15:22:59 +10:00
Tseng-Hui (Frank) Lin	851d2e2fe8	powerpc: Add Initiate Coprocessor Store Word (icswx) support Icswx is a PowerPC instruction to send data to a co-processor. On Book-S processors the LPAR_ID and process ID (PID) of the owning process are registered in the window context of the co-processor at initialization time. When the icswx instruction is executed the L2 generates a cop-reg transaction on PowerBus. The transaction has no address and the processor does not perform an MMU access to authenticate the transaction. The co-processor compares the LPAR_ID and the PID included in the transaction and the LPAR_ID and PID held in the window context to determine if the process is authorized to generate the transaction. The OS needs to assign a 16-bit PID for the process. This cop-PID needs to be updated during context switch. The cop-PID needs to be destroyed when the context is destroyed. Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com> Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-04 15:19:26 +10:00
Michael Neuling	a32e252f7c	powerpc: Use new CPU feature bit to select 2.06 tlbie This removes MMU_FTR_TLBIE_206 as we can now use CPU_FTR_HVMODE_206. It also changes the logic to select which tlbie to use to be based on this new CPU feature bit. This also duplicates the ASM_FTR_IF/SET/CLR defines for CPU features (copied from MMU features). Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-04 15:19:26 +10:00
Matt Evans	44ae3ab335	powerpc: Free up some CPU feature bits by moving out MMU-related features Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:52 +10:00
Michael Ellerman	e70606eb9b	powerpc/numa: Look for ibm, associativity-reference-points at the root If we don't find ibm,associativity-reference-points as a child of /rtas, look for it at the root of the tree instead. We use this on Book3E where we have no RTAS but still use the sPAPR conventions for NUMA. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:35 +10:00
Benjamin Herrenschmidt	bd49178109	powerpc: Add TLB size detection for TYPE_3E MMUs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 13:02:10 +10:00
Anton Blanchard	b68a70c496	powerpc: Replace open coded instruction patching with patch_instruction/patch_branch There are a few places we patch instructions without using patch_instruction and patch_branch, probably because they predated it. Fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-20 17:01:18 +10:00
Michael Ellerman	f5be2dc0bd	powerpc/nohash: Allocate stale_map[cpu] on CPU_UP_PREPARE not CPU_ONLINE Currently we allocate the stale_map for a cpu when it comes online, this leaves open a small window where a process can be scheduled on the cpu before the stale_map is allocated. Instead allocate the stale_map at CPU_UP_PREPARE time, that way it will be always available before tasks start running. It is possible the cpu fails to come up, in which case we should free the stale_map, so add a CPU_UP_CANCELED case to do that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-20 17:01:18 +10:00
Michael Ellerman	5e8e7b404a	powerpc/mm: Standardise on MMU_NO_CONTEXT Use MMU_NO_CONTEXT as the initialiser for mm_context.id on nohash and hash64. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-20 16:59:20 +10:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Sylvestre Ledru	f65e51d740	Documentation: fix minor typos/spelling Fix some minor typos: * informations => information * there own => their own * these => this Signed-off-by: Sylvestre Ledru <sylvestre.ledru@scilab.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-04 17:51:47 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Benjamin Herrenschmidt	6090912c4a	powerpc: Implement dma_mmap_coherent() This is used by Alsa to mmap buffers allocated with dma_alloc_coherent() into userspace. We need a special variant to handle machines with non-coherent DMAs as those buffers have "special" virt addresses and require non-cachable mappings Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-30 10:44:00 +11:00
Linus Torvalds	0a95d92c00	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (62 commits) powerpc/85xx: Fix signedness bug in cache-sram powerpc/fsl: 85xx: document cache sram bindings powerpc/fsl: define binding for fsl mpic interrupt controllers powerpc/fsl_msi: Handle msi-available-ranges better drivers/serial/ucc_uart.c: Add of_node_put to avoid memory leak powerpc/85xx: Fix SPE float to integer conversion failure powerpc/85xx: Update sata controller compatible for p1022ds board ATA: Add FSL sata v2 controller support powerpc/mpc8xxx_gpio: simplify searching for 'fsl, qoriq-gpio' compatiable powerpc/8xx: remove obsolete mgsuvd board powerpc/82xx: rename and update mgcoge board support powerpc/83xx: rename and update kmeter1 powerpc/85xx: Workaroudn e500 CPU erratum A005 powerpc/fsl_pci: Add support for FSL PCIe controllers v2.x powerpc/85xx: Fix writing to spin table 'cpu-release-addr' on ppc64e powerpc/pseries: Disable MSI using new interface if possible powerpc: Enable GENERIC_HARDIRQS_NO_DEPRECATED. powerpc: core irq_data conversion. powerpc: sysdev/xilinx_intc irq_data conversion. powerpc: sysdev/uic irq_data conversion. ... Fix up conflicts in arch/powerpc/sysdev/fsl_msi.c (due to getting rid of of_platform_driver in arch/powerpc)	2011-03-18 06:31:43 -07:00
Benjamin Herrenschmidt	831532035b	Merge remote branch 'jwb/next' into next	2011-03-17 17:59:01 +11:00
Benjamin Herrenschmidt	36e8695ca5	powerpc/pseries: Disable VPNH feature This feature triggers nasty races in the scheduler between the rebuilding of the topology and the load balancing code, causing the machine to hang. Disable it for now until the races are fixed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-10 10:06:41 +11:00
Scott Wood	6dd2270029	powerpc: Fix memory limits when starting at a non-zero address memblock_enforce_memory_limit() takes the desired maximum quantity of memory to end up with, not an address above which memory will not be used. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-02 16:56:15 +11:00
Peter Zijlstra	f342552b91	powerpc/mm: Make hpte_need_flush() safe for preemption hpte_need_flush() might be called outside of a preempt section when manipulating the kernel page tables, so we need to use the appopriate variants of per-cpu variable accesses. There should be no risk of being in the middle of a batch and a context switch will flush any pending batch. [Patch extracted from a larger patch in Peter's preemptible mmu_gather series] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-02 14:56:48 +11:00
Anton Blanchard	429f4d8d20	powerpc/numa: Fix bug in unmap_cpu_from_node When converting to the new cpumask code I screwed up: - if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) { - cpu_clear(cpu, numa_cpumask_lookup_table[node]); + if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) { + cpumask_set_cpu(cpu, node_to_cpumask_map[node]); This was introduced in commit `25863de07a` (powerpc/cpumask: Convert NUMA code to new cpumask API) Fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:06 +11:00
Anton Blanchard	fe5cfd6355	powerpc/numa: Disable VPHN on dedicated processor partitions There is no need to start up the timer and monitor topology changes on a dedicated processor partition, so disable it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:04 +11:00
Anton Blanchard	c0e5e46f39	powerpc/numa: Add length when creating OF properties via VPHN The rest of the NUMA code expects an OF associativity property with the first cell containing the length. Without this fix all topology changes cause us to misparse the property and put the cpu into node 0. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:03 +11:00
Anton Blanchard	d69043e806	powerpc/numa: Check for all VPHN changes The hypervisor uses unsigned 1 byte counters to signal topology changes to the OS. Since they can wrap we need to check for any difference, not just if the hypervisor count is greater than the previous count. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:01 +11:00
Anton Blanchard	5de1669910	powerpc/numa: Only use active VPHN count fields VPHN supports up to 8 distance fields but the number of entries in ibm,associativity-reference-points signifies how many are in use. Don't look at all the VPHN counts, only distance_ref_points_depth worth. Since we already cap our distance metrics at MAX_DISTANCE_REF_POINTS, use that to size the VPHN arrays and add a BUILD_BUG_ON to avoid it growing larger than the VPHN maximum of 8. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:05:59 +11:00
Jesse Larrew	cd9d6cc726	powerpc/pseries: Remove unnecessary variable initializations in numa.c Remove unnecessary variable initializations in VPHN functions. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:05:36 +11:00
Jesse Larrew	7639adaafb	powerpc/pseries: Fix brace placement in numa.c Fix brace placement in VPHN code. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 12:58:23 +11:00
Jesse Larrew	bd03403ad5	powerpc/pseries: Fix typo in VPHN comments Correct a spelling error in VPHN comments in numa.c. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 12:58:21 +11:00
Dave Kleikamp	21a06b0459	powerpc/476: Workaround for PLB6 hang The 476FP core may hang if an instruction fetch happens during an msync following a tlbsync. This workaround makes sure that enough instruction cache lines are pre-fetched before executing the msync. (sync and msync are the same to the compiler.) Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-02-02 06:59:02 -05:00
Andrea Arcangeli	9180706344	thp: alter compound get_page/put_page Alter compound get_page/put_page to keep references on subpages too, in order to allow __split_huge_page_refcount to split an hugepage even while subpages have been pinned by one of the get_user_pages() variants. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:39 -08:00
Benjamin Herrenschmidt	5d7d8072ed	powerpc/pseries: Fix build of topology stuff without CONFIG_NUMA Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-01-12 10:56:29 +11:00
Jesse Larrew	39bf990ead	powerpc/pseries: Fix VPHN build errors on non-SMP systems The header asm/hvcall.h was previously included indirectly via smp.h. On non-SMP systems, however, these declarations are excluded and the build breaks. This is easily fixed by including asm/hvcall.h directly. The VPHN feature is only meaningful on NUMA systems that implement the SPLPAR option, so exclude the VPHN code on systems without SPLPAR enabled. Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled, even if CONFIG_HOTPLUG_CPU is disabled. Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the node masks after boot time, so remove the __cpuinit annotation to fix a section mismatch. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>	2011-01-11 16:06:16 +11:00
Jesper Juhl	ae9fd31a36	powerpc: Remove unnecessary casts of void ptr Hi, The [vk][cmz]alloc(_node) family of functions return void pointers which it's completely unnecessary/pointless to cast to other pointer types since that happens implicitly. This patch removes such casts from arch/powerpc/ Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:36:30 +11:00
Jesse Larrew	9eff1a3840	powerpc/pseries: Poll VPA for topology changes and update NUMA maps This patch sets a timer during boot that will periodically poll the associativity change counters in the VPA. When a change in associativity is detected, it retrieves the new associativity domain information via the H_HOME_NODE_ASSOCIATIVITY hcall and updates the NUMA node maps and sysfs entries accordingly. Note that since the ibm,associativity device tree property does not exist on configurations with both NUMA and SPLPAR enabled, no device tree updates are necessary. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:36:29 +11:00
Michael Ellerman	7a9d12568e	powerpc: Record vma->phys_addr in ioremap() The vmalloc code can track the physical address of a vma, when the vma is used for ioremap, if set it is displayed in /proc/vmallocinfo. Because get_vm_area_caller() doesn't know it's being called for ioremap() it's up to the arch code to set the phys_addr. A bunch of other arch's do this, I'm not sure why powerpc doesn't? Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:35:32 +11:00
Benjamin Herrenschmidt	f4b9841595	Merge branch 'nvram' into next	2010-12-09 14:36:38 +11:00
Peter Zijlstra	f2e785ed5f	powerpc: Use call_rcu_sched() for pagetables PowerPC relies on IRQ-disable to guard against RCU quiecent states, use the appropriate RCU call version. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-30 10:42:20 +11:00
Michael Neuling	0b97fee0ef	powerpc/mm: Avoid avoidable void* pointer Change pgdir from a void to real type. Having this as a void is stupid and has already caused 1 bug. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:23 +11:00
Nishanth Aravamudan	cd34206e94	powerpc: Add memory_hotplug_max() Add a function to get the maximum address that can be hotplug added. This is needed to calculate the size of the tce table needed to cover all memory in 1:1 mode. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:21 +11:00
Vaidyanathan Srinivasan	99d8670525	powerpc: Cleanup APIs for cpu/thread/core mappings These APIs take logical cpu number as input Change cpu_first_thread_in_core() to cpu_first_thread_sibling() Change cpu_last_thread_in_core() to cpu_last_thread_sibling() These APIs convert core number (index) to logical cpu/thread numbers Add cpu_first_thread_of_core(int core) Changed cpu_thread_to_core() to cpu_core_index_of_thread(int cpu) The goal is to make 'threads_per_core' accessible to the pseries_energy module. Instead of making an API to read threads_per_core, this is a higher level wrapper function to convert from logical cpu number to core number. The current APIs cpu_first_thread_in_core() and cpu_last_thread_in_core() returns logical CPU number while cpu_thread_to_core() returns core number or index which is not a logical CPU number. The new APIs are now clearly named to distinguish 'core number' versus first and last 'logical cpu number' in that core. The new APIs cpu_{first,last}_thread_sibling() work on logical cpu numbers. While cpu_first_thread_of_core() and cpu_core_index_of_thread() work on core index. Example usage: (4 threads per core system) cpu_first_thread_sibling(5) = 4 cpu_last_thread_sibling(5) = 7 cpu_core_index_of_thread(5) = 1 cpu_first_thread_of_core(1) = 4 cpu_core_index_of_thread() is used in cpu_to_drc_index() in the module and cpu_first_thread_of_core() is used in drc_index_to_cpu() in the module. Make API changes to few callers. Export symbols for use in modules. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:19 +11:00
Kumar Gala	82ae5eaffa	powerpc/mm: Fix module instruction tlb fault handling on Book-E 64 We were seeing oops like the following when we did an rmmod on a module: Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0x8000000000008010 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2 P5020 DS last sysfs file: /sys/devices/qman-portals.2/qman-pool.9/uevent Modules linked in: qman_tester(-) NIP: 8000000000008010 LR: c000000000074858 CTR: 8000000000008010 REGS: c00000002e29bab0 TRAP: 0400 Not tainted (2.6.34.6-00744-g2d21f14) MSR: 0000000080029000 <EE,ME,CE> CR: 24000448 XER: 00000000 TASK = c00000007a8be600[4987] 'rmmod' THREAD: c00000002e298000 CPU: 1 GPR00: 8000000000008010 c00000002e29bd30 8000000000012798 c00000000035fb28 GPR04: 0000000000000002 0000000000000002 0000000024022428 c000000000009108 GPR08: fffffffffffffffe 800000000000a618 c0000000003c13c8 0000000000000000 GPR12: 0000000022000444 c00000000fffed00 0000000000000000 0000000000000000 GPR16: 00000000100c0000 0000000000000000 00000000100dabc8 0000000010099688 GPR20: 0000000000000000 00000000100cfc28 0000000000000000 0000000010011a44 GPR24: 00000000100017b2 0000000000000000 0000000000000000 0000000000000880 GPR28: c00000000035fb28 800000000000a7b8 c000000000376d80 c0000000003cce50 NIP [8000000000008010] .test_exit+0x0/0x10 [qman_tester] LR [c000000000074858] .SyS_delete_module+0x1f8/0x2f0 Call Trace: [c00000002e29bd30] [c0000000000748b4] .SyS_delete_module+0x254/0x2f0 (unreliable) [c00000002e29be30] [c000000000000580] syscall_exit+0x0/0x2c Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 38600000 4e800020 60000000 60000000 <4e800020> 60000000 60000000 60000000 ---[ end trace 4f57124939a84dc8 ]--- This appears to be due to checking the wrong permission bits in the instruction_tlb_miss handling if the address that faulted was in vmalloc space. We need to look at the supervisor execute (_PAGE_BAP_SX) bit and not the user bit (_PAGE_BAP_UX/_PAGE_EXEC). Also removed a branch level since it did not appear to be used. Reported-by: Jeffrey Ladouceur <Jeffrey.Ladouceur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-18 14:54:23 +11:00
Michael Neuling	1c2c25c787	powerpc: Fix call to subpage_protection() In: powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT commit `d28513bc7f` Author: David Gibson <david@gibson.dropbear.id.au> subpage_protection() was changed to to take an mm rather a pgdir but it didn't change calling site in hashpage_preload(). The change wasn't noticed at compile time since hashpage_preload() used a void* as the parameter to subpage_protection(). This is obviously wrong and can trigger the following crash when CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_PPC_64K_PAGES CONFIG_PPC_SUBPAGE_PROT are enabled. Freeing unused kernel memory: 704k freed Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6c49b7 Faulting instruction address: 0xc0000000000410f4 cpu 0x2: Vector: 300 (Data Access) at [c00000004233f590] pc: c0000000000410f4: .hash_preload+0x258/0x338 lr: c000000000041054: .hash_preload+0x1b8/0x338 sp: c00000004233f810 msr: 8000000000009032 dar: 6b6b6b6b6b6c49b7 dsisr: 40000000 current = 0xc00000007e2c0070 paca = 0xc000000007fe0500 pid = 1, comm = init enter ? for help [c00000004233f810] c000000000041020 .hash_preload+0x184/0x338 (unreliable) [c00000004233f8f0] c00000000003ed98 .update_mmu_cache+0xb0/0xd0 [c00000004233f990] c000000000157754 .__do_fault+0x48c/0x5dc [c00000004233faa0] c000000000158fd0 .handle_mm_fault+0x508/0xa8c [c00000004233fb90] c0000000006acdd4 .do_page_fault+0x428/0x6ac [c00000004233fe30] c000000000005260 handle_page_fault+0x20/0x74 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-18 14:54:23 +11:00
Kumar Gala	4a89261b02	powerpc/mm: Fix build error in setup_initial_memory_limit arch/powerpc/mm/tlb_nohash.c: In function 'setup_initial_memory_limit': arch/powerpc/mm/tlb_nohash.c:588:29: error: 'ppc64_memblock_base' undeclared (first use in this function) arch/powerpc/mm/tlb_nohash.c:588:29: note: each undeclared identifier is reported only once for each function it appears in Due to a copy/paste typo with the following commit: commit `cd3db0c4ca` Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Tue Jul 6 15:39:02 2010 -0700 memblock: Remove rmo_size, burry it in arch/powerpc where it belongs Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-18 14:54:22 +11:00
Peter Zijlstra	20273941f2	mm: fix race in kunmap_atomic() Christoph reported a nice splat which illustrated a race in the new stack based kmap_atomic implementation. The problem is that we pop our stack slot before we're completely done resetting its state -- in particular clearing the PTE (sometimes that's CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear the PTE used for the last slot, that interrupt can reuse the slot in a dirty state, which triggers a BUG in kmap_atomic(). Fix this by introducing kmap_atomic_idx() which reports the current slot index without actually releasing it and use that to find the PTE and delay the _pop() until after we're completely done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Christoph Hellwig <hch@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:05 -07:00
Peter Zijlstra	3e4d3af501	mm: stack based kmap_atomic() Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Linus Torvalds	d4429f608a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits) powerpc/44x: Update ppc44x_defconfig powerpc/watchdog: Make default timeout for Book-E watchdog a Kconfig option fsl_rio: Add comments for sRIO registers. powerpc/fsl-booke: Add e55xx (64-bit) smp defconfig powerpc/fsl-booke: Add p5020 DS board support powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes powerpc/fsl-booke: Add support for FSL 64-bit e5500 core powerpc/85xx: add cache-sram support powerpc/85xx: add ngPIXIS FPGA device tree node to the P1022DS board powerpc: Fix compile error with paca code on ppc64e powerpc/fsl-booke: Add p3041 DS board support oprofile/fsl emb: Don't set MSR[PMM] until after clearing the interrupt. powerpc/fsl-booke: Add PCI device ids for P2040/P3041/P5010/P5020 QoirQ chips powerpc/mpc8xxx_gpio: Add support for 'qoriq-gpio' controllers powerpc/fsl_booke: Add support to boot from core other than 0 powerpc/p1022: Add probing for individual DMA channels powerpc/fsl_soc: Search all global-utilities nodes for rstccr powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT powerpc/mpc83xx: Support for MPC8308 P1M board ... Fix up conflict with the generic irq_work changes in arch/powerpc/kernel/time.c	2010-10-21 21:19:54 -07:00
Kumar Gala	55fd766b5f	powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips On Freescale parts typically have TLB array for large mappings that we can bolt the linear mapping into. We utilize the code that already exists on PPC32 on the 64-bit side to setup the linear mapping to be cover by bolted TLB entries. We utilize a quarter of the variable size TLB array for this purpose. Additionally, we limit the amount of memory to what we can cover via bolted entries so we don't get secondary faults in the TLB miss handlers. We should fix this limitation in the future. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-10-14 00:55:14 -05:00
Kumar Gala	988cf86d4f	powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes Update setup_page_sizes() to support for a MMU v1.0 FSL style MMU implementation. In such a processor, we don't have TLB0PS or EPTCFG registers (and access to these registers may cause exceptions). We need to parse the older format of TLBnCFG for page size support. Additionaly, assume since we are an FSL implementation that we have 2 TLB arrays and the second array contains the variable size pages. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-10-14 00:55:09 -05:00
Paul Gortmaker	92437d4137	powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT There exists a four line chunk of code, which when configured for 64 bit address space, can incorrectly set certain page flags during the TLB creation. It turns out that this is code which isn't used, but might still serve a purpose. Since it isn't obvious why it exists or why it causes problems, the below description covers both in detail. For powerpc bootstrap, the physical memory (at most 768M), is mapped into the kernel space via the following path: MMU_init() \| + adjust_total_lowmem() \| + map_mem_in_cams() \| + settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0); On settlbcam(), the kernel will create TLB entries according to the flag, PAGE_KERNEL_X. settlbcam() { ... TLBCAM[index].MAS1 = MAS1_VALID \| MAS1_IPROT \| MAS1_TSIZE(tsize) \| MAS1_TID(pid); ^ These entries cannot be invalidated by the kernel since MAS1_IPROT is set on TLB property. ... if (flags & _PAGE_USER) { TLBCAM[index].MAS3 \|= MAS3_UX \| MAS3_UR; TLBCAM[index].MAS3 \|= ((flags & _PAGE_RW) ? MAS3_UW : 0); } For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine. But on boards like the the Freescale P4080, we want to support 36-bit physical address on it. So the following options may be set: CONFIG_FSL_BOOKE=y CONFIG_PTE_64BIT=y CONFIG_PHYS_64BIT=y As a result, boards like the P4080 will introduce PTE format as Book3E. As per the file: arch/powerpc/include/asm/pgtable-ppc32.h * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT) * #include <asm/pte-book3e.h> So PAGE_KERNEL_X is __pgprot(_PAGE_BASE \| _PAGE_KERNEL_RWX) and the book3E version of _PAGE_KERNEL_RWX is defined with: (_PAGE_BAP_SW \| _PAGE_BAP_SR \| _PAGE_DIRTY \| _PAGE_BAP_SX) Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER: #define _PAGE_USER (_PAGE_BAP_UR \| _PAGE_BAP_SR) /* Can be read */ So the possibility exists to wrongly assign the user MAS3_U<RWX> bits to kernel (PAGE_KERNEL_X) address space via the following code fragment: if (flags & _PAGE_USER) { TLBCAM[index].MAS3 \|= MAS3_UX \| MAS3_UR; TLBCAM[index].MAS3 \|= ((flags & _PAGE_RW) ? MAS3_UW : 0); } Here is a dump of the TLB info from Simics with the above code present: ------ L2 TLB1 GT SSS UUU V I Row Logical Physical SS TLPID TID WIMGE XWR XWR F P V ----- ----------------- ------------------- -- ----- ----- ----- --- --- - - - 0 c0000000-cfffffff 000000000-00fffffff 00 0 0 M XWR XWR 0 1 1 1 d0000000-dfffffff 010000000-01fffffff 00 0 0 M XWR XWR 0 1 1 2 e0000000-efffffff 020000000-02fffffff 00 0 0 M XWR XWR 0 1 1 Actually this conditional code was used for two legacy functions: 1: support KGDB to set break point. KGDB already dropped this; now uses its core write to set break point. 2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE) for device IO space. This use case is also removed from the latest PowerPC kernel. However, there may still be a use case for it in the future, like large user pages, so we can't remove it entirely. As an alternative, we match on all bits of _PAGE_USER instead of just any bits, so the case where just _PAGE_BAP_SR is set can't sneak through. With this done, the TLB appears without U having XWR as below: ------- L2 TLB1 GT SSS UUU V I Row Logical Physical SS TLPID TID WIMGE XWR XWR F P V ----- ----------------- ------------------- -- ----- ----- ----- --- --- - - - 0 c0000000-cfffffff 000000000-00fffffff 00 0 0 M XWR 0 1 1 1 d0000000-dfffffff 010000000-01fffffff 00 0 0 M XWR 0 1 1 2 e0000000-efffffff 020000000-02fffffff 00 0 0 M XWR 0 1 1 Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-10-14 00:52:55 -05:00
matt mooney	4108d9ba90	powerpc/Makefiles: Change to new flag variables Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-10-13 16:19:22 +11:00
Yinghai Lu	c7fc2de0c8	memblock, bootmem: Round pfn properly for memory and reserved regions We need to round memory regions correctly -- specifically, we need to round reserved region in the more expansive direction (lower limit down, upper limit up) whereas usable memory regions need to be rounded in the more restrictive direction (lower limit up, upper limit down). This introduces two set of inlines: memblock_region_memory_base_pfn() memblock_region_memory_end_pfn() memblock_region_reserved_base_pfn() memblock_region_reserved_end_pfn() Although they are antisymmetric (and therefore are technically duplicates) the use of the different inlines explicitly documents the programmer's intention. The lack of proper rounding caused a bug on ARM, which was then found to also affect other architectures. Reported-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB4CDFD.4020105@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-12 15:37:51 -07:00
Matthew McClintock	0d35e1620d	powerpc/mm: Assume first cpu is boot_cpuid not 0 arch/powerpc/mm/mmu_context_nohash.c assumes the boot cpu will always have smp_processor_id() == 0. This patch fixes that assumption Signed-off-by: Matthew McClintock <msm@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:34 +10:00
Anton Blanchard	28b549905b	powerpc: Check end of stack canary at oops time Add a check for the stack canary when we oops, similar to x86. This should make it clear that we overran our stack: Unable to handle kernel paging request for data at address 0x24652f63700ac689 Faulting instruction address: 0xc000000000063d24 Thread overran stack, or stack corrupted Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:30 +10:00
Ingo Molnar	daab7fc734	Merge commit 'v2.6.36-rc3' into x86/memblock Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-31 09:45:46 +02:00
Sonny Rao	79c3095fb3	powerpc: Export memstart_addr and kernstart_addr on ppc64 Some modules (like eHCA) want to map all of kernel memory, for this to work with a relocated kernel, we need to export kernstart_addr so modules can use PHYSICAL_START and memstart_addr so they could use MEMORY_START. Note that the 32bit code already exports these symbols. Signed-off-By: Sonny Rao <sonnyrao@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-24 15:26:26 +10:00
Benjamin Herrenschmidt	b1515af291	Merge remote branch 'jwb/merge' into merge	2010-08-24 14:36:45 +10:00
Dave Kleikamp	32412aa214	powerpc/47x: Add an isync before the tlbivax instruction Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-08-23 07:38:31 -04:00
Cesar Eduardo Barros	597781f3e5	kmap_atomic: make kunmap_atomic() harder to misuse kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse" list[1] ("Follow common convention and you'll get it wrong"), except in some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3]. kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes takes a pointer to within the page itself. This seems to once in a while trip people up (the convention they are following is the one from kunmap()). Make it much harder to misuse, by moving it to level 9 on Rusty's list[4] ("The compiler/linker won't let you get it wrong"). This is done by refusing to build if the type of its first argument is a pointer to a struct page. The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck() (which is what you would call in case for some strange reason calling it with a pointer to a struct page is not incorrect in your code). The previous version of this patch was compile tested on x86-64. [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html [2] In these cases, it is at level 5, "Do it right or it will always break at runtime." [3] At least mips and powerpc look very similar, and sparc also seems to share a common ancestor with both; there seems to be quite some degree of copy-and-paste coding here. The include/asm/highmem.h file for these three archs mention x86 CPUs at its top. [4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html [5] As an aside, could someone tell me why mn10300 uses unsigned long as the first parameter of kunmap_atomic() instead of void *? Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Cc: Russell King <linux@arm.linux.org.uk> (arch/arm) Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips) Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300) Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300) Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc) Cc: Helge Deller <deller@gmx.de> (arch/parisc) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc) Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc) Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc) Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86) Cc: Ingo Molnar <mingo@redhat.com> (arch/x86) Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86) Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic) Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:44:54 -07:00
Linus Torvalds	cdd854bc42	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (79 commits) powerpc/8xx: Add support for the MPC8xx based boards from TQC powerpc/85xx: Introduce support for the Freescale P1022DS reference board powerpc/85xx: Adding DTS for the STx GP3-SSA MPC8555 board powerpc/85xx: Change deprecated binding for 85xx-based boards powerpc/tqm85xx: add a quirk for ti1520 PCMCIA bridge powerpc/tqm85xx: update PCI interrupt-map attribute powerpc/mpc8308rdb: support for MPC8308RDB board from Freescale powerpc/fsl_pci: add quirk for mpc8308 pcie bridge powerpc/85xx: Cleanup QE initialization for MPC85xxMDS boards powerpc/85xx: Fix booting for P1021MDS boards powerpc/85xx: Fix SWIOTLB initalization for MPC85xxMDS boards powerpc/85xx: kexec for SMP 85xx BookE systems powerpc/5200/i2c: improve i2c bus error recovery of/xilinxfb: update tft compatible versions powerpc/fsl-diu-fb: Support setting display mode using EDID powerpc/5121: doc/dts-bindings: update doc of FSL DIU bindings powerpc/5121: shared DIU framebuffer support powerpc/5121: move fsl-diu-fb.h to include/linux powerpc/5121: fsl-diu-fb: fix issue with re-enabling DIU area descriptor powerpc/512x: add clock structure for Video-IN (VIU) unit ...	2010-08-05 09:03:46 -07:00
Benjamin Herrenschmidt	4734b594c6	memblock: Remove memblock_type.size and add memblock.memory_size instead Right now, both the "memory" and "reserved" memblock_type structures have a "size" member. It represents the calculated memory size in the former case and is unused in the latter. This moves it out to the main memblock structure instead Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:11 +10:00
Benjamin Herrenschmidt	cd3db0c4ca	memblock: Remove rmo_size, burry it in arch/powerpc where it belongs The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact server ppc64 though I hijack it on embedded ppc64 for similar purposes) and represents the area of memory that can be accessed in real mode (aka with MMU off), or on embedded, from the exception vectors (which is bolted in the TLB) which pretty much boils down to the same thing. We take that out of the generic MEMBLOCK data structure and move it into arch/powerpc where it belongs, renaming it to "RMA" while at it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:08 +10:00
Benjamin Herrenschmidt	e63075a3c9	memblock: Introduce default allocation limit and use it to replace explicit ones This introduce memblock.current_limit which is used to limit allocations from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE). The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still be used with memblock_alloc_base() to allocate really anywhere. It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT which disappears. Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I strongly recommend that you ensure that you set an appropriate limit during boot in order to guarantee that an memblock_alloc() at any time results in something that is accessible with a simple __va(). The reason is that a subsequent patch will introduce the ability for the array to resize itself by reallocating itself. The MEMBLOCK core will honor the current limit when performing those allocations. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:07 +10:00
Benjamin Herrenschmidt	27f574c223	memblock: Expose MEMBLOCK_ALLOC_ANYWHERE Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:06 +10:00
Benjamin Herrenschmidt	28be7072ce	memblock/powerpc: Use new accessors Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-04 14:39:01 +10:00
Benjamin Herrenschmidt	e3239ff92a	memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-04 14:21:49 +10:00
Benjamin Herrenschmidt	412a4ac5e9	Merge commit 'gcl/next' into next	2010-08-04 10:26:03 +10:00
Benjamin Herrenschmidt	3fdfd99051	powerpc: Fix erroneous lmb->memblock conversions Oooops... we missed these. We incorrectly converted strings used when parsing the device-tree on pseries, thus breaking access to drconf memory and hotplug memory. While at it, also revert some variable names that represent something the FW calls "lmb" and thus don't need to be converted to "memblock". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---	2010-07-23 12:56:57 +10:00
Benjamin Herrenschmidt	4b8692c022	powerpc/mm: Add some debug output when hash insertion fails This adds some debug output to our MMU hash code to print out some useful debug data if the hypervisor refuses the insertion (which should normally never happen). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---	2010-07-23 12:56:56 +10:00
Benjamin Herrenschmidt	171aa2caaa	powerpc/mm: Fix bugs in huge page hashing There's a couple of nasty bugs lurking in our huge page hashing code. First, we don't check the access permission atomically with setting the _PAGE_BUSY bit, which means that the PTE value we end up using for the hashing might be different than the one we have checked the access permissions for. We've seen cases where that leads us to try to use an invalidated PTE for hashing, causing all sort of "interesting" issues. Then, we also failed to set _PAGE_DIRTY on a write access. Finally, a minor tweak but we should return 0 when we find the PTE busy, in order to just re-execute the access, rather than 1 which means going to do_page_fault(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---	2010-07-23 12:55:21 +10:00
Benjamin Herrenschmidt	ca91e6c09d	powerpc/mm: Move around testing of _PAGE_PRESENT in hash code Instead of adding _PAGE_PRESENT to the access permission mask in each low level routine independently, we add it once from hash_page(). We also move the preliminary access check (the racy one before the PTE is locked) up so it applies to the huge page case. This duplicates code in __hash_page_huge() which we'll remove in a subsequent patch to fix a race in there. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-23 08:53:23 +10:00
Anton Blanchard	b1623e7eb2	powerpc/mm: Handle hypervisor pte insert failure in __hash_page_huge If the hypervisor gives us an error on a hugepage insert we panic. The normal page code already handles this by returning an error instead and we end calling low_hash_fault which will just kill the task if possible. The patch below does a similar thing for the hugepage case. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-23 08:44:51 +10:00
Yinghai Lu	95f72d1ed4	lmb: rename to memblock via following scripts FILES=$(find * -type f \| grep -vE 'oprofile\|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N \| sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 17:14:00 +10:00
Benjamin Herrenschmidt	f2b26c9235	powerpc/book3e: Adjust the page sizes list based on MMU config Use the MMU config registers to scan for available direct and indirect page sizes and print out the result. Will be needed for future hugetlbfs implementation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 14:13:53 +10:00
Benjamin Herrenschmidt	ff82c319e6	powerpc/book3e: Fix single step when using HW page tables We patch the TLB miss exception vectors to point to alternate functions when using HW page table on BookE. However, we were patching in a new branch in the first instruction of the exception handler instead of the second one, thus overriding the nop that is in the first instruction. This cause problems when single stepping as we rely on that nop for the single step to stop properly within the exception vector range rather than on the target of the branch. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 14:13:51 +10:00
Christoph Egger	cccd234283	powerpc: Removing dead CONFIG_SMP_750 CONFIG_SMP_750 doesn't exist in Kconfig, therefore removing all references for it from the source code. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:38 +10:00
Anton Blanchard	41eab6f88f	powerpc/numa: Use form 1 affinity to setup node distance Form 1 affinity allows multiple entries in ibm,associativity-reference-points which represent affinity domains in decreasing order of importance. The Linux concept of a node is always the first entry, but using the other values as an input to node_distance() allows the memory allocator to make better decisions on which node to go first when local memory has been exhausted. We keep things simple and create an array indexed by NUMA node, capped at 4 entries. Each time we lookup an associativity property we initialise the array which is overkill, but since we should only hit this path during boot it didn't seem worth adding a per node valid bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:35 +10:00
Paul E. McKenney	a591f6b56d	powerpc: Remove all rcu head initializations Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:34 +10:00
Becky Bruce	d10ac3734d	powerpc/fsl-booke: Fix comments in mmu code that mention BATS There are no BATS on BookE - we have the TLBCAM instead. Also correct the page size information to included extended sizes. We don't actually allow a 4G page size to be used, so comment on that as well. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:26 +10:00
Christoph Egger	8054a3428f	powerpc: Remove dead CONFIG_HIGHPTE CONFIG_HIGHPTE doesn't exist in Kconfig, therefore removing all references for it from the source code. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-06-15 15:02:32 +10:00
Linus Torvalds	98edb6ca41	Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits) KVM: x86: Add missing locking to arch specific vcpu ioctls KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls KVM: MMU: Segregate shadow pages with different cr0.wp KVM: x86: Check LMA bit before set_efer KVM: Don't allow lmsw to clear cr0.pe KVM: Add cpuid.txt file KVM: x86: Tell the guest we'll warn it about tsc stability x86, paravirt: don't compute pvclock adjustments if we trust the tsc x86: KVM guest: Try using new kvm clock msrs KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID KVM: x86: add new KVMCLOCK cpuid feature KVM: x86: change msr numbers for kvmclock x86, paravirt: Add a global synchronization point for pvclock x86, paravirt: Enable pvclock flags in vcpu_time_info structure KVM: x86: Inject #GP with the right rip on efer writes KVM: SVM: Don't allow nested guest to VMMCALL into host KVM: x86: Fix exception reinjection forced to true KVM: Fix wallclock version writing race KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) ...	2010-05-21 17:16:21 -07:00
Anton Blanchard	bc8449cc57	powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity I've been told that the architected way to determine we are in form 1 affinity mode is by reading the ibm,architecture-vec-5 property which mirrors the layout of the fifth vector of the ibm,client-architecture structure. Eventually we may want to parse the ibm,architecture-vec-5 and create FW_FEATURE_* bits. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-21 17:31:12 +10:00
Kumar Gala	78f622377f	powerpc/fsl-booke: Move loadcam_entry back to asm code to fix SMP ftrace When we build with ftrace enabled its possible that loadcam_entry would have used the stack pointer (even though the code doesn't need it). We call loadcam_entry in __secondary_start before the stack is setup. To ensure that loadcam_entry doesn't use the stack pointer the easiest solution is to just have it in asm code. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-05-17 10:56:20 -05:00
Alexander Graf	c83ec269e6	PPC: Split context init/destroy functions We need to reserve a context from KVM to make sure we have our own segment space. While we did that split for Book3S_64 already, 32 bit is still outstanding. So let's split it now. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:20 +03:00
Benjamin Herrenschmidt	1ed31d6db9	Merge commit 'origin/master' into next	2010-05-07 11:29:25 +10:00
Anton Blanchard	25863de07a	powerpc/cpumask: Convert NUMA code to new cpumask API Convert NUMA code to new cpumask API. We shift the node to cpumask setup code until after we complete bootmem allocation so we can dynamically allocate the cpumasks. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 17:41:58 +10:00
Benjamin Herrenschmidt	e460c2c91a	powerpc: Invoke oom-killer from page fault As explained in commit `1c0fe6e3bd`, we want to call the architecture independent oom killer when getting an unexplained OOM from handle_mm_fault, rather than simply killing current. Cc: linuxppc-dev@ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 17:15:58 +10:00
Mark Nelson	91eea67c6d	powerpc/mm: Track backing pages allocated by vmemmap_populate() We need to keep track of the backing pages that get allocated by vmemmap_populate() so that when we use kdump, the dump-capture kernel knows where these pages are. We use a simple linked list of structures that contain the physical address of the backing page and corresponding virtual address to track the backing pages. To save space, we just use a pointer to the next struct vmemmap_backing. We can also do this because we never remove nodes. We call the pointer "list" to be compatible with changes made to the crash utility. vmemmap_populate() is called either at boot-time or on a memory hotplug operation. We don't have to worry about the boot-time calls because they will be inherently single-threaded, and for a memory hotplug operation vmemmap_populate() is called through: sparse_add_one_section() \| V kmalloc_section_memmap() \| V sparse_mem_map_populate() \| V vmemmap_populate() and in sparse_add_one_section() we're protected by pgdat_resize_lock(). So, we don't need a spinlock to protect the vmemmap_list. We allocate space for the vmemmap_backing structs by allocating whole pages in vmemmap_list_alloc() and then handing out chunks of this to vmemmap_list_populate(). This means that we waste at most just under one page, but this keeps the code is simple. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 16:49:27 +10:00
Benjamin Herrenschmidt	75c1d539ea	powerpc: Fix CONFIG_DEBUG_PAGEALLOC on 603/e300 So we tried to speed things up a bit using flush_hash_pages() directly but that falls over on 603 of course meaning we fail to flush the TLB properly and we may even end up having it corrupt memory randomly by accessing a hash table that doesn't exist. This removes the "optimization" by always going through flush_tlb_page() for now at least. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 16:49:26 +10:00
Dave Kleikamp	e7f75ad01d	powerpc/47x: Base ppc476 support This patch adds the base support for the 476 processor. The code was primarily written by Ben Herrenschmidt and Torez Smith, but I've been maintaining it for a while. The goal is to have a single binary that will run on 44x and 47x, but we still have some details to work out. The biggest is that the L1 cache line size differs on the two platforms, but it's currently a compile-time option. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-05-05 09:11:10 -04:00
Linus Torvalds	b18262eda3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: don't needlessly skip PAGE_USER test for Fsl booke	2010-04-29 20:01:42 -07:00
Wufei	56151e7534	kgdb: don't needlessly skip PAGE_USER test for Fsl booke The bypassing of this test is a leftover from 2.4 vintage kernels, and is no longer appropriate, or even used by KGDB. Currently KGDB uses probe_kernel_write() for all access to memory via the KGDB core, so it can simply be deleted. This fixes CVE-2010-1446. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Wufei <fei.wu@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-04-29 21:41:44 -05:00
Anton Blanchard	4b83c330b4	powerpc/numa: Add form 1 NUMA affinity Firmware changed the way it represents memory and cpu affinity on POWER7. Unfortunately the old method now caps the topology to work around issues with legacy operating systems. For Linux to get the correct topology we need to use the new form 1 affinity information. We set the form 1 field in the client architecture, and if we see "1" in the ibm,associativity-form property firmware supports form 1 affinity and we should look at the first field in the ibm,associativity-reference-points array. If not we use the second field as we always have. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-28 16:22:33 +10:00
Becky Bruce	e8137341b1	powerpc/fsl_booke: Correct test for MMU_FTR_BIG_PHYS The code was looking for this in cpu_features, not mmu_features. Fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-04-19 23:12:44 -05:00
K.Prasad	9c7cc234dc	powerpc: Disable interrupts for data breakpoint exceptions Data address breakpoint exceptions are currently handled along with page-faults which require interrupts to remain in enabled state. Since exception handling for data breakpoints aren't pre-empt safe, we handle them separately. Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-07 14:44:38 +10:00
Benjamin Herrenschmidt	55052eeca6	powerpc: Fix ioremap_flags() with book3e pte definition We can't just clear the user read permission in book3e pte, because that will also clear supervisor read permission. This surely isn't desired. Fix the problem by adding the supervisor read back. BenH: Slightly simplified the ifdef and applied to ppc64 too Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-07 14:39:47 +10:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
FUJITA Tomonori	a93272969c	powerpc: Fix swiotlb to respect the boot option powerpc initializes swiotlb before parsing the kernel boot options so swiotlb options (e.g. specifying the swiotlb buffer size) are ignored. Any time before freeing bootmem works for swiotlb so this patch moves powerpc's swiotlb initialization after parsing the kernel boot options, mem_init (as x86 does). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Becky Bruce <beckyb@kernel.crashing.org> Tested-by: Albert Herranz <albert_herranz@yahoo.es> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-03-19 16:38:16 +11:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
H Hartley Sweeten	72c3368856	nodemask.h: remove macro any_online_node The macro any_online_node() is prone to producing sparse warnings due to the local symbol 'node'. Since all the in-tree users are really requesting the first online node (the mask argument is either NODE_MASK_ALL or node_online_map) just use the first_online_node macro and remove the any_online_node macro since there are no users. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Milton Miller <miltonm@bga.com> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Geoff Levand <geoffrey.levand@am.sony.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David S. Miller <davem@davemloft.net> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 11:26:31 -08:00
Linus Torvalds	ac0f6f927d	Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits) ARM: Eliminate decompressor -Dstatic= PIC hack ARM: 5958/1: ARM: U300: fix inverted clk round rate ARM: 5956/1: misplaced parentheses ARM: 5955/1: ep93xx: move timer defines into core.c and document ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c ARM: 5953/1: ep93xx: fix broken build of clock.c ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig ARM: 5949/1: NUC900 add gpio virtual memory map ARM: 5948/1: Enable timer0 to time4 clock support for nuc910 ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk ARM: make_coherent(): fix problems with highpte, part 2 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself ARM: 5945/1: ep93xx: include correct irq.h in core.c ARM: 5933/1: amba-pl011: support hardware flow control ARM: 5930/1: Add PKMAP area description to memory.txt. ARM: 5929/1: Add checks to detect overlap of memory regions. ARM: 5928/1: Change type of VMALLOC_END to unsigned long. ARM: 5927/1: Make delimiters of DMA area globally visibly. ARM: 5926/1: Add "Virtual kernel memory..." printout. ARM: 5920/1: OMAP4: Enable L2 Cache ... Fix up trivial conflict in arch/arm/mach-mx25/clock.c	2010-03-01 09:15:15 -08:00
Russell King	4b3073e1c5	MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2010-02-20 16:41:46 +00:00
Thomas Gleixner	3eb93c558a	powerpc: Convert tlbivax_lock to raw_spinlock tlbivax_lock needs to be a real spinlock in RT. Convert it to raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-19 14:52:33 +11:00
Thomas Gleixner	6b9c9b8a66	powerpc: Convert native_tlbie_lock to raw_spinlock native_tlbie_lock needs to be a real spinlock in RT. Convert it to raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-19 14:52:30 +11:00
Thomas Gleixner	be833f3371	powerpc: Convert context_lock to raw_spinlock context_lock needs to be a real spinlock in RT. Convert it to raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-19 14:52:30 +11:00
Benjamin Herrenschmidt	efd0f0f385	Merge commit 'jwb/next' into next	2010-02-18 09:34:38 +11:00
Anton Blanchard	66d99b8834	powerpc: Convert open coded native hashtable bit lock Now we have real bit locks use them instead of open coding it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:15 +11:00
Benjamin Herrenschmidt	ec144a81ad	Merge commit 'origin/master' into next	2010-02-17 10:00:42 +11:00
Stefan Roese	c7b6669812	powerpc/40x: Add support for PPC40x boards with > 512MB SDRAM This patch adds support for boards with more that 512MByte RAM. Currently only 512MB of memory are enabled in the DCCR/ICCR real-mode cache control registers. This patch now enables caching in real-mode for 2GByte. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-02-12 07:54:45 -05:00
David Gibson	77058e1adc	powerpc: Fix address masking bug in hpte_need_flush() Commit `f71dc176aa` 'Make hpte_need_flush() correctly mask for multiple page sizes' introduced bug, which is triggered when a kernel with a 64k base page size is run on a system whose hardware does not 64k hash PTEs. In this case, we emulate 64k pages with multiple 4k hash PTEs, however in hpte_need_flush() we incorrectly only mask the hardware page size from the address, instead of the logical page size. This causes things to go wrong when we later attempt to iterate through the hardware subpages of the logical page. This patch corrects the error. It has been tested on pSeries bare metal by Michael Neuling. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-10 13:58:06 +11:00
Anton Blanchard	7317ac8711	powerpc: Convert mmu context allocator from idr to ida We can use the much more lightweight ida allocator since we don't need the pointer storage idr provides. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-09 13:56:07 +11:00
Uwe Kleine-König	9ddc5b6f18	tree-wide: fix typos "ammount" -> "amount" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:40 +01:00
Thadeu Lima de Souza Cascardo	2273130de8	fix comment typo leve -> level in powerpc Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:38 +01:00
Thadeu Lima de Souza Cascardo	6c504d4231	powerpc: Fix typo s/leve/level/ in TLB code Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-03 17:39:50 +11:00
Jiri Slaby	4bf936b9e4	powerpc: Use helpers for rlimits Make sure compiler won't do weird things with limits. E.g. fetching them twice may return 2 different values after writable limits are implemented. I.e. either use rlimit helpers added in `3e10e716ab` or ACCESS_ONCE if not applicable. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-01-15 13:20:08 +11:00
David Gibson	a1128f8f0f	powerpc/mm: Fix stupid bug in subpge protection handling Commit `d28513bc7f` ("Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for breakage caused by an earlier clean up patch of mine, contains a stupid bug. I changed the parameters of the subpage_protection() function, but failed to update one of the callers. This patch fixes it, and replaces a void * with a typed pointer so that the compiler will warn on such an error in future. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:55:44 +11:00
Yang Li	f04b10cddb	powerpc/mm: Fix typo of cpumask_clear_cpu() The function name of cpumask_clear_cpu was not correct. Fortunately nobody uses that code with hotplug yet :-) Reported-by: Jin Qing <b24347@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:54:27 +11:00
Sachin P. Sant	5c33991987	powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled. This time without the funny characters. Fix following build errors generated with DEBUG=1 cc1: warnings being treated as errors arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes': arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize': arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' ... SNIP ... Signed-off-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:54:27 +11:00
Benjamin Herrenschmidt	50891457f1	powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM Set need to call __set_pte_at() and not set_pte_at() from __change_page_attr() since the later will perform checks with CONFIG_DEBUG_VM that aren't suitable to the way we override an existing PTE. (More specifically, it doesn't let you write over a present PTE). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:54:26 +11:00
Linus Torvalds	a73611b6aa	Merge branch 'next' of git://git.secretlab.ca/git/linux-2.6 * 'next' of git://git.secretlab.ca/git/linux-2.6: (23 commits) powerpc: fix up for mmu_mapin_ram api change powerpc: wii: allow ioremap within the memory hole powerpc: allow ioremap within reserved memory regions wii: use both mem1 and mem2 as ram wii: bootwrapper: add fixup to calc useable mem2 powerpc: gamecube/wii: early debugging using usbgecko powerpc: reserve fixmap entries for early debug powerpc: wii: default config powerpc: wii: platform support powerpc: wii: hollywood interrupt controller support powerpc: broadway processor support powerpc: wii: bootwrapper bits powerpc: wii: device tree powerpc: gamecube: default config powerpc: gamecube: platform support powerpc: gamecube/wii: flipper interrupt controller support powerpc: gamecube/wii: udbg support for usbgecko powerpc: gamecube/wii: do not include PCI support powerpc: gamecube/wii: declare as non-coherent platforms powerpc: gamecube/wii: introduce GAMECUBE_COMMON ... Fix up conflicts in arch/powerpc/mm/fsl_booke_mmu.c. Hopefully even close to correctly.	2009-12-16 13:26:53 -08:00
Stephen Rothwell	ae4cec4736	powerpc: fix up for mmu_mapin_ram api change Today's linux-next build (powerpc ppc44x_defconfig) failed like this: arch/powerpc/mm/pgtable_32.c: In function 'mapin_ram': arch/powerpc/mm/pgtable_32.c:318: error: too many arguments to function 'mmu_mapin_ram' Casued by commit `de32400dd2` ("wii: use both mem1 and mem2 as ram"). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-12-14 09:04:24 -07:00
Albert Herranz	c5df7f7751	powerpc: allow ioremap within reserved memory regions Add a flag to let a platform ioremap memory regions marked as reserved. This flag will be used later by the Nintendo Wii support code to allow ioremapping the I/O region sitting between MEM1 and MEM2 and marked as reserved RAM in the patch "wii: use both mem1 and mem2 as ram". This will no longer be needed when proper discontig memory support for 32-bit PowerPC is added to the kernel. Signed-off-by: Albert Herranz <albert_herranz@yahoo.es> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-12-12 22:24:32 -07:00
Albert Herranz	de32400dd2	wii: use both mem1 and mem2 as ram The Nintendo Wii video game console has two discontiguous RAM regions: - MEM1: 24MB @ 0x00000000 - MEM2: 64MB @ 0x10000000 Unfortunately, the kernel currently does not support discontiguous RAM memory regions on 32-bit PowerPC platforms. This patch adds a series of workarounds to allow the use of the second memory region (MEM2) as RAM by the kernel. Basically, a single range of memory from the beginning of MEM1 to the end of MEM2 is reported to the kernel, and a memory reservation is created for the hole between MEM1 and MEM2. With this patch the system is able to use all the available RAM and not just ~27% of it. This will no longer be needed when proper discontig memory support for 32-bit PowerPC is added to the kernel. Signed-off-by: Albert Herranz <albert_herranz@yahoo.es> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-12-12 22:24:31 -07:00
Joakim Tjernlund	5efab4a02c	powerpc/8xx: Invalidate non present TLBs 8xx sometimes need to load a invalid/non-present TLBs in it DTLB asm handler. These must be invalidated separaly as linux mm don't. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-09 17:10:35 +11:00
David Gibson	d28513bc7f	powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT Commit `a0668cdc15` cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-08 15:59:33 +11:00
Benjamin Herrenschmidt	5a7b4193e5	Revert "powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT" This reverts commit `c045256d14`. It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will commit a fixed version separately Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-02 09:28:35 +11:00
David Gibson	39adfa540f	powerpc/mm: Fix bug in gup_hugepd() Commit `a4fe3ce769` introduced a new get_user_pages() path for hugepages on powerpc. Unfortunately, there is a bug in it's loop logic, which can cause it to overrun the end of the intended region. This came about by copying the logic from the normal page path, which assumes the address and end parameters have been pagesize aligned at the top-level. Since they're not hugepage size aligned, the simplistic logic could step over the end of the gup region without triggering the loop end condition. This patch fixes the bug by using the technique that the normal page path uses in levels above the lowest to truncate the ending address to something we know we'll match with. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-27 14:24:30 +11:00
David Gibson	c045256d14	powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT Commit `a0668cdc15` cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-27 14:24:29 +11:00
Kumar Gala	8b27f0b61d	powerpc/fsl-booke: Rework TLB CAM code Re-write the code so its more standalone and fixed some issues: * Bump'd # of CAM entries to 64 to support e500mc * Make the code handle MAS7 properly * Use pr_cont instead of creating a string as we go Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-11-20 16:45:33 -06:00
Benjamin Herrenschmidt	0526484aa3	Merge commit 'origin/master' into next	2009-11-12 10:59:04 +11:00
Alexander Graf	e85a47106a	Split init_new_context and destroy_context For KVM we need to allocate a new context id, but don't really care about all the mm context around it. So let's split the alloc and destroy functions for the context id, so we can grab one without allocating an mm context. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:50:25 +11:00
Alexander Graf	4ab79aa801	Export symbols for KVM module We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:50:24 +11:00
Benjamin Herrenschmidt	f1167fb318	powerpc/mm: Remove debug context clamping from nohash code I inadvertently left that debug code enabled, causing the number of contexts to be clamped to 31 which is going to slow things down on 4xx and just plain breaks 8xx Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:41:59 +11:00
David Gibson	0895ecda79	powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal accessors The hugepage arch code provides a number of hook functions/macros which mirror the functionality of various normal page pte access functions. Various changes in the normal page accessors (in particular BenH's recent changes to the handling of lazy icache flushing and PAGE_EXEC) have caused the hugepage versions to get out of sync with the originals. In some cases, this is a bug, at least on some MMU types. One of the reasons that some hooks were not identical to the normal page versions, is that the fact we're dealing with a hugepage needed to be passed down do use the correct dcache-icache flush function. This patch makes the main flush_dcache_icache_page() function hugepage aware (by checking for the PageCompound flag). That in turn means we can make set_huge_pte_at() just a call to set_pte_at() bringing it back into sync. As a bonus, this lets us remove the hash_huge_page_do_lazy_icache() function, replacing it with a call to the hash_page_do_lazy_icache() function it was based on. Some other hugepage pte access hooks - huge_ptep_get_and_clear() and huge_ptep_clear_flush() - are not so easily unified, but this patch at least brings them back into sync with the current versions of the corresponding normal page functions. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:21:23 +11:00
David Gibson	883a3e5236	powerpc/mm: Split hash MMU specific hugepage code into a new file This patch separates the parts of hugetlbpage.c which are inherently specific to the hash MMU into a new hugelbpage-hash64.c file. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:59 +11:00
David Gibson	d1837cba5d	powerpc/mm: Cleanup initialization of hugepages on powerpc This patch simplifies the logic used to initialize hugepages on powerpc. The somewhat oddly named set_huge_psize() is renamed to add_huge_page_size() and now does all necessary verification of whether it's given a valid hugepage sizes (instead of just some) and instantiates the generic hstate structure (but no more). hugetlbpage_init() now steps through the available pagesizes, checks if they're valid for hugepages by calling add_huge_page_size() and initializes the kmem_caches for the hugepage pagetables. This means we can now eliminate the mmu_huge_psizes array, since we no longer need to pass the sizing information for the pagetable caches from set_huge_psize() into hugetlbpage_init() Determination of the default huge page size is also moved from the hash code into the general hugepage code. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:58 +11:00
David Gibson	a4fe3ce769	powerpc/mm: Allow more flexible layouts for hugepage pagetables Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:58 +11:00
David Gibson	a0668cdc15	powerpc/mm: Cleanup management of kmem_caches for pagetables Currently we have a fair bit of rather fiddly code to manage the various kmem_caches used to store page tables of various levels. We generally have two caches holding some combination of PGD, PUD and PMD tables, plus several more for the special hugepage pagetables. This patch cleans this all up by taking a different approach. Rather than the caches being designated as for PUDs or for hugeptes for 16M pages, the caches are simply allocated to be a specific size. Thus sharing of caches between different types/levels of pagetables happens naturally. The pagetable size, where needed, is passed around encoded in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the pagetable contains 2^n pointers. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:57 +11:00
David Gibson	f71dc176aa	powerpc/mm: Make hpte_need_flush() correctly mask for multiple page sizes Currently, hpte_need_flush() only correctly flushes the given address for normal pages. Callers for hugepages are required to mask the address themselves. But hpte_need_flush() already looks up the page sizes for its own reasons, so this is a rather silly imposition on the callers. This patch alters it to mask based on the pagesize it has looked up itself, and removes the awkward masking code in the hugepage caller. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:57 +11:00
Benjamin Herrenschmidt	8d8997f34e	powerpc/mm: Fix hang accessing top of vmalloc space On pSeries, we always force the IO space to be mapped using 4K pages even with a 64K base page size to cope with some limitations in the HV interface to some devices. However, the SLB miss handler code to discriminate between vmalloc and ioremap space uses a CPU feature section such that the code is nop'ed out when the processor support large pages non-cachable mappings. Thus, we end up always using the ioremap page size for vmalloc segments on such processors, causing a discrepency between the segment and the hash table, and thus a hang continously hashing the page. It works for the first segment of the vmalloc space since that segment is "bolted" in by C code correctly, and thankfully we almost never use the vmalloc space beyond the first segment, but the new percpu code made the bug happen. This fixes it by removing the feature section from the assembly, we now always do the comparison between vmalloc and ioremap. Signed-off-by; Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-14 16:58:36 +11:00
Rex Feany	e0908085fc	powerpc/8xx: Fix regression introduced by cache coherency rewrite After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit `8d30c14cab`, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-24 15:56:30 +10:00
Huang Weiyi	b9eceb2307	powerpc/mm: Remove duplicated #include Remove duplicated #include('s) in arch/powerpc/mm/tlb_low_64e.S Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-24 15:31:42 +10:00
KAMEZAWA Hiroyuki	3089aa1b0c	kcore: use registerd physmem information For /proc/kcore, each arch registers its memory range by kclist_add(). In usual, - range of physical memory - range of vmalloc area - text, etc... are registered but "range of physical memory" has some troubles. It doesn't updated at memory hotplug and it tend to include unnecessary memory holes. Now, /proc/iomem (kernel/resource.c) includes required physical memory range information and it's properly updated at memory hotplug. Then, it's good to avoid using its own code(duplicating information) and to rebuild kclist for physical memory based on /proc/iomem. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	908eedc616	walk system ram range Originally, walk_memory_resource() was introduced to traverse all memory of "System RAM" for detecting memory hotplug/unplug range. For doing so, flags of IORESOUCE_MEM\|IORESOURCE_BUSY was used and this was enough for memory hotplug. But for using other purpose, /proc/kcore, this may includes some firmware area marked as IORESOURCE_BUSY \| IORESOUCE_MEM. This patch makes the check strict to find out busy "System RAM". Note: PPC64 keeps their own walk_memory_resouce(), which walk through ppc64's lmb informaton. Because old kclist_add() is called per lmb, this patch makes no difference in behavior, finally. And this patch removes CONFIG_MEMORY_HOTPLUG check from this function. Because pfn_valid() just show "there is memmap or not* and cannot be used for "there is physical memory or not", this function is useful in generic to scan physical memory range. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	a0614da88b	kcore: register vmalloc area in generic way For /proc/kcore, vmalloc areas are registered per arch. But, all of them registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies them. By this. archs which have no kclist_add() hooks can see vmalloc area correctly. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	c30bb2a25f	kcore: add kclist types Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
Geert Uytterhoeven	cc013a8890	arches: drop superfluous casts in nr_free_pages() callers Commit `9617729941` ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:34 -07:00
Ingo Molnar	cdd6c482c9	perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f \| grep -vE 'oprofile\|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N \| sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-21 14:28:04 +02:00
Linus Torvalds	723e9db7a4	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits) powerpc/nvram: Enable use Generic NVRAM driver for different size chips powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline powerpc/ps3: Workaround for flash memory I/O error powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead powerpc/perf_counters: Reduce stack usage of power_check_constraints powerpc: Fix bug where perf_counters breaks oprofile powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops powerpc/irq: Improve nanodoc powerpc: Fix some late PowerMac G5 with PCIe ATI graphics powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT powerpc/book3e: Add missing page sizes powerpc/pseries: Fix to handle slb resize across migration powerpc/powermac: Thermal control turns system off too eagerly powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan() powerpc/405ex: support cuImage via included dtb powerpc/405ex: provide necessary fixup function to support cuImage powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support. powerpc/44x: Update Arches defconfig powerpc/44x: Update Arches dts ... Fix up conflicts in drivers/char/agp/uninorth-agp.c	2009-09-15 09:51:09 -07:00
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Brian King	46db2f86a3	powerpc/pseries: Fix to handle slb resize across migration The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-02 16:19:01 +10:00
Kumar Gala	df5d6ecf81	powerpc/mm: Add MMU features for TLB reservation & Paired MAS registers Support for TLB reservation (or TLB Write Conditional) and Paired MAS registers are optional for a processor implementation so we handle them via MMU feature sections. We currently only used paired MAS registers to access the full RPN + perm bits that are kept in MAS7\|\|MAS3. We assume that if an implementation has hardware page table at this time it also implements in TLB reservations. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-28 14:24:12 +10:00
Benjamin Herrenschmidt	3c2ee2d9f4	Merge commit 'kumar/next' into next	2009-08-27 13:13:41 +10:00
Benjamin Herrenschmidt	ea3cc330ac	powerpc/mm: Cleanup handling of execute permission This is an attempt at cleaning up a bit the way we handle execute permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only defined by CPUs that can do something with it, and the myriad of #ifdef's in the I$/D$ coherency code is reduced to 2 cases that hopefully should cover everything. The logic on BookE is a little bit different than what it was though not by much. Since now, _PAGE_EXEC will be set by the generic code for executable pages, we need to filter out if they are unclean and recover it. However, I don't expect the code to be more bloated than it already was in that area due to that change. I could boast that this brings proper enforcing of per-page execute permissions to all BookE and 40x but in fact, we've had that now for some time as a side effect of my previous rework in that area (and I didn't even know it :-) We would only enable execute permission if the page was cache clean and we would only cache clean it if we took and exec fault. Since we now enforce that the later only work if VM_EXEC is part of the VMA flags, we de-fact already enforce per-page execute permissions... Unless I missed something Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-27 13:12:51 +10:00
Kumar Gala	fc4bdb35fb	powerpc/booke: Move MMUCSR definition into mmu-book3e.h The MMUCSR is now defined as part of the Book-3E architecture so we can move it into mmu-book3e.h and add some of the additional bits defined by the architecture specs. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-08-24 20:48:05 -05:00
Benjamin Herrenschmidt	4f0dbc2781	Merge commit 'paulus-perf/master' into next	2009-08-20 11:07:56 +10:00
Kumar Gala	797a747a82	powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor Since the pte_lockptr is a spinlock it gets optimized away on uniprocessor builds so using spin_is_locked is not correct. We can use assert_spin_locked instead and get the proper behavior between UP and SMP builds. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:28:32 +10:00
Roel Kluin	8dcd038a13	powerpc/fsl-booke: read buffer overflow cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam) Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:27:12 +10:00
Kumar Gala	67050b5c3e	powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus Introduced a temporary variable into our iterating over the list cpus that are threads on the same core. For some reason Ben forgot how for loops work. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:12 +10:00
Benjamin Herrenschmidt	2d27cfd328	powerpc: Remaining 64-bit Book3E support This contains all the bits that didn't fit in previous patches :-) This includes the actual exception handlers assembly, the changes to the kernel entry, other misc bits and wiring it all up in Kconfig. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:11 +10:00
Benjamin Herrenschmidt	32a74949b7	powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though we did carve out some virtual space for it, the necessary support code wasn't there. This implements it by using 16M pages for now, though the page size could easily be changed at runtime if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:10 +10:00
Benjamin Herrenschmidt	25d21ad6e7	powerpc: Add TLB management code for 64-bit Book3E This adds the TLB miss handler assembly, the low level TLB flush routines along with the necessary hook for dealing with our virtual page tables or indirect TLB entries that need to be flushes when PTE pages are freed. There is currently no support for hugetlbfs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt	a8f7758c1c	powerpc/mm: Move around mmu_gathers definition on 64-bit The definition for the global structure mmu_gathers, used by generic code, is currently defined in multiple places not including anything used by 64-bit Book3E. This changes it by moving to one place common to all processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt	57e2a99f74	powerpc: Add memory management headers for new 64-bit BookE This adds the PTE and pgtable format definitions, along with changes to the kernel memory map and other definitions related to implementing support for 64-bit Book3E. This also shields some asm-offset bits that are currently only relevant on 32-bit We also move the definition of the "linux" page size constants to the common mmu.h file and add a few sizes that are relevant to embedded processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:06 +10:00
Benjamin Herrenschmidt	c7cc58a1ad	powerpc/mm: Rework & cleanup page table freeing code path That patch used to just add a hook to page table flushing but pulling that string brought out a whole bunch of issues, so it now does that and more: - We now make the RCU batching of page freeing SMP only, as I believe it was intended initially. We make a few more things compile to nothing on !CONFIG_SMP - Some macros are turned into functions, though that forced me to out of line a few stuffs due to unsolvable include depenencies, however it's probably better that way anyway, it's not -that- critical code path. - 32-bit didn't call pte_free_finish() on tlb_flush() which means that it wouldn't push out the batch to RCU for delayed freeing when a bunch of page tables have been freed, they would just stay in there until the batch gets full. 64-bit BookE will use that hook to maintain the virtually linear page tables or the indirect entries in the TLB when using the HW loader. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:24:56 +10:00
Benjamin Herrenschmidt	d4e167da4c	powerpc/mm: Make low level TLB flush ops on BookE take additional args We need to pass down whether the page is direct or indirect and we'll need to pass the page size to _tlbil_va and _tlbivax_bcast We also add a new low level _tlbil_pid_noind() which does a TLB flush by PID but avoids flushing indirect entries if possible This implements those new prototypes but defines them with inlines or macros so that no additional arguments are actually passed on current processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:41 +10:00
Benjamin Herrenschmidt	a245067e20	powerpc/mm: Add support for early ioremap on non-hash 64-bit processors This adds some code to do early ioremap's using page tables instead of bolting entries in the hash table. This will be used by the upcoming 64-bits BookE port. The patch also changes the test for early vs. late ioremap to use slab_is_available() instead of our old hackish mem_init_done. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:40 +10:00
Benjamin Herrenschmidt	fcce810986	powerpc/mm: Add HW threads support to no_hash TLB management The current "no hash" MMU context management code is written with the assumption that one CPU == one TLB. This is not the case on implementations that support HW multithreading, where several linux CPUs can share the same TLB. This adds some basic support for this to our context management and our TLB flushing code. It also cleans up the optional debugging output a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:37 +10:00
Benjamin Herrenschmidt	ee43eb788b	powerpc: Use names rather than numbers for SPRGs (v2) The kernel uses SPRG registers for various purposes, typically in low level assembly code as scratch registers or to hold per-cpu global infos such as the PACA or the current thread_info pointer. We want to be able to easily shuffle the usage of those registers as some implementations have specific constraints realted to some of them, for example, some have userspace readable aliases, etc.. and the current choice isn't always the best. This patch should not change any code generation, and replaces the usage of SPRN_SPRGn everywhere in the kernel with a named replacement and adds documentation next to the definition of the names as to what those are used for on each processor family. The only parts that still use the original numbers are bits of KVM or suspend/resume code that just blindly needs to save/restore all the SPRGs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:27 +10:00
Anton Blanchard	de4376c284	powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can reuse this preload slot by loading in the segment at 0x10000000, where almost all PowerPC binaries are linked at. On a microbenchmark that bounces a token between two 64bit processes over pipes and calls gettimeofday each iteration (to access the VDSO), both the 32bit and 64bit context switch rate improves (tested on a 4GHz POWER6): 32bit: 273k/sec -> 283k/sec 64bit: 277k/sec -> 284k/sec Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:26 +10:00
Anton Blanchard	5eb9bac040	powerpc: Rearrange SLB preload code With the new top down layout it is likely that the pc and stack will be in the same segment, because the pc is most likely in a library allocated via a top down mmap. Right now we bail out early if these segments match. Rearrange the SLB preload code to sanity check all SLB preload addresses are not in the kernel, then check all addresses for conflicts. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:25 +10:00
Paul Mackerras	9c1e105238	powerpc: Allow perf_counters to access user memory at interrupt time This provides a mechanism to allow the perf_counters code to access user memory in a PMU interrupt routine. Such an access can cause various kinds of interrupt: SLB miss, MMU hash table miss, segment table miss, or TLB miss, depending on the processor. This commit only deals with 64-bit classic/server processors, which use an MMU hash table. 32-bit processors are already able to access user memory at interrupt time. Since we don't soft-disable on 32-bit, we avoid the possibility of reentering hash_page or the TLB miss handlers, since they run with interrupts disabled. On 64-bit processors, an SLB miss interrupt on a user address will update the slb_cache and slb_cache_ptr fields in the paca. This is OK except in the case where a PMU interrupt occurs in switch_slb, which also accesses those fields. To prevent this, we hard-disable interrupts in switch_slb. Interrupts are already soft-disabled at this point, and will get hard-enabled when they get soft-enabled later. This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice, and to make sure that it clears the slb_cache_ptr when called from other callers than switch_slb, the existing routine is renamed to __slb_flush_and_rebolt, which is called by switch_slb and the new version of slb_flush_and_rebolt. Similarly, switch_stab (used on POWER3 and RS64 processors) gets a hard_irq_disable() to protect the per-cpu variables used there and in ste_allocate. If a MMU hashtable miss interrupt occurs, normally we would call hash_page to look up the Linux PTE for the address and create a HPTE. However, hash_page is fairly complex and takes some locks, so to avoid the possibility of deadlock, we check the preemption count to see if we are in a (pseudo-)NMI handler, and if so, we don't call hash_page but instead treat it like a bad access that will get reported up through the exception table mechanism. An interrupt whose handler runs even though the interrupt occurred when soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI handler, which should use nmi_enter()/nmi_exit() rather than irq_enter()/irq_exit(). Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-08-18 14:48:43 +10:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Kumar Gala	5156ddce6c	powerpc/mm: Fix SMP issue with MMU context handling code In switch_mmu_context() if we call steal_context_smp() to get a context to use we shouldn't fall through and than call steal_context_up(). Doing so can be problematic in that the 'mm' that steal_context_up() ends up using will not get marked dirty in the stale_map[] for other CPUs that might have used that mm. Thus we could end up with stale TLB entries in the other CPUs that can cause all kinda of havoc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-07-29 23:05:43 -05:00
Benjamin Herrenschmidt	9e1b32caa5	mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() Upcoming paches to support the new 64-bit "BookE" powerpc architecture will need to have the virtual address corresponding to PTE page when freeing it, due to the way the HW table walker works. Basically, the TLB can be loaded with "large" pages that cover the whole virtual space (well, sort-of, half of it actually) represented by a PTE page, and which contain an "indirect" bit indicating that this TLB entry RPN points to an array of PTEs from which the TLB can then create direct entries. Thus, in order to invalidate those when PTE pages are deleted, we need the virtual address to pass to tlbilx or tlbivax instructions. The old trick of sticking it somewhere in the PTE page struct page sucks too much, the address is almost readily available in all call sites and almost everybody implemets these as macros, so we may as well add the argument everywhere. I added it to the pmd and pud variants for consistency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV] Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:10:38 -07:00
Michael Ellerman	30c5af435b	powerpc: Use pr_devel() in do_dcache_icache_coherency() pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 2036 368 8 2412 96c arch/powerpc/mm/pgtable.o size after: text data bss dec hex filename 1677 248 8 1933 78d arch/powerpc/mm/pgtable.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:24 +10:00
Michael Ellerman	29e5fa59e5	powerpc: Use pr_devel() in arch/powerpc/mm/gup.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3252 384 0 3636 e34 arch/powerpc/mm/gup.o size after: text data bss dec hex filename 2576 96 0 2672 a70 arch/powerpc/mm/gup.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:23 +10:00
Michael Ellerman	651e2dd2a1	powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3261 416 4 3681 e61 arch/powerpc/mm/slb.o size after: text data bss dec hex filename 2861 248 4 3113 c29 arch/powerpc/mm/slb.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:22 +10:00
Michael Ellerman	a1ac38ab98	powerpc: Use pr_devel() in arch/powerpc/mm/mmu_context_nohash.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 1508 48 28 1584 630 powerpc/mm/mmu_context_nohash.o size after: text data bss dec hex filename 1088 0 28 1116 45c powerpc/mm/mmu_context_nohash.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:22 +10:00
Joe Perches	d258e64ef5	powerpc: Remove unnecessary semicolons Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:21 +10:00
Tejun Heo	c43768cbb7	Merge branch 'master' into for-next Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h	2009-07-04 07:13:18 +09:00
Benjamin Herrenschmidt	850f6ac316	powerpc/mm: Make k(un)map_atomic out of line Those functions are way too big to be inline, besides, kmap_atomic() wants to call debug_kmap_atomic() which isn't exported for modules and causes module link failures. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-26 14:37:25 +10:00
Tejun Heo	204fba4aa3	percpu: cleanup percpu array definitions Currently, the following three different ways to define percpu arrays are in use. 1. DEFINE_PER_CPU(elem_type[array_len], array_name); 2. DEFINE_PER_CPU(elem_type, array_name[array_len]); 3. DEFINE_PER_CPU(elem_type, array_name)[array_len]; Unify to #1 which correctly separates the roles of the two parameters and thus allows more flexibility in the way percpu variables are defined. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm@kvack.org Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net>	2009-06-24 15:13:45 +09:00
Linus Torvalds	d06063cc22	Move FAULT_FLAG_xyz into handle_mm_fault() callers This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-21 13:08:22 -07:00
Michael Ellerman	ba55bd7436	powerpc: Add configurable -Werror for arch/powerpc Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertantly introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, ie. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror, that will probably lead to some build breaks, I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-16 14:15:45 +10:00
Benjamin Herrenschmidt	7dafd239ab	Merge commit 'origin/master' into next	2009-06-15 10:36:54 +10:00
Sankar P	5cdcd9d691	trivial: spelling fix in ppc code comments Fixes a trivial spelling error in powerpc code comments. Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:47 +02:00
Benjamin Herrenschmidt	bc47ab0241	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/kernel/asm-offsets.c	2009-06-12 16:53:38 +10:00
Peter Zijlstra	f4dbfa8f31	perf_counter: Standardize event names Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 17:54:15 +02:00
Benjamin Herrenschmidt	944916858a	powerpc: Shield code specific to 64-bit server processors This is a random collection of added ifdef's around portions of code that only mak sense on server processors. Using either CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate. This is meant to make the future merging of Book3E 64-bit support easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt	d3f6204a7d	powerpc: Set init_bootmem_done on NUMA platforms as well For some obscure reason, we only set init_bootmem_done after initializing bootmem when NUMA isn't enabled. We even document this next to the declaration of that global in system.h which of course I didn't read before I had to debug why some WIP code wasn't working properly... This patch changes it so that we always set it after bootmem is initialized which should have always been the case... go figure ! Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt	b46b6942b3	powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock The MMU context_lock can be taken from switch_mm() while the rq->lock is held. The rq->lock can also be taken from interrupts, thus if we get interrupted in destroy_context() with the context lock held and that interrupt tries to take the rq->lock, there's a possible deadlock scenario with another CPU having the rq->lock and calling switch_mm() which takes our context lock. The fix is to always ensure interrupts are off when taking our context lock. The switch_mm() path is already good so this fixes the destroy_context() path. While at it, turn the context lock into a new style spinlock. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt	3035c8634f	powerpc/mm: Fix some SMP issues with MMU context handling This patch fixes a couple of issues that can happen as a result of steal_context() dropping the context_lock when all possible PIDs are ineligible for stealing (hopefully an extremely hard to hit occurence). This case exposes the possibility of a stale context_mm[] entry to be seen since destroy_context() doesn't clear it and the free map isn't re-tested. It also means steal_context() will not notice a context freed while the lock was help, thus possibly trying to steal a context when a free one was available. This fixes it by always returning to the caller from steal_context when it dropped the lock with a return value that causes the caller to re-samble the number of free contexts, along with properly clearing the context_mm[] array for destroyed contexts. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:42:21 +10:00
Ingo Molnar	23db9f430b	Merge branch 'linus' into perfcounters/core Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 10:01:39 +02:00
Benjamin Herrenschmidt	435462c6e6	Merge branch 'merge' into next	2009-05-29 13:54:52 +10:00
Benjamin Herrenschmidt	8b31e49d1d	powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency. The implementation we just revived has issues, such as using a Kconfig-defined virtual address area in kernel space that nothing actually carves out (and thus will overlap whatever is there), or having some dependencies on being self contained in a single PTE page which adds unnecessary constraints on the kernel virtual address space. This fixes it by using more classic PTE accessors and automatically locating the area for consistent memory, carving an appropriate hole in the kernel virtual address space, leaving only the size of that area as a Kconfig option. It also brings some dma-mask related fixes from the ARM implementation which was almost identical initially but grew its own fixes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:33:59 +10:00
Benjamin Herrenschmidt	f637a49e50	powerpc: Minor cleanups of kernel virt address space definitions Make FIXADDR_TOP a compile time constant and cleanup a couple of definitions relative to the layout of the kernel address space on ppc32. We also print out that layout at boot time for debugging purposes. This is a pre-requisite for properly fixing non-coherent DMA allocactions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:50 +10:00
Benjamin Herrenschmidt	b16e7766d6	powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm (pre-requisite to make the next patches more palatable) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:05 +10:00
Hideo Saito	8e35961b57	powerpc/mm: Fix broken MMU PID stealing on !SMP The recent rework of the MMU PID handling for non-hash CPUs has a subtle bug in the !SMP "optimized" variant of the PID stealing function. It clears the PID in the mm context before it calls local_flush_tlb_mm(). However, the later will not flush anything if the PID in the context is clear... Signed-off-by: Hideo Saito <hsaito.ppc@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-26 13:46:49 +10:00
Milton Miller	60dbf43851	powerpc: Add 2.06 tlbie mnemonics This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards compatibilty for CPUs before 2.06. Only useful for bare metal systems. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-21 15:44:21 +10:00
Ingo Molnar	dc3f81b129	Merge commit 'v2.6.30-rc6' into perfcounters/core Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 07:37:49 +02:00
Mel Gorman	af3e4aca47	powerpc: Do not assert pte_locked for hugepage PTE entries With CONFIG_DEBUG_VM, an assertion is made when changing the protection flags of a PTE that the PTE is locked. Huge pages use a different pagetable format and the assertion is bogus and will always trigger with a bug looking something like Unable to handle kernel paging request for data at address 0xf1a00235800006f8 Faulting instruction address: 0xc000000000034a80 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=32 NUMA Maple Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor windfarm_cpufreq_clamp windfarm_core i2c_powermac NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003 REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff DAR: f1a00235800006f8, DSISR: 0000000040010000 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 Call Trace: [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable) [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c Instruction dump: 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000 This patch fixes the problem by not asseting the PTE is locked for VMAs backed by huge pages. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-18 15:19:04 +10:00
Becky Bruce	49a8496525	powerpc: Allow mem=x cmdline to work with 4G+ We're currently choking on mem=4g (and above) due to memory_limit being specified as an unsigned long. Make memory_limit phys_addr_t to fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-15 16:43:41 +10:00
Ingo Molnar	e7fd5d4b3d	Merge branch 'linus' into perfcounters/core Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 14:47:05 +02:00
Stephen Rothwell	b62c31ae40	powerpc: fix for long standing bug noticed by gcc 4.4.0 Previous gcc versions didn't notice this because one of the preceding #ifs always evaluated to true. gcc 4.4.0 produced this error: arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-23 08:52:16 -05:00
Kumar Gala	323d23aeac	Revert "powerpc: Add support for early tlbilx opcode" This reverts commit `e996557740`. Our HW guys were able to fix this so it never sees the light of day. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-23 08:51:22 -05:00
Michael Ellerman	24f1ce803c	powerpc: Fix crash on CPU hotplug early_init_mmu_secondary() is called at CPU hotplug time, so it must be marked as __cpuinit, not __init. Caused by `757c74d2` ("powerpc/mm: Introduce early_init_mmu() on 64-bit"). Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-04-22 14:56:34 +10:00
Peter Zijlstra	78f13e9525	perf_counter: allow for data addresses to be recorded Paul suggested we allow for data addresses to be recorded along with the traditional IPs as power can provide these. For now, only the software pagefault events provide data addresses, but in the future power might as well for some events. x86 doesn't seem capable of providing this atm. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.394816925@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-08 19:05:56 +02:00
Kumar Gala	52ce67f157	powerpc/mm: Fix compile warning arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm': arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-07 22:11:10 -05:00
Kumar Gala	e996557740	powerpc: Add support for early tlbilx opcode During the ISA 2.06 development the opcode for tlbilx changed and some early implementations used to old opcode. Add support for a MMU_FTR fixup to deal with this. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-07 01:36:30 -05:00
Peter Zijlstra	ac17dc8e58	perf_counter: provide major/minor page fault software events Provide separate sw counters for major and minor page faults. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:40 +02:00
Peter Zijlstra	7dd1fcc258	perf_counter: provide pagefault software events We use the generic software counter infrastructure to provide page fault events. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:37 +02:00
Benjamin Herrenschmidt	757c74d298	powerpc/mm: Introduce early_init_mmu() on 64-bit This moves some MMU related init code out of setup_64.c into hash_utils_64.c and calls it early_init_mmu() and early_init_mmu_secondary(). This will make it easier to plug in a new MMU type. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt	ff7c660092	powerpc/mm: Fix printk type warning in mmu_context_nohash We need to use %zu instead of %d when printing a sizeof() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt	d62cbf45a8	powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c This file is only useful on 64-bit, so we name it accordingly. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:33 +11:00
Benjamin Herrenschmidt	8d1cf34e7a	powerpc/mm: Tweak PTE bit combination definitions This patch tweaks the way some PTE bit combinations are defined, in such a way that the 32 and 64-bit variant become almost identical and that will make it easier to bring in a new common pte-* file for the new variant of the Book3-E support. The combination of bits defining access to kernel pages are now clearly separated from the combination used by userspace and the core VM. The resulting generated code should remain identical unless I made a mistake. Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB in ppc_mmu_32.c which could cause kernel mappings to be user accessible when that option is enabled. Probably something that bitrot. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:33 +11:00
Rusty Russell	56aa4129e8	cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:29 +11:00
Benjamin Herrenschmidt	9e5efaa936	powerpc/mm: Properly wire up get_user_pages_fast() on 32-bit While we did add support for _PAGE_SPECIAL on some 32-bit platforms, we never actually built get_user_pages_fast() on them. This fixes it which requires a little bit of ifdef'ing around. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-11 17:11:34 +11:00
Benjamin Herrenschmidt	1cdab55d8a	powerpc: Wire up /proc/vmallocinfo to our ioremap() This adds the necessary bits and pieces to powerpc implementation of ioremap to benefit from caller tracking in /proc/vmallocinfo, at least for ioremap's done after mem init as the older ones aren't tracked. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-11 17:10:14 +11:00
Kumar Gala	c3071951d0	powerpc/fsl-booke: Add support for tlbilx instructions The e500mc core supports the new tlbilx instructions that do core local invalidates and also provide us the ability to take down all TLB entries matching a given PID. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-03-09 09:25:38 -05:00
Anton Blanchard	002b0ec73d	powerpc: Increase stack gap on 64bit binaries On 64bit there is a possibility our stack and mmap randomisation will put the two close enough such that we can't expand our stack to match the ulimit specified. To avoid this, start the upper mmap address at 1GB + 128MB below the top of our address space, so in the worst case we end up with the same ~128MB hole as in 32bit. This works because we randomise the stack over a 1GB range. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:21 +11:00
Anton Blanchard	a5adc91a4b	powerpc: Ensure random space between stack and mmaps get_random_int() returns the same value within a 1 jiffy interval. This means that the mmap and stack regions will almost always end up the same distance apart, making a relative offset based attack possible. To fix this, shift the randomness we use for the mmap region by 1 bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:21 +11:00
Anton Blanchard	9f14c42d75	powerpc: Randomise mmap start address Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks. Until ppc32 uses the mmap.c functionality, this is ppc64 specific. Before: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 After: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 f718b000-f7b8c000 rw-p f718b000 00:00 0 f7551000-f7f52000 rw-p f7551000 00:00 0 f6ee7000-f78e8000 rw-p f6ee7000 00:00 0 f74d4000-f7ed5000 rw-p f74d4000 00:00 0 f6e9d000-f789e000 rw-p f6e9d000 00:00 0 Similar for 64bit, but with 1GB of scatter: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0 fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0 fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0 fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0 fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:07 +11:00
Anton Blanchard	13a2cb3694	powerpc: Rearrange mmap.c Rearrange mmap.c to better match the x86 version. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:06 +11:00
Nathan Fontenot	0f16ef7fd3	powerpc/numa: Cleanup hot_add_scn_to_nid This patch reworks the hot_add_scn_to_nid and its supporting functions to make them easier to understand. There are no functional changes in this patch and has been tested on machine with memory represented in the device tree as memory nodes and in the ibm,dynamic-memory property. My previous patch that introduced support for hotplug memory add on systems whose memory was represented by the ibm,dynamic-memory property of the device tree only left the code more unintelligible. This will hopefully makes things easier to understand. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:04 +11:00
Anton Blanchard	13870b6575	powerpc/mm: Reduce hashtable size when using 64kB pages At the moment we size the hashtable based on 4kB pages / 2, even on a 64kB kernel. This results in a hashtable that is much larger than it needs to be. Grab the real page size and size the hashtable based on that Note: This only has effect on non hypervisor machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt	3b7faeb49e	Merge commit 'kumar/next' into next	2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt	82a0a1cc8f	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/include/asm/pgtable-ppc32.h	2009-02-18 13:19:25 +11:00
Dave Hansen	06eccea6c3	powerpc/mm: Fix numa reserve bootmem page selection Fix the powerpc NUMA reserve bootmem page selection logic. commit `8f64e1f2d1` (powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes) changed the logic for how the powerpc LMB reserved regions were converted to bootmen reserved regions. As the folowing discussion reports, the new logic was not correct. mark_reserved_regions_for_nid() goes through each LMB on the system that specifies a reserved area. It searches for active regions that intersect with that LMB and are on the specified node. It attempts to bootmem-reserve only the area where the active region and the reserved LMB intersect. We can not reserve things on other nodes as they may not have bootmem structures allocated, yet. We base the size of the bootmem reservation on two possible things. Normally, we just make the reservation start and stop exactly at the start and end of the LMB. However, the LMB reservations are not aware of NUMA nodes and on occasion a single LMB may cross into several adjacent active regions. Those may even be on different NUMA nodes and will require separate calls to the bootmem reserve functions. So, the bootmem reservation must be trimmed to fit inside the current active region. That's all fine and dandy, but we trim the reservation in a page-aligned fashion. That's bad because we start the reservation at a non-page-aligned address: physbase. The reservation may only span 2 bytes, but that those bytes may span two pfns and cause a reserve_size of 2PAGE_SIZE. Take the case where you reserve 0x2 bytes at 0x0fff and where the active region ends at 0x1000. You'll jump into that if() statment, but node_ar.end_pfn=0x1 and start_pfn=0x0. You'll end up with a reserve_size=0x1000, and then call reserve_bootmem_node(node, physbase=0xfff, size=0x1000); 0x1000 may not be on the same node as 0xfff. Oops. In almost all the vm code, end_<anything> is not inclusive. If you have an end_pfn of 0x1234, page 0x1234 is not included in the range. Using PFN_UP instead of the (>> >> PAGE_SHIFT) will make this consistent with the other VM code. We also need to do math for the reserved size with physbase instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is precisely* the end of the node. However, (start_pfn << PAGE_SHIFT) is NOT precisely the beginning of the reserved area. That is, of course, physbase. If we don't use physbase here, the reserve_size can be made too large. From: Dave Hansen <dave@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-13 16:37:45 +11:00
Kumar Gala	96a8bac589	powerpc/fsl-booke: Fix compile warning arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem': arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:54:53 -06:00
Kumar Gala	d66c82ea45	powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines The Power ISA 2.06 added power of two page sizes to the embedded MMU architecture. Its done it such a way to be code compatiable with the existing HW. Made the minor code changes to support both power of two and power of four page sizes. Also added some new MAS bits and macros that are defined as part of the 2.06 ISA. Renamed some things to use the 'Book-3e' concept to convey the new MMU that is based on the Freescale Book-E MMU programming model. Note, its still invalid to try and use a page size that isn't supported by cpu. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:37:11 -06:00
Kumar Gala	f99fb8a2cb	powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW The following commit: commit `64b3d0e812` Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Thu Dec 18 19:13:51 2008 +0000 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it out before we propogate it to the PPC HW PTE. Reported-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt	8d30c14cab	powerpc/mm: Rework I$/D$ coherency (v3) This patch reworks the way we do I and D cache coherency on PowerPC. The "old" way was split in 3 different parts depending on the processor type: - Hash with per-page exec support (64-bit and >= POWER4 only) does it at hashing time, by preventing exec on unclean pages and cleaning pages on exec faults. - Everything without per-page exec support (32-bit hash, 8xx, and 64-bit < POWER4) does it for all page going to user space in update_mmu_cache(). - Embedded with per-page exec support does it from do_page_fault() on exec faults, in a way similar to what the hash code does. That leads to confusion, and bugs. For example, the method using update_mmu_cache() is racy on SMP where another processor can see the new PTE and hash it in before we have cleaned the cache, and then blow trying to execute. This is hard to hit but I think it has bitten us in the past. Also, it's inefficient for embedded where we always end up having to do at least one more page fault. This reworks the whole thing by moving the cache sync into two main call sites, though we keep different behaviours depending on the HW capability. The call sites are set_pte_at() which is now made out of line, and ptep_set_access_flags() which joins the former in pgtable.c The base idea for Embedded with per-page exec support, is that we now do the flush at set_pte_at() time when coming from an exec fault, which allows us to avoid the double fault problem completely (we can even improve the situation more by implementing TLB preload in update_mmu_cache() but that's for later). If for some reason we didn't do it there and we try to execute, we'll hit the page fault, which will do a minor fault, which will hit ptep_set_access_flags() to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make this guys also perform the I/D cache sync for exec faults now. This second path is the catch all for things that weren't cleaned at set_pte_at() time. For cpus without per-pag exec support, we always do the sync at set_pte_at(), thus guaranteeing that when the PTE is visible to other processors, the cache is clean. For the 64-bit hash with per-page exec support case, we keep the old mechanism for now. I'll look into changing it later, once I've reworked a bit how we use _PAGE_EXEC. This is also a first step for adding _PAGE_EXEC support for embedded platforms Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:10 +11:00
Anton Blanchard	91b0f5ec53	powerpc/mm: Move 64-bit unmapped_area to top of address space We currently place mmaps just below the stack on 32bit, but leave them in the middle of the address space on 64bit: 00100000-00120000 r-xp 00100000 00:00 0 [vdso] 10000000-10010000 r-xp 00000000 08:06 179534 /tmp/sleep 10010000-10020000 rw-p 00000000 08:06 179534 /tmp/sleep 10020000-10130000 rw-p 10020000 00:00 0 [heap] 40000000000-40000030000 r-xp 00000000 08:06 440743 /lib64/ld-2.9.so 40000030000-40000040000 rw-p 00020000 08:06 440743 /lib64/ld-2.9.so 40000050000-400001f0000 r-xp 00000000 08:06 440671 /lib64/libc-2.9.so 400001f0000-40000200000 r--p 00190000 08:06 440671 /lib64/libc-2.9.so 40000200000-40000220000 rw-p 001a0000 08:06 440671 /lib64/libc-2.9.so 40000220000-40008230000 rw-p 40000220000 00:00 0 fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0 [stack] Right now it isn't an issue, but at some stage we will run into mmap or hugetlb allocation issues. Using the same layout as 32bit gives us a some breathing room. This matches what x86-64 is doing too. 00100000-00103000 r-xp 00100000 00:00 0 [vdso] 10000000-10001000 r-xp 00000000 08:06 554894 /tmp/test 10010000-10011000 r--p 00000000 08:06 554894 /tmp/test 10011000-10012000 rw-p 00001000 08:06 554894 /tmp/test 10012000-10113000 rw-p 10012000 00:00 0 [heap] fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0 ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591 /lib64/libc-2.9.so ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591 /lib64/libc-2.9.so ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591 /lib64/libc-2.9.so ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591 /lib64/libc-2.9.so ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0 ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663 /lib64/ld-2.9.so ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0 ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0 ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663 /lib64/ld-2.9.so ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663 /lib64/ld-2.9.so ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0 fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0 [stack] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:07 +11:00
Milton Miller	8b16cd238d	powerpc/numa: Remove redundant find_cpu_node() Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in a less restrictive section (text vs cpuinit). Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:59 +11:00
Milton Miller	20fcefe5a0	powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth() find_min_common_depth() was checking the property length incorrectly. The value is in bytes not cells, and it is using the second entry. Signed-off-By: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:58 +11:00
Benjamin Herrenschmidt	edbc29d76d	Merge commit 'kumar/next' into next	2009-02-11 13:37:44 +11:00
Kumar Gala	6c24b17453	powerpc/fsl-booke: Fix mapping functions to use phys_addr_t Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t instead of unsigned long. In 36-bit physical mode we really need these functions to deal with phys_addr_t when trying to match a physical address or when returning one. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-09 21:11:55 -06:00
Trent Piepho	96051465fd	powerpc/fsl-booke: Make CAM entries used for lowmem configurable On booke processors, the code that maps low memory only uses up to three CAM entries, even though there are sixteen and nothing else uses them. Make this number configurable in the advanced options menu along with max low memory size. If one wants 1 GB of lowmem, then it's typically necessary to have four CAM entries. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:54 -06:00
Trent Piepho	c8f3570b7e	powerpc/fsl-booke: Allow larger CAM sizes than 256 MB The code that maps kernel low memory would only use page sizes up to 256 MB. On E500v2 pages up to 4 GB are supported. However, a page must be aligned to a multiple of the page's size. I.e. 256 MB pages must aligned to a 256 MB boundary. This was enforced by a requirement that the physical and virtual addresses of the start of lowmem be aligned to 256 MB. Clearly requiring 1GB or 4GB alignment to allow pages of that size isn't acceptable. To solve this, I simply have adjust_total_lowmem() take alignment into account when it decides what size pages to use. Give it PAGE_OFFSET = 0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map pages like this: PA 0x3000_0000 VA 0x7000_0000 Size 256 MB PA 0x4000_0000 VA 0x8000_0000 Size 1 GB PA 0x8000_0000 VA 0xC000_0000 Size 256 MB PA 0x9000_0000 VA 0xD000_0000 Size 256 MB PA 0xA000_0000 VA 0xE000_0000 Size 256 MB Because the lowmem mapping code now takes alignment into account, PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be possible. The lowmem code will work down to 4 kB but it's possible some of the boot code will fail before then. Poor alignment will force small pages to be used, which combined with the limited number of TLB1 pages available, will result in very little memory getting mapped. So alignments less than 64 MB probably aren't very useful anyway. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:53 -06:00
Trent Piepho	f88747e7f6	powerpc/fsl-booke: Remove code duplication in lowmem mapping The code to map lowmem uses three CAM aka TLB[1] entries to cover it. The size of each is stored in three globals named __cam0, __cam1, and __cam2. All the code that uses them is duplicated three times for each of the three variables. We have these things called arrays and loops.... Once converted to use an array, it will be easier to make the number of CAMs configurable. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:51 -06:00
Gerhard Pircher	4c456a67f5	powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code _PAGE_COHERENT is now always set in _PAGE_RAM resp. PAGE_KERNEL. Thus it has to be masked out, if the BAT mapping should be non cacheable or CPU_FTR_NEED_COHERENT is not set. This will work on normal SMP setups because we force-set CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP. Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-28 17:15:52 +11:00
Dave Kleikamp	9ba0fdbfae	powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices The subpage_prot syscall fails on second and subsequent calls for a given region, because is_hugepage_only_range() is mis-identifying the 4 kB slices when the process has a 64 kB page size. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-16 16:15:16 +11:00
Ingo Molnar	fe333321e2	powerpc: Change u64/s64 to a long long integer type Convert arch/powerpc/ over to long long based u64: -#ifdef __powerpc64__ -# include <asm-generic/int-l64.h> -#else -# include <asm-generic/int-ll64.h> -#endif +#include <asm-generic/int-ll64.h> This will avoid reoccuring spurious warnings in core kernel code that comes when people test on their own hardware. (i.e. x86 in ~98% of the cases) This is what x86 uses and it generally helps keep 64-bit code 32-bit clean too. [Adjusted to not impact user mode (from paulus) - sfr] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-13 14:47:59 +11:00
Benjamin Herrenschmidt	30aae739a9	Merge commit 'kumar/kumar-next' into next	2009-01-13 13:59:03 +11:00
Anton Vorontsov	7021d86afa	powerpc/mm: Make clear_fixmap() actually work The clear_fixmap() routine issues map_page() with flags set to 0. Currently this causes a BUG_ON() inside the map_page(), as it assumes that a PTE should be clear before mapping. This patch makes the map_page() to trigger the BUG_ON() only if the flags were set. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:17 +11:00
Benjamin Herrenschmidt	4a0826824b	powerpc: Fix missing semicolons in mmu_decl.h This is a brown paper bag from one of my earlier patches that breaks build on 40x and 8xx. And yes, I've now added 40x and 8xx to my list of test configs :-) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:17 +11:00
Dave Liu	d6a09e0cd6	powerpc: Remove the redundant _tlbil_pid at SMP case Signed-off-by: Dave Liu <daveliu@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:13 +11:00
Dave Hansen	893473df78	powerpc/mm: Cleanup careful_allocation(): consolidate memset() Both users of careful_allocation() immediately memset() the result. So, just do it in one place. Also give careful_allocation() a 'z' prefix to bring it in line with kzmalloc() and friends. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:09 +11:00
Dave Hansen	0be210fd66	powerpc/mm: Make careful_allocation() return virtual addrs Since we memset() the result in both of the uses here, just make careful_alloc() return a virtual address. Also, add a separate variable to store the physial address that comes back from the lmb_alloc() functions. This makes it less likely that someone will screw it up forgetting to convert before returning since the vaddr is always in a void* and the paddr is always in an unsigned long. I admit this is arbitrary since one of its users needs a paddr and one a vaddr, but it does remove a good number of casts. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Dave Hansen	5d21ea2b0e	powerpc/mm:: Cleanup careful_allocation(): bootmem already panics If we fail a bootmem allocation, the bootmem code itself panics. No need to redo it here. Also change the wording of the other panic. We don't strictly have to allocate memory on the specified node. It is just a hint and that node may not even have any memory on it. In that case we can and do fall back to other nodes. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Dave Hansen	c555e520ef	powerpc/mm: Add better comment on careful_allocation() The behavior in careful_allocation() really confused me at first. Add a comment to hopefully make it easier on the next doofus that looks at it. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Trent Piepho	6fd8be4bf7	powerpc/fsl-booke: Remove num_tlbcam_entries This is a global variable defined in fsl_booke_mmu.c with a value that gets initialized in assembly code in head_fsl_booke.S. It's never used. If some code ever does want to know the number of entries in TLB1, then "numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a global initialized during kernel boot from assembly. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-07 15:33:07 -06:00
Trent Piepho	19f5465e82	powerpc/fsl-booke: Don't hard-code size of struct tlbcam Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam to 20 when it indexed the TLBCAM table. Anyone changing the size of struct tlbcam would not know to expect that. The kernel already has a system to get the size of C structures into assembly language files, asm-offsets, so let's use it. The definition of the struct gets moved to a header, so that asm-offsets.c can include it. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-07 15:33:06 -06:00
Gary Hade	c04fc586c1	mm: show node to memory section relationship with symlinks in sysfs Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:00 -08:00
Mel Gorman	3340289ddf	mm: report the MMU pagesize in /proc/pid/smaps The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processor. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:58:58 -08:00
Linus Torvalds	3c92ec8ae9	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions	2008-12-28 16:54:33 -08:00
Ilya Yanok	ca9153a3a2	powerpc/44x: Support 16K/64K base page sizes on 44x This adds support for 16k and 64k page sizes on PowerPC 44x processors. The PGDIR table is much smaller than a page when using 16k or 64k pages (512 and 32 bytes respectively) so we allocate the PGDIR with kzalloc() instead of __get_free_pages(). One PTE table covers rather a large memory area when using 16k or 64k pages (32MB or 512MB respectively), so we can easily put FIXMAP and PKMAP in the area covered by one PTE table. Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Vladimir Panfilov <pvr@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-29 09:53:25 +11:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Dale Farnsworth	ccdcef72c2	powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M Add the ability for a classic ppc kernel to be loaded at an address of 32MB. This done by fixing a few places that assume we are loaded at address 0, and by changing several uses of KERNELBASE to use PAGE_OFFSET, instead. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 15:13:29 +11:00
Anton Vorontsov	01695a9687	powerpc/32: Allow __ioremap on RAM addresses for kdump kernel While for debugging it is good to catch bogus users of ioremap, though for kdump support it is more convenient to use __ioremap for copy_oldmem_page() (exactly as we do for PPC64 currently). Note that copy_oldmem_page() calls __ioremap with flags set to '0', so it should be safe with the regard to the caches. The other option is to use kmap_atomic_pfn()[1], but it will not work for kernels compiled without HIGHMEM. That is, on a board with 256MB RAM and crashkernel=64M@32M case, the !HIGHMEM capturing kernel maps 0-96M range, which does not include all the memory needed to capture the dump. And, obviously, accessing anything upper than 96M will cause faults. [1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 15:13:29 +11:00
Benjamin Herrenschmidt	a14953597b	powerpc: Fix missing 'blr' in _tlbia() Rework to MMU code dropped a much missed 'blr' instruction. Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2008-12-21 02:54:25 -07:00
Benjamin Herrenschmidt	64b3d0e812	powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in in the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to setup various valid combinations of access flags and change various bits of code to use them instead. This should help having the PTE actually containing the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitely from the TLB miss. I will ultimately remove it completely as it appears that it might not be needed after all but in the meantime, having it in the TLB miss makes things a lot easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	7752035180	powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs This makes the MMU context code used for CPUs with no hash table (except 603) dynamically allocate the various maps used to track the state of contexts. Only the main free map and CPU 0 stale map are allocated at boot time. Other CPU maps are allocated when those CPUs are brought up and freed if they are unplugged. This also moves the initialization of the MMU context management slightly later during the boot process, which should be fine as it's really only needed when userland if first started anyways. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	760ec0e02d	powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 The handlers for Critical, Machine Check or Debug interrupts will save and restore MMUCR nowadays, thus we only need to disable normal interrupts when invalidating TLB entries. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	2a4aca1144	powerpc/mm: Split low level tlb invalidate for nohash processors Currently, the various forms of low level TLB invalidations are all implemented in misc_32.S for 32-bit processors, in a fairly scary mess of #ifdef's and with interesting duplication such as a whole bunch of code for FSL _tlbie and _tlbia which are no longer used. This moves things around such that _tlbie is now defined in hash_low_32.S and is only used by the 32-bit hash code, and all nohash CPUs use the various _tlbil_* forms that are now moved to a new file, tlb_nohash_low.S. I moved all the definitions for that stuff out of include/asm/tlbflush.h as they are really internal mm stuff, into mm/mmu_decl.h The code should have no functional changes. I kept some variants inline for trivial forms on things like 40x and 8xx. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	f048aace29	powerpc/mm: Add SMP support to no-hash TLB handling This commit moves the whole no-hash TLB handling out of line into a new tlb_nohash.c file, and implements some basic SMP support using IPIs and/or broadcast tlbivax instructions. Note that I'm using local invalidations for D->I cache coherency. At worst, if another processor is trying to execute the same and has the old entry in its TLB, it will just take a fault and re-do the TLB flush locally (it won't re-do the cache flush in any case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	7c03d653cd	powerpc/mm: Introduce MMU features We're soon running out of CPU features and I need to add some new ones for various MMU related bits, so this patch separates the MMU features from the CPU features. I moved over the 32-bit MMU related ones, added base features for MMU type families, but didn't move over any 64-bit only feature yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	2ca8cf7389	powerpc/mm: Rework context management for CPUs with no hash table This reworks the context management code used by 4xx,8xx and freescale BookE. It adds support for SMP by implementing a concept of stale context map to lazily flush the TLB on processors where a context may have been invalidated. This also contains the ground work for generalizing such lazy TLB flushing by just picking up a new PID and marking the old one stale. This will be implemented later. This is a first implementation that uses a global spinlock. Ideally, we should try to get at least the fast path (context ID already assigned) lockless or limited to a per context lock, but for now this will do. I tried to keep the UP case reasonably simple to avoid adding too much overhead to 8xx which does a lot of context stealing since it effectively has only 16 PIDs available. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt	5e696617c4	powerpc/mm: Split mmu_context handling This splits the mmu_context handling between 32-bit hash based processors, 64-bit hash based processors and everybody else. This is preliminary work for adding SMP support for BookE processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt	f63837f058	powerpc/mm: Remove flush_HPTE() The function flush_HPTE() is used in only one place, the implementation of DEBUG_PAGEALLOC on ppc32. It's actually a dup of flush_tlb_page() though it's -slightly- more efficient on hash based processors. We remove it and replace it by a direct call to the hash flush code on those processors and to flush_tlb_page() for everybody else. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:53:34 +11:00
Benjamin Herrenschmidt	e41e811a79	powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c This renames the files to clarify the fact that they are used by the hash based family of CPUs (the 603 being an exception in that family but is still handled by that code). This paves the way for the new tlb_nohash.c coming via a subsequent commit. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:53:30 +11:00
Paul Mackerras	1e1c568d6c	Merge branch 'merge' into next	2008-12-16 14:38:58 +11:00
Dave Hansen	a4c74ddd5e	powerpc: Fix bootmem reservation on uninitialized node careful_allocation() was calling into the bootmem allocator for nodes which had not been fully initialized and caused a previous bug: http://patchwork.ozlabs.org/patch/10528/ So, I merged a few broken out loops in do_init_bootmem() to fix it. That changed the code ordering. I think this bug is triggered by having reserved areas for a node which are spanned by another node's contents. In the mark_reserved_regions_for_nid() code, we attempt to reserve the area for a node before we have allocated the NODE_DATA() for that nid. We do this since I reordered that loop. I suck. This is causing crashes at bootup on some systems, as reported by Jon Tollefson. This may only present on some systems that have 16GB pages reserved. But, it can probably happen on any system that is trying to reserve large swaths of memory that happen to span other nodes' contents. This commit ensures that we do not touch bootmem for any node which has not been initialized, and also removes a compile warning about an unused variable. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 13:48:18 +11:00
Brian King	48f797de55	powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area It looks like most of the hugetlb code is doing the correct thing if hugepages are not supported, but the mmap code is not. If we get into the mmap code when hugepages are not supported, such as in an LPAR which is running Active Memory Sharing, we can oops the kernel. This fixes the oops being seen in this path. oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N) dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N) scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N) Supported: No NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0 REGS: c000000077e7b830 TRAP: 0300 Tainted: G (2.6.27.5-bz50170-2-ppc64) MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44000448 XER: 20000001 DAR: c000002000af90a8, DSISR: 0000000040000000 TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6 GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000 GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001 GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000 GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000 GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001 GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5 GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000 NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0 LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80 Call Trace: [c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80 [c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8 [c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420 [c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140 [c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40 Instruction dump: fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0 fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0 Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 13:48:18 +11:00
James Morris	ec98ce480a	Merge branch 'master' into next Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <jmorris@namei.org>	2008-12-04 17:16:36 +11:00
Kumar Gala	0186f47e70	powerpc: Use RCU based pte freeing mechanism for all powerpc Refactor the RCU based pte free code that was used on ppc64 to be used on all powerpc. Additionally refactor pte_free() & pte_free_kernel() into common code between ppc32 & ppc64. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-03 20:46:35 +11:00
Kumar Gala	f4f3a1261a	powerpc: hash_page_sync should only be used on SMP & STD_MMU_32 Clean up the ifdefs so we only use hash_page_sync if we have CONFIG_SMP && CONFIG_PPC_STD_MMU_32. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-03 20:46:35 +11:00
Paul Mackerras	5274918855	Merge branch 'merge'	2008-12-03 20:11:06 +11:00
Linus Torvalds	03cfdb86ac	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: powerpc: Fix system calls on Cell entered with XER.SO=1 powerpc/cell: Fix GDB watchpoints, again powerpc/mpic: Don't reset affinity for secondary MPIC on boot powerpc/cell/axon-msi: Retry on missing interrupt powerpc: Fix boot freeze on machine with empty memory node powerpc: Fix IRQ assignment for some PCIe devices powerpc/spufs: Fix spinning in spufs_ps_fault on signal powerpc/mpc832x_rdb: fix swapped ethernet ids powerpc: Use generic PHY driver for Marvell 88E1111 PHY on GE Fanuc SBC610 powerpc/85xx: L2 cache size wrong in 8572DS dts powerpc/virtex: Update defconfigs powerpc/52xx: update defconfigs xsysace: Fix driver to use resource_size_t instead of unsigned long powerpc/virtex: fix various format/casting printk mismatches powerpc/mpc5200: fix bestcomm Kconfig dependencies powerpc/44x: Fix 460EX/460GT machine check handling powerpc/40x: Limit allocable DRAM during early mapping	2008-11-30 16:44:18 -08:00
Dave Hansen	4a6186696e	powerpc: Fix boot freeze on machine with empty memory node I got a bug report about a distro kernel not booting on a particular machine. It would freeze during boot: > ... > Could not find start_pfn for node 1 > [boot]0015 Setup Done > Built 2 zonelists in Node order, mobility grouping on. Total pages: 123783 > Policy zone: DMA > Kernel command line: > [boot]0020 XICS Init > [boot]0021 XICS Done > PID hash table entries: 4096 (order: 12, 32768 bytes) > clocksource: timebase mult[7d0000] shift[22] registered > Console: colour dummy device 80x25 > console handover: boot [udbg0] -> real [hvc0] > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes) > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes) > freeing bootmem node 0 I've reproduced this on 2.6.27.7. It is caused by commit `8f64e1f2d1` ("powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes"). The problem is that Jon took a loop which was (in pseudocode): for_each_node(nid) NODE_DATA(nid) = careful_alloc(nid); setup_bootmem(nid); reserve_node_bootmem(nid); and broke it up into: for_each_node(nid) NODE_DATA(nid) = careful_alloc(nid); setup_bootmem(nid); for_each_node(nid) reserve_node_bootmem(nid); The issue comes in when the 'careful_alloc()' is called on a node with no memory. It falls back to using bootmem from a previously-initialized node. But, bootmem has not yet been reserved when Jon's patch is applied. It gives back bogus memory (0xc000000000000000) and pukes later in boot. The following patch collapses the loop back together. It also breaks the mark_reserved_regions_for_nid() code out into a function and adds some comments. I think a huge part of introducing this bug is because for loop was too long and hard to read. The actual bug fix here is the: + if (end_pfn <= node->node_start_pfn \|\| + start_pfn >= node_end_pfn) + continue; Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-01 09:40:18 +11:00
Al Viro	4ea8fb9c1c	powerpc set_huge_psize() false positive called only from __init, calls __init. Incidentally, it ought to be static in file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-30 10:03:35 -08:00
Robert Jennings	a6326e98a2	powerpc: Correct page-in counter for CMM with 64k pages Linux will report the number of page-ins so that the hypervisor can better determine partition memory pressure. The hardware page size and the OS page size can be different. In the case where the hardware page size is 4k and the OS is running with 64k pages the code in commit `409001948d` ("powerpc: Update page-in counter for CMM") would under-report the number of pages. This corrects the reporting to the hypervisor by incrementing the page_in count by 1 << PAGE_FACTOR each time. Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-19 16:05:05 +11:00
David Howells	1330deb0f6	CRED: Wrap task credential accesses in the PowerPC arch Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:38:39 +11:00
Grant Erickson	5907630ffc	powerpc/40x: Limit allocable DRAM during early mapping If the size of DRAM is not an exact power of two, we may not have covered DRAM in its entirety with large 16 and 4 MiB pages. If that is the case, we can get non-recoverable page faults when doing the final PTE mappings for the non-large page PTEs. Consequently, we restrict the top end of DRAM currently allocable by updating '__initial_memory_limit_addr' so that calls to the LMB to allocate PTEs for "tail" coverage with normal-sized pages (or other reasons) do not attempt to allocate outside the allowed range. Signed-off-by: Grant Erickson <gerickson@nuovations.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2008-11-13 10:10:56 -05:00
Jon Tollefson	7d4320f3d5	powerpc: Hugetlb pgtable cache access cleanup Andrew Morton suggested that using a macro that makes an array reference look like a function call makes it harder to understand the code. This therefore removes the huge_pgtable_cache(psize) macro and replaces its uses with pgtable_cache[HUGE_PGTABLE_INDEX(psize)]. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-06 09:49:39 +11:00
Brian King	409001948d	powerpc: Update page-in counter for CMM A new field has been added to the VPA as a method for the client OS to communicate to firmware the number of page-ins it is performing when running collaborative memory overcommit. The hypervisor will use this information to better determine if a partition is experiencing memory pressure and needs more memory allocated to it. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-05 22:08:28 +11:00
Jon Tollefson	4792adbac9	powerpc: Don't use a 16G page if beyond mem= limits If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available. Thanks to Michael Ellerman for finding the problem. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-22 15:01:21 +11:00
Benjamin Herrenschmidt	a02efb906d	Merge commit 'origin' into master Manual merge of: arch/powerpc/Kconfig arch/powerpc/include/asm/page.h	2008-10-21 15:52:04 +11:00
Milton Miller	fe55249d17	powerpc: Always trim numa memory to lmb_end_of_DRAM() numa_enforce_memory_limit tried to be smart and only call lmb_end_of_DRAM when a memory limit was set via mem= on the command line. However, the early boot code will also limit memory added to the lmb system when iommu=off is specified. When this happens, the page allocator is given pages not in the linear mapping and this results in a fatal data reference to the unmapped page. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-21 15:19:12 +11:00
Jon Tollefson	e81703724a	powerpc/numa: Make memory reserve code more robust Adjust amount to reserve based on previous nodes for reserves spanning multiple nodes. Check if the node active range is empty before attempting to pass the reserve to bootmem. In practice the range shouldn't be empty, but to be sure we check. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-21 15:17:48 +11:00
Badari Pulavarty	71088785c6	mm: cleanup to make remove_memory() arch-neutral There is nothing architecture specific about remove_memory(). remove_memory() function is common for all architectures which support hotplug memory remove. Instead of duplicating it in every architecture, collapse them into arch neutral function. [akpm@linux-foundation.org: fix the export] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
David Gibson	f5ea64dcba	powerpc: Get USE_STRICT_MM_TYPECHECKS working again The typesafe version of the powerpc pagetable handling (with USE_STRICT_MM_TYPECHECKS defined) has bitrotted again. This patch makes a bunch of small fixes to get it back to building status. It's still not enabled by default as gcc still generates worse code with it for some reason. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-14 10:35:27 +11:00
Jon Tollefson	8f64e1f2d1	powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes If there are multiple reserved memory blocks via lmb_reserve() that are contiguous addresses and on different NUMA nodes we are losing track of which address ranges to reserve in bootmem on which node. I discovered this when I recently got to try 16GB huge pages on a system with more then 2 nodes. When scanning the device tree in early boot we call lmb_reserve() with the addresses of the 16G pages that we find so that the memory doesn't get used for something else. For example the addresses for the pages could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages, one on each of eight nodes. In the lmb after all the pages have been reserved it will look something like the following: lmb_dump_all: memory.cnt = 0x2 memory.size = 0x3e80000000 memory.region[0x0].base = 0x0 .size = 0x1e80000000 memory.region[0x1].base = 0x4000000000 .size = 0x2000000000 reserved.cnt = 0x5 reserved.size = 0x3e80000000 reserved.region[0x0].base = 0x0 .size = 0x7b5000 reserved.region[0x1].base = 0x2a00000 .size = 0x78c000 reserved.region[0x2].base = 0x328c000 .size = 0x43000 reserved.region[0x3].base = 0xf4e8000 .size = 0xb18000 reserved.region[0x4].base = 0x4000000000 .size = 0x2000000000 The reserved.region[0x4] contains the 16G pages. In arch/powerpc/mm/num.c: do_init_bootmem() we loop through each of the node numbers looking for the reserved regions that belong to the particular node. It is not able to identify region 0x4 as being a part of each of the 8 nodes. It is assuming that a reserved region is only on a single node. This patch takes out the reserved region loop from inside the loop that goes over each node. It looks up the active region containing the start of the reserved region. If it extends past that active region then it adjusts the size and gets the next active region containing it. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-10 15:55:19 +11:00
Roland Dreier	a880e76233	powerpc: Avoid integer overflow in page_is_ram() Commit `8b150478` ("ppc: make phys_mem_access_prot() work with pfns instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow for addresses above 4G on 32-bit kernels. However arch/powerpc's page_is_ram() is missing the same fix -- it computes a physical address by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page above 4G. In particular this causes pages above 4G to be mapped with the wrong caching attribute; for example many ppc440-based SoCs have PCI space above 4G, and mmap()ing MMIO space may end up with a mapping that has caching enabled. Fix this by working with the pfn and avoiding the conversion to physical address that causes the overflow. This patch compares the pfn to max_pfn, which is a semantic change from the old code -- that code compared the physical address to high_memory, which corresponds to max_low_pfn. However, I think that was is another bug, since highmem pages are still RAM. Reported-by: vb <vb@vsbe.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-07 14:26:18 +11:00
Becky Bruce	4ee7084eb1	POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical This rearranges a bit of code, and adds support for 36-bit physical addressing for configs that use a hashed page table. The 36b physical support is not enabled by default on any config - it must be explicitly enabled via the config system. This patch only expands the page table code to accomodate large physical addresses on 32-bit systems and enables the PHYS_64BIT config option for 86xx. It does not allow you to boot a board with more than about 3.5GB of RAM - for that, SWIOTLB support is also required (and coming soon). Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-09-24 16:29:44 -05:00
Becky Bruce	82331ab15f	powerpc/85xx: fix build warning, remove silly cast This fixes a build warning when PHYS_64BIT is enabled, and removes an unnecessary cast to phys_addr_t (the variable being cast is already a phys_addr_t) Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-09-16 10:01:35 -05:00
David Gibson	0b26425ce1	powerpc: Clean up hugepage pagetable allocation for powerpc with 16G pages There is a small bug in the handling of 16G hugepages recently added to the kernel. This doesn't cause a crash or other user-visible problems, but it does mean that more levels of pagetable are allocated than makes sense for 16G pages. The hugepage pagetables for the 16G pages are allocated much lower in the pagetable tree than they should be, with the intervening levels allocated with full pmd and pud pages which will only ever have one entry filled in. This corrects this problem, at the same time cleaning up the handling of which level 64k versus 16M hugepage pagetables are allocated at. The new way of formatting the tests should be more robust against changes in pagetable structure, or any newly added hugepage sizes. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:47 -07:00
Becky Bruce	aaf4a9b0f7	powerpc: Rename PTE_SIZE to HPTE_SIZE It's the size of the hardware PTE; make that clear in the name. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:42 -07:00
Paul Mackerras	549e8152de	powerpc: Make the 64-bit kernel as a position-independent executable This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as a position-independent executable (PIE) when it is set. This involves processing the dynamic relocations in the image in the early stages of booting, even if the kernel is being run at the address it is linked at, since the linker does not necessarily fill in words in the image for which there are dynamic relocations. (In fact the linker does fill in such words for 64-bit executables, though not for 32-bit executables, so in principle we could avoid calling relocate() entirely when we're running a 64-bit kernel at the linked address.) The dynamic relocations are processed by a new function relocate(addr), where the addr parameter is the virtual address where the image will be run. In fact we call it twice; once before calling prom_init, and again when starting the main kernel. This means that reloc_offset() returns 0 in prom_init (since it has been relocated to the address it is running at), which necessitated a few adjustments. This also changes __va and __pa to use an equivalent definition that is simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are constants (for 64-bit) whereas PHYSICAL_START is a variable (and KERNELBASE ideally should be too, but isn't yet). With this, relocatable kernels still copy themselves down to physical address 0 and run there. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:38 -07:00
Chandru	cf00085d80	powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels Kdump kernel needs to use only those memory regions that it is allowed to use (crashkernel, rtas, tce, etc.). Each of these regions have their own sizes and are currently added under 'linux,usable-memory' property under each memory@xxx node of the device tree. The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory node (on POWER6) now stores in it the representation for most of the logical memory blocks with the size of each memory block being a constant (lmb_size). If one or more or part of the above mentioned regions lie under one of the lmb from ibm,dynamic-memory property, there is a need to identify those regions within the given lmb. This makes the kernel recognize a new 'linux,drconf-usable-memory' property added by kexec-tools. Each entry in this property is of the form of a count followed by that many (base, size) pairs for the above mentioned regions. The number of cells in the count value is given by the #size-cells property of the root node. Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:07:58 -07:00
Paul Mackerras	7e392f8c29	Merge branch 'linux-2.6'	2008-09-10 11:36:13 +10:00
Paul Mackerras	9e88ba4e45	powerpc: Only make kernel text pages of linear mapping executable Commit `bc033b63bb` ("powerpc/mm: Fix attribute confusion with htab_bolt_mapping()") moved the check for whether we should make pages of the linear mapping executable from htab_bolt_mapping into its callers, including htab_initialize. A side-effect of this is that the decision is now made once for each contiguous section in the LMB array rather than for each page individually. This can often mean that the whole of the linear mapping ends up being executable. This reverts to the previous behaviour, where individual pages are checked for being part of the kernel text or not, by moving the check back down into htab_bolt_mapping. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-03 20:53:22 +10:00
Tony Breeds	e16a9c0990	powerpc: Guard htab_dt_scan_hugepage_blocks appropriately htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is defined, so guard the declaration likewise. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-20 16:34:57 +10:00
Benjamin Herrenschmidt	bc033b63bb	powerpc/mm: Fix attribute confusion with htab_bolt_mapping() The function htab_bolt_mapping() is used to create permanent mappings in the MMU hash table, for example, in order to create the linear mapping of vmemmap. It's also used by early boot ioremap (before mem_init_done). However, the way ioremap uses it is incorrect as it passes it the protection flags in the "linux PTE" form while htab_bolt_mapping() expects them in the hash table format. This is made more confusing by the fact that some of those flags are actually in the same position in both cases. This fixes it all by making htab_bolt_mapping() take normal linux protection flags instead, and use a little helper to convert them to htab flags. Callers can now use the usual PAGE_* definitions safely. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/include/asm/mmu-hash64.h \| 2 - arch/powerpc/mm/hash_utils_64.c \| 65 ++++++++++++++++++++-------------- arch/powerpc/mm/init_64.c \| 9 +--- 3 files changed, 44 insertions(+), 32 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-11 10:09:56 +10:00
Tony Breeds	c7c8eede27	powerpc: Force printing of 'total_memory' to unsigned long long total_memory is a 'phys_addr_t', Which can be either 64 or 32 bits. Force printing as unsigned long long to silence the warning. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 13:18:17 +10:00
Tony Breeds	fb61063587	powerpc: Fix compiler warning in arch/powerpc/mm/mem.c Explicitly cast to unsigned long long, rather than u64. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 13:18:17 +10:00
Stephen Rothwell	b8b572e101	powerpc: Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 12:02:00 +10:00
Nick Piggin	ce0ad7f095	powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3) Implement lockless get_user_pages_fast for 64-bit powerpc. Page table existence is guaranteed with RCU, and speculative page references are used to take a reference to the pages without having a prior existence guarantee on them. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-30 15:26:54 +10:00
Benjamin Herrenschmidt	00df438e89	powerpc: Disable 64K hugetlb support when doing 64K SPU mappings The 64K SPU local store mapping feature is incompatible with the 64K huge pages support due to the inability of some parts of the memory management to differenciate between them while they use a different page table format. For now, disable 64K huge pages when CONFIG_SPU_FS_64K_LS, in the long run, this can be fixed by making this feature use the hugetlb page table format. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-28 16:30:53 +10:00
Johannes Weiner	bda2fa5355	powerpc: use generic show_mem() Remove arch-specific show_mem() in favor of the generic version. This also removes the following redundant information display: - pages in swapcache, printed by show_swap_cache_info() where show_mem() calls show_free_areas(), which calls show_swap_cache_info(). Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:11 -07:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Luis Machado	d6a61bfc06	powerpc: BookE hardware watchpoint support This patch implements support for HW based watchpoint via the DBSR_DAC (Data Address Compare) facility of the BookE processors. It does so by interfacing with the existing DABR breakpoint code and adding the necessary bits and pieces for the new bits to be properly set or cleared Signed-off-by: Luis Machado <luisgpm@br.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-25 15:44:39 +10:00
Jon Tollefson	0d9ea75443	powerpc: support multiple hugepage sizes Instead of using the variable mmu_huge_psize to keep track of the huge page size we use an array of MMU_PAGE_* values. For each supported huge page size we need to know the hugepte_shift value and have a pgtable_cache. The hstate or an mmu_huge_psizes index is passed to functions so that they know which huge page size they should use. The hugepage sizes 16M and 64K are setup(if available on the hardware) so that they don't have to be set on the boot cmd line in order to use them. The number of 16G pages have to be specified at boot-time though (e.g. hugepagesz=16G hugepages=5). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	91224346aa	powerpc: define support for 16G hugepages The huge page size is defined for 16G pages. If a hugepagesz of 16G is specified at boot-time then it becomes the huge page size instead of the default 16M. The change in pgtable-64K.h is to the macro pte_iterate_hashed_subpages to make the increment to va (the 1 being shifted) be a long so that it is not shifted to 0. Otherwise it would create an infinite loop when the shift value is for a 16G page (when base page size is 64K). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	658013e93e	powerpc: scan device tree for gigantic pages The 16G huge pages have to be reserved in the HMC prior to boot. The location of the pages are placed in the device tree. This patch adds code to scan the device tree during very early boot and save these page locations until hugetlbfs is ready for them. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	ec4b2c0c83	powerpc: function to allocate gigantic hugepages The 16G page locations have been saved during early boot in an array. The alloc_bootmem_huge_page() function adds a page from here to the huge_boot_pages list. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Andi Kleen	ceb8687961	hugetlb: introduce pud_huge Straight forward extensions for huge pages located in the PUD instead of PMDs. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:18 -07:00
Andi Kleen	a551643895	hugetlb: modular state for hugetlb page size The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code which requires these fields, they will do the right thing regardless of the exact hstate they are operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Jan Beulich	42b7772812	mm: remove double indirection on tlb parameter to free_pgd_range() & Co The double indirection here is not needed anywhere and hence (at least) confusing. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Benjamin Herrenschmidt	a1f242ff46	powerpc ioremap_prot This adds ioremap_prot and pte_pgprot() so that one can extract protection bits from a PTE and use them to ioremap_prot() (in order to support ptrace of VM_IO \| VM_PFNMAP as per Rik's patch). This moves a couple of flag checks around in the ioremap implementations of arch/powerpc. There's a side effect of allowing non-cacheable and non-guarded mappings on ppc32 which before would always have _PAGE_GUARDED set whenever _PAGE_NO_CACHE is. (standard ioremap will still set _PAGE_GUARDED, but ioremap_prot will be capable of setting such a non guarded mapping). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Dave Airlie <airlied@linux.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Johannes Weiner	b61bfa3c46	mm: move bootmem descriptors definition to a single place There are a lot of places that define either a single bootmem descriptor or an array of them. Use only one central array with MAX_NUMNODES items instead. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Benjamin Herrenschmidt	84c3d4aaec	Merge commit 'origin/master' Manual merge of: arch/powerpc/Kconfig arch/powerpc/kernel/stacktrace.c arch/powerpc/mm/slice.c arch/ppc/kernel/smp.c	2008-07-16 11:07:59 +10:00
Stefan Roese	2bf3016f89	powerpc: Fix problems with 32bit PPC's running with >= 4GB of RAM This patch enables 32bit PPC's (with 36bit physical address space, e.g. IBM/AMCC PPC44x) to run with >= 4GB of RAM. Mostly its just replacing types (unsigned long -> phys_addr_t). Tested on an AMCC Katmai with 4GB of DDR2. Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2008-07-09 14:13:01 -04:00
Benjamin Herrenschmidt	1bc54c0311	powerpc: rework 4xx PTE access and TLB miss This is some preliminary work to improve TLB management on SW loaded TLB powerpc platforms. This introduce support for non-atomic PTE operations in pgtable-ppc32.h and removes write back to the PTE from the TLB miss handlers. In addition, the DSI interrupt code no longer tries to fixup write permission, this is left to generic code, and _PAGE_HWWRITE is gone. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2008-07-09 13:36:17 -04:00
Stephen Rothwell	392096e98f	generic-ipi: fix linux-next tree build failure Today's linux-next build (powerpc ppc64_defconfig) failed like this: arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now': arch/powerpc/mm/tlb_64.c:66: error: too many arguments to function 'smp_call_function' arch/powerpc/kernel/machine_kexec_64.c: In function 'kexec_prepare_cpus': arch/powerpc/kernel/machine_kexec_64.c:175: error: too many arguments to function 'smp_call_function' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: <linuxppc-dev@ozlabs.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-03 09:25:42 +02:00
Nathan Fontenot	0db9360aaa	powerpc/pseries: Update numa association of hotplug memory add for drconf memory Update the association of a memory section with a numa node that occurs during hotplug add of a memory section. This adds a check in the hot_add_scn_to_nid() routine for the ibm,dynamic-reconfiguration-memory node in the device tree. If present the new hot_add_drconf_scn_to_nid() routine is invoked, which can properly parse the ibm,dynamic-reconfiguration-memory node of the device tree and make the proper numa node associations. This also introduces the valid_hot_add_scn() routine as a helper function for code that is common to the hot_add_scn_to_nid() and hot_add_drconf_scn_to_nid() routines. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:18 +10:00
Nathan Fontenot	8342681d3e	powerpc/pseries: Split code into helper routines for drconf memory This splits off several pieces of code that parse the ibm,dynamic-reconfiguration-memory node of the device tree into separate helper routines. This is in preparation for the next commit that will use these helper routines. There are no functional changes in this patch. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:17 +10:00
Tony Breeds	db7f37de2c	powerpc: Fix building of arch/powerpc/mm/mem.o when MEMORY_HOTPLUG=y and SPARSEMEM=n Currently the kernel fails to build with the above config options with: CC arch/powerpc/mm/mem.o arch/powerpc/mm/mem.c: In function 'arch_add_memory': arch/powerpc/mm/mem.c:130: error: implicit declaration of function 'create_section_mapping' This explicitly includes asm/sparsemem.h in arch/powerpc/mm/mem.c and moves the guards in include/asm-powerpc/sparsemem.h to protect the SPARSEMEM specific portions only. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:07 +10:00
Dave Kleikamp	87e9ab13c3	powerpc: hash_huge_page() should get the WIMG bits from the lpte Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:02 +10:00
Paul Mackerras	3a8247cc2c	powerpc: Only demote individual slices rather than whole process At present, if we have a kernel with a 64kB page size, and some process maps something that has to be mapped with 4kB pages (such as a cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair pages), we change the process to use 4kB pages everywhere. This hurts the performance of HPC programs that access eHCA from userspace. With this patch, the kernel will only demote the slice(s) containing the eHCA or cache-inhibited mappings, leaving the remaining slices able to use 64kB hardware pages. This also changes the slice_get_unmapped_area code so that it is willing to place a 64k-page mapping into (or across) a 4k-page slice if there is no better alternative, i.e. if the program specified MAP_FIXED or if there is not sufficient space available in slices that are either empty or already have 64k-page mappings in them. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-01 11:27:57 +10:00
Becky Bruce	316a405841	powerpc: Get rid of bitfields in ppc_bat struct While working on the 36-bit physical support, I noticed that there was exactly one line of code that actually referenced the bitfields. So I got rid of them and redefined ppc_bat as a struct of 2 u32's: batu and batl. I also got rid of the previous union that held the bitfield structs and a word representation of the batu/l values. This seems like a nicer solution than adding in a bunch of new bitfields to support extended bat addressing that would never get used, and just leaving the struct as-is would have been incomplete in the face of large physical addressing. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-30 22:31:05 +10:00
Becky Bruce	7c5c4325d2	powerpc: Change BAT code to use phys_addr_t Currently, the physical address is an unsigned long, but it should be phys_addr_t in set_bat, [v/p]_mapped_by_bat. Also, create a macro that can convert a large physical address into the correct format for programming the BAT registers. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-30 22:31:03 +10:00
Benjamin Herrenschmidt	41743a4e34	powerpc: Free a PTE bit on ppc64 with 64K pages This frees a PTE bit when using 64K pages on ppc64. This is done by getting rid of the separate _PAGE_HASHPTE bit. Instead, we just test if any of the 16 sub-page bits is set. For non-combo pages (ie. real 64K pages), we set SUB0 and the location encoding in that field. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-30 22:30:53 +10:00
Paul Mackerras	e9a4b6a3f6	Merge branch 'linux-2.6'	2008-06-30 10:16:50 +10:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Paul Mackerras	65ba6cdc83	[POWERPC] Clear sub-page HPTE present bits when demoting page size When we demote a slice from 64k to 4k, and we are about to insert an HPTE for a 4k subpage and we notice that there is an existing 64k HPTE, we first invalidate that HPTE before inserting the new 4k subpage HPTE. Since the bits that encode which hash bucket the old HPTE was in overlap with the bits that encode which of the 16 subpages have HPTEs, we need to clear out the subpage HPTE-present bits before starting to insert HPTEs for the 4k subpages. If we don't do that, we can erroneously think that a subpage already has an HPTE when it doesn't. That in itself wouldn't be such a problem except that when we go to update the HPTE that we think is present on machines with a hypervisor, the hypervisor can tell us that the HPTE we think is there is actually there even though it isn't, which can lead to a process getting stuck in a loop, continually faulting. The reason for the confusion is that the AVPN (abbreviated virtual page number) we are looking for in the HPTE for a 4k subpage can actually match the AVPN in a stale HPTE for another 64k page. For example, the HPTE for the 4k subpage at 0x84000f000 will be in the same hash bucket and have the same AVPN as the HPTE for the 64k page at 0x8400f0000. This fixes the code to clear out the subpage HPTE-present bits. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-18 21:40:43 +10:00
Paul Mackerras	8a3e1c670e	Merge branch 'merge' Conflicts: arch/powerpc/sysdev/fsl_soc.c	2008-06-09 12:19:41 +10:00
Nathan Lynch	0d5799449f	[POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=n The ehea driver was recently changed[1] to use walk_memory_resource() to detect the system's memory layout. However, walk_memory_resource() is available only when memory hotplug is enabled. So CONFIG_EHEA was made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a network driver to have such a dependency. Make the declaration of walk_memory_resource() and its powerpc implementation (ehea is powerpc-specific) unconditionally available. [1] `48cfb14f8b` "ehea: Add DLPAR memory remove support" [2] `fb7b6ca2b6` "ehea: Add dependency to Kconfig" Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-09 11:32:41 +10:00
Paul Mackerras	acf464817d	Merge branch 'merge' into powerpc-next	2008-05-23 16:53:23 +10:00
David Gibson	46a7417963	[POWERPC] Fix __set_fixmap() for STRICT_MM_TYPECHECKS __set_fixmap() in pgtable_32.c currently fails to compile if STRICT_MM_TYPECHECKS is defined. This fixes it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-23 16:15:32 +10:00
Adrian Bunk	d3d3d3cdb1	[POWERPC] powerpc/mm/hash_low_32.S: Remove CVS keyword This removes a CVS keyword that wasn't updated for a long time from a comment. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-20 09:34:18 +10:00
Paul Mackerras	fcff474ea5	Merge branch 'linux-2.6' into powerpc-next	2008-05-16 23:13:42 +10:00
Benjamin Herrenschmidt	cec08e7a94	[POWERPC] vmemmap fixes to use smaller pages This changes vmemmap to use a different region (region 0xf) of the address space, and to configure the page size of that region dynamically at boot. The problem with the current approach of always using 16M pages is that it's not well suited to machines that have small amounts of memory such as small partitions on pseries, or PS3's. In fact, on the PS3, failure to allocate the 16M page backing vmmemmap tends to prevent hotplugging the HV's "additional" memory, thus limiting the available memory even more, from my experience down to something like 80M total, which makes it really not very useable. The logic used by my match to choose the vmemmap page size is: - If 16M pages are available and there's 1G or more RAM at boot, use that size. - Else if 64K pages are available, use that - Else use 4K pages I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages) and it seems to work fine. Note that I intend to change the way we organize the kernel regions & SLBs so the actual region will change from 0xf back to something else at one point, as I simplify the SLB miss handler, but that will be for a later patch. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-15 20:49:25 +10:00
Michael Ellerman	c884116ac3	[POWERPC] Remove duplicate variable definitions in mm/tlb_64.c Somewhere along the way (`e28f7faf05`, "Four level pagetables for ppc64") we ended up with duplicate definitions for pte_freelist_cur and pte_freelist_force_free. Somehow this compiles, but it would be better to just have one definition for each. The two definitions we end up with can be static too! Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:49 +10:00
Michael Ellerman	572fb578de	[POWERPC] Move declaration of tce variables into mmu-hash64.h ... instead of having extern declarations in a .c file. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:47 +10:00
Michael Ellerman	09de9ff872	[POWERPC] Fix sparse warnings in arch/powerpc/mm Make two vmemmap helpers static in init_64.c Make stab variables static in stab.c Make psize defs static in hash_utils_64.c Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:46 +10:00
Michael Ellerman	5f25f06529	[POWERPC] Move declaration of init_bootmem_done into system.h ... instead of having an extern declaration in a .c file. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:44 +10:00
Paul Mackerras	3b5750644b	[POWERPC] Bolt in SLB entry for kernel stack on secondary cpus This fixes a regression reported by Kamalesh Bulabel where a POWER4 machine would crash because of an SLB miss at a point where the SLB miss exception was unrecoverable. This regression is tracked at: http://bugzilla.kernel.org/show_bug.cgi?id=10082 SLB misses at such points shouldn't happen because the kernel stack is the only memory accessed other than things in the first segment of the linear mapping (which is mapped at all times by entry 0 of the SLB). The context switch code ensures that SLB entry 2 covers the kernel stack, if it is not already covered by entry 0. None of entries 0 to 2 are ever replaced by the SLB miss handler. Where this went wrong is that the context switch code assumes it doesn't have to write to SLB entry 2 if the new kernel stack is in the same segment as the old kernel stack, since entry 2 should already be correct. However, when we start up a secondary cpu, it calls slb_initialize, which doesn't set up entry 2. This is correct for the boot cpu, where we will be using a stack in the kernel BSS at this point (i.e. init_thread_union), but not necessarily for secondary cpus, whose initial stack can be allocated anywhere. This doesn't cause any immediate problem since the SLB miss handler will just create an SLB entry somewhere else to cover the initial stack. In fact it's possible for the cpu to go quite a long time without SLB entry 2 being valid. Eventually, though, the entry created by the SLB miss handler will get overwritten by some other entry, and if the next access to the stack is at an unrecoverable point, we get the crash. This fixes the problem by making slb_initialize create a suitable entry for the kernel stack, if we are on a secondary cpu and the stack isn't covered by SLB entry 0. This requires initializing the get_paca()->kstack field earlier, so I do that in smp_create_idle where the current field is initialized. This also abstracts a bit of the computation that mk_esid_data in slb.c does so that it can be used in slb_initialize. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-02 15:00:45 +10:00
Geoff Levand	bbea346062	[POWERPC] Fix slb.c compile warnings Arrange for a syntax check to always be done on the powerpc/mm/slb.c DBG() macro by defining it to pr_debug() for non-debug builds. Also, fix these related compile warnings: slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int' Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-02 15:00:44 +10:00
Badari Pulavarty	9d88a2eb6e	[POWERPC] Provide walk_memory_resource() for powerpc Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains logical memory region mapping in the lmb.memory structure. Walk through these structures and do the callbacks for the contiguous chunks. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 15:57:53 +10:00
Jeremy Fitzhardinge	180c06efce	hotplug-memory: make online_page() common All architectures use an effectively identical definition of online_page(), so just make it common code. x86-64, ia64, powerpc and sh are actually identical; x86-32 is slightly different. x86-32's differences arise because it puts its hotplug pages in the highmem zone. We can handle this in the generic code by inspecting the page to see if its in highmem, and update the totalhigh_pages count appropriately. This leaves init_32.c:free_new_highpage with a single caller, so I folded it into add_one_highpage_init. I also removed an incorrect comment referring to the NUMA case; any NUMA details have already been dealt with by the time online_page() is called. [akpm@linux-foundation.org: fix indenting] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:17 -07:00
Kumar Gala	f608600e74	[POWERPC] Clean up access to thread_info in assembly Use (31-THREAD_SHIFT) to get to thread_info from stack pointer. This makes the code a bit easier to read and more robust if we ever change THREAD_SHIFT. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:58:02 +10:00
Kumar Gala	2c419bdeca	[POWERPC] Port fixmap from x86 and use for kmap_atomic The fixmap code from x86 allows us to have compile time virtual addresses that we change the physical addresses of at run time. This is useful for applications like kmap_atomic, PCI config that is done via direct memory map, kexec/kdump. We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal location for PKMAP_BASE based on where the fixmap addresses start and working back from there. Additionally, the kmap code in asm-powerpc/highmem.h always had debug enabled. Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should have the extra debug checking. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:58:02 +10:00
Kumar Gala	37dd2badcf	[POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero) Added support to allow an 85xx kernel to be run from a non-zero physical address (useful for cooperative asymmetric multiprocessing situations and kdump). The support can be configured at compile time by setting CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as desired. Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this config option causes the kernel to determine at runtime the physical addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning. However, CONFIG_PHYSICAL_START will always be used to set the LOAD program header physical address field in the resulting ELF image. Currently we are limited to running at a physical address that is a multiple of 256M. This is due to how we map TLBs to cover lowmem. This should be fixed to allow 64M or maybe even 16M alignment in the future. It is considered an error to try and run a kernel at a non-aligned physical address. All the magic for this support is accomplished by proper initialization of the kernel memory subsystem and use of ARCH_PFN_OFFSET. The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings. ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET. /dev/mem continues to allow access to any physical address in the system regardless of how CONFIG_PHYSICAL_START is set. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:58:01 +10:00
Michael Ellerman	6df1646e31	[POWERPC] Add include of linux/of.h to numa.c numa.c requires routines declared in linux/of.h, so should include it. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:57:32 +10:00
Olof Johansson	49a9997884	[POWERPC] Remove unused __max_memory variable Remove the __max_memory variable, as it is not referenced anywhere in the tree besides some code in arch/ppc. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-18 15:37:11 +10:00
Kumar Gala	7711684947	[POWERPC] Remove unused machine call outs When we moved to arch/powerpc we actively tried to avoid using the ppc_md.setup_io_mappings(). Currently no board ports use it so let's remove it to avoid any new boards using it. Also, remove early_serial_map() since we don't even have a call out for it in arch/powerpc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-17 10:01:00 +10:00
Kumar Gala	09b5e63f82	[POWERPC] Rename __initial_memory_limit to __initial_memory_limit_addr We always use __initial_memory_limit as an address so rename it to be clear. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-17 07:46:13 +10:00
Kumar Gala	0aef996b37	[POWERPC] 85xx: Cleanup TLB initialization * Determine the RPN we are running the kernel at runtime rather than using compile time constant for initial TLB * Cleanup adjust_total_lowmem() to respect memstart_addr and be a bit more clear on variables that are sizes vs addresses. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-17 07:46:13 +10:00
Kumar Gala	d7917ba705	[POWERPC] Introduce lowmem_end_addr to distinguish from total_lowmem total_lowmem represents the amount of low memory, not the physical address that low memory ends at. If the start of memory is at 0 it happens that total_lowmem can be used as both the size and the address that lowmem ends at (or more specifically one byte beyond the end). To make the code a bit more clear and deal with the case when the start of memory isn't at physical 0, we introduce lowmem_end_addr that represents one byte beyond the last physical address in the lowmem region. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-17 07:46:13 +10:00
Kumar Gala	99c62dd773	[POWERPC] Remove and replace uses of PPC_MEMSTART with memstart_addr A number of users of PPC_MEMSTART (40x, ppc_mmu_32) can just always use 0 as we don't support booting these kernels at non-zero physical addresses since their exception vectors must be at 0 (or 0xfffx_xxxx). For the sub-arches that support relocatable interrupt vectors (book-e), it's reasonable to have memory start at a non-zero physical address. For those cases use the variable memstart_addr instead of the #define PPC_MEMSTART since the only uses of PPC_MEMSTART are for initialization and in the future we can set memstart_addr at runtime to have a relocatable kernel. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-17 07:46:12 +10:00
Paul Mackerras	ac7c5353b1	Merge branch 'linux-2.6'	2008-04-14 21:11:02 +10:00
Stephen Rothwell	ae86f0088d	[POWERPC] htab_remove_mapping is only used by MEMORY_HOTPLUG This eliminates a warning in builds that don't define CONFIG_MEMORY_HOTPLUG. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-07 13:49:25 +10:00
Benjamin Herrenschmidt	b991f05f13	[POWERPC] Fix deadlock with mmu_hash_lock in hash_page_sync hash_page_sync() takes and releases the low level mmu hash lock in order to sync with other processors disposing of page tables. Because that lock can be needed to service hash misses triggered by interrupt handlers, taking it must be done with interrupts off. However, hash_page_sync() appears to be called with interrupts enabled, thus causing occasional deadlocks. We fix it by making sure hash_page_sync() masks interrupts while holding the lock. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-03 22:11:11 +11:00
Johannes Weiner	745681a524	[POWERPC] Remove redundant display of free swap space in show_mem() show_mem() has no need to print the amount of free swap space manually because show_free_areas() does this already and is called by the former. The two outputs only differ in text formatting: printk("Free swap = %lukB\n", ...); printk("Free swap: %6ldkB\n", ...); Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-01 20:43:10 +11:00
Harvey Harrison	e48b1b452f	[POWERPC] Replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-01 20:43:09 +11:00
Geert Uytterhoeven	fa90f70a8e	[POWERPC] arch_add_memory() cannot be __devinit WARNING: vmlinux.o(.text+0xb41b0): Section mismatch in reference from the function .add_memory() to the function .devinit.text:.arch_add_memory() The function .add_memory() references the function __devinit .arch_add_memory(). This is often because .add_memory lacks a __devinit annotation or the annotation of .arch_add_memory is wrong. arch_add_memory() is also not __devinit on other architectures Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-01 20:43:08 +11:00
Badari Pulavarty	52db9b4426	[POWERPC] Add error return from htab_remove_mapping() If the platform doesn't support hpte_removebolted(), gracefully return failure rather than success. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-01 20:43:08 +11:00
Paul Mackerras	54f53f2b94	Merge branch 'linux-2.6'	2008-03-26 08:44:18 +11:00
Paul Mackerras	cfe666b145	[POWERPC] Don't use 64k pages for ioremap on pSeries On pSeries, the hypervisor doesn't let us map in the eHEA ethernet adapter using 64k pages, and thus the ehea driver will fail if 64k pages are configured. This works around the problem by always using 4k pages for ioremap on pSeries (but not on other platforms). A better fix would be to check whether the partition could ever have an eHEA adapter, and only force 4k pages if it could, but this will do for 2.6.25. This is based on an earlier patch by Tony Breeds. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-03-24 17:41:22 +11:00
Anton Blanchard	44387e9ff2	[POWERPC] Fix PMU + soft interrupt disable bug Since the PMU is an NMI now, it can come at any time we are only soft disabled. We must hard disable around the two places we allow the kernel stack SLB and r1 to go out of sync. Otherwise the PMU exception can force a kernel stack SLB into another slot, which can lead to it getting evicted, which can lead to a nasty unrecoverable SLB miss in the exception entry code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-03-20 10:14:55 +11:00
Paul Mackerras	bed04a4413	Merge branch 'linux-2.6'	2008-03-13 15:26:33 +11:00
Michael Ellerman	31bf111944	[POWERPC] Fix large hash table allocation on Cell blades My recent hack to allocate the hash table under 1GB on cell was poorly tested, cough. It turns out on blades with large amounts of memory we fail to allocate the hash table at all. This is because RTAS has been instantiated just below 768MB, and 0-x MB are used by the kernel, leaving no areas that are both large enough and also naturally-aligned. For the cell IOMMU hack the page tables must be under 2GB, so use that as the limit instead. This has been tested on real hardware and boots happily. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-03-13 10:10:26 +11:00
Badari Pulavarty	f8c8803bda	[POWERPC] Add code for removing HPTEs for parts of the linear mapping For memory remove, we need to clean up htab mappings for the section of the memory we are removing. This implements support for removing htab bolted mappings for pSeries logical partitions. Other sub-archs may need to implement similar functionality for hotplug memory remove to work on them. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-02-26 22:17:03 +11:00
David S. Miller	d9b2b2a277	[LIB]: Make PowerPC LMB code generic so sparc64 can use it too. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-13 16:56:49 -08:00
Linus Torvalds	dde0013782	Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc [POWERPC] Enable hotplug memory remove for 64-bit powerpc [POWERPC] Add remove_memory() for 64-bit powerpc [POWERPC] Make cell IOMMU fixed mapping printk more useful [POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops [POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges [POWERPC] Fix cell IOMMU null pointer explosion on old firmwares [POWERPC] spufs: Fix timing dependent false return from spufs_run_spu [POWERPC] spufs: No need to have a runnable SPU for libassist update [POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write [POWERPC] spufs: Fix state_mutex leaks [POWERPC] Disable G5 NAP mode during SMU commands on U3	2008-02-08 09:31:42 -08:00
Martin Schwidefsky	2f569afd9c	CONFIG_HIGHPTE vs. sub-page page tables. Background: I've implemented 1K/2K page tables for s390. These sub-page page tables are required to properly support the s390 virtualization instruction with KVM. The SIE instruction requires that the page tables have 256 page table entries (pte) followed by 256 page status table entries (pgste). The pgstes are only required if the process is using the SIE instruction. The pgstes are updated by the hardware and by the hypervisor for a number of reasons, one of them is dirty and reference bit tracking. To avoid wasting memory the standard pte table allocation should return 1K/2K (31/64 bit) and 2K/4K if the process is using SIE. Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means the s390 version for pte_alloc_one cannot return a pointer to a struct page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one cannot return a pointer to a pte either, since that would require more than 32 bit for the return value of pte_alloc_one (and the pte * would not be accessible since its not kmapped). Solution: The only solution I found to this dilemma is a new typedef: a pgtable_t. For s390 pgtable_t will be a (pte ) - to be introduced with a later patch. For everybody else it will be a (struct page ). The additional problem with the initialization of the ptl lock and the NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and a destructor pgtable_page_dtor. The page table allocation and free functions need to call these two whenever a page table page is allocated or freed. pmd_populate will get a pgtable_t instead of a struct page pointer. To get the pgtable_t back from a pmd entry that has been installed with pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page call in free_pte_range and apply_to_pte_range. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:42 -08:00
Badari Pulavarty	a99824f327	[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc walk_memory_resource() verifies if there are holes in a given memory range, by checking against /proc/iomem. On x86/ia64 system memory is represented in /proc/iomem. On powerpc, we don't show system memory as IO resource in /proc/iomem - instead it's maintained in /proc/device-tree. This provides a way for an architecture to provide its own walk_memory_resource() function. On powerpc, the memory region is small (16MB), contiguous and non-overlapping. So extra checking against the device-tree is not needed. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-02-08 19:52:48 +11:00
Badari Pulavarty	aa620abe75	[POWERPC] Add remove_memory() for 64-bit powerpc Supply remove_memory() function for 64-bit powerpc. This is still not quite complete as it needs to do some more arch-specific stuff, which will be added in a later patch. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-02-08 19:52:47 +11:00
Linus Torvalds	3796958130	Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (69 commits) [POWERPC] Add SPE registers to core dumps [POWERPC] Use regset code for compat PTRACE_REGS calls [POWERPC] Use generic compat_sys_ptrace [POWERPC] Use generic compat_ptrace_request [POWERPC] Use generic ptrace peekdata/pokedata [POWERPC] Use regset code for PTRACE_REGS requests [POWERPC] Switch to generic compat_binfmt_elf code [POWERPC] Switch to using user_regset-based core dumps [POWERPC] Add user_regset compat support [POWERPC] Add user_regset_view definitions [POWERPC] Use user_regset accessors for GPRs [POWERPC] ptrace accessors for special regs MSR and TRAP [POWERPC] Use user_regset accessors for SPE regs [POWERPC] Use user_regset accessors for altivec regs [POWERPC] Use user_regset accessors for FP regs [POWERPC] mpc52xx: fix compile error introduce when rebasing patch [POWERPC] 4xx: PCIe indirect DCR spinlock fix. [POWERPC] Add missing native dcr dcr_ind_lock spinlock [POWERPC] 4xx: Fix offset value on Warp board [POWERPC] 4xx: Add 440EPx Sequoia ehci dts entry ...	2008-02-07 09:02:26 -08:00
Bernhard Walle	72a7fe3967	Introduce flags for reserve_bootmem() This patchset adds a flags variable to reserve_bootmem() and uses the BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions between crashkernel area and already used memory. This patch: Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE. If that flag is set, the function returns with -EBUSY if the memory already has been reserved in the past. This is to avoid conflicts. Because that code runs before SMP initialisation, there's no race condition inside reserve_bootmem_core(). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:25 -08:00
Balbir Singh	1daa6d08d1	[POWERPC] Fake NUMA emulation for PowerPC Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option numa=fake=<node range> node range is of the format <range1>,<range2>,...<rangeN> Each of the rangeX parameters is passed using memparse(). I find the patch useful for fake NUMA emulation on my simple PowerPC machine. I've tested it on a numa box with the following arguments numa=fake=512M numa=fake=512M,768M numa=fake=256M,512M mem=512M numa=fake=1G mem=768M numa=fake= without any numa= argument The other side-effect introduced by this patch is that; in the case where we don't have NUMA information, we now set a node online after adding each LMB. This node could very well be node 0, but in the case that we enable fake NUMA nodes, when we cross node boundaries, we need to set the new node online. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-02-07 11:40:19 +11:00
Scott Wood	551ed332da	[POWERPC] update_mmu_cache: Don't cache-flush non-readable pages Currently, update_mmu_cache will crash if given a no-access PTE. There's no need to synchronize dcache/icache unless it's an exec mapping -- however, due to the existence of older glibc versions that execute out of a read-but-no-exec page, readability is tested instead. This assumes no exec-only mappings; if such mappings become supported, they will need to go through the kmap_atomic() version of dcache/icache synchronization. This fixes a bug reported by some users where the kernel would crash while dumping core on a threaded program. Signed-off-by: Scott Wood <scottwood@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-02-06 16:30:01 +11:00
Benjamin Herrenschmidt	5e5419734c	add mm argument to pte/pmd/pud/pgd_free (with Martin Schwidefsky <schwidefsky@de.ibm.com>) The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as first argument. The free functions do not get the mm_struct argument. This is 1) asymmetrical and 2) to do mm related page table allocations the mm argument is needed on the free function as well. [kamalesh@linux.vnet.ibm.com: i386 fix] [akpm@linux-foundation.org: coding-syle fixes] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:18 -08:00
Michael Ellerman	41d824bf61	[POWERPC] Allocate the hash table under 1G on cell In order to support the fixed IOMMU mapping (in a subsequent patch), we need the hash table to be inside the IOMMUs DMA window. This is usually 2G, but let's make sure the hash table is under 1G as that will satisfy the IOMMU requirements and also means the hash table will be on node 0. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-31 12:11:09 +11:00
Paul Mackerras	55852bed57	Revert "[POWERPC] Fake NUMA emulation for PowerPC" This reverts commit `5c3f5892a2`, basically because it changes behaviour even when no fake NUMA information is specified on the kernel command line. Firstly, it changes the nid, thus destroying the real NUMA information. Secondly, it also changes behaviour in that if a node ends up with no memory in it because of the memory limit, we used to set it online and now we don't. Also, in the non-NUMA case with no fake NUMA information, we do node_set_online once for each LMB now, whereas previously we only did it once. I don't know if that is actually a problem, but it does seem unnecessary. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-26 16:40:33 +11:00
Michael Neuling	c3b75bd7bb	[POWERPC] Make setjmp/longjmp code usable outside of xmon This makes the setjmp/longjmp code used by xmon, generically available to other code. It also removes the requirement for debugger hooks to be only called on 0x300 (data storage) exception. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-25 22:52:50 +11:00
Paul Mackerras	dcb571be20	Merge branch 'for-2.6.25' of master.kernel.org:/pub/scm/linux/kernel/git/galak/powerpc into for-2.6.25	2008-01-24 15:29:14 +11:00
Dale Farnsworth	e8b6376155	[POWERPC] 85xx: Respect KERNELBASE, PAGE_OFFSET, and PHYSICAL_START on e500 The e500 MMU init code previously assumed KERNELBASE always equaled PAGE_OFFSET and PHYSICAL_START was 0. This is useful for kdump support as well as asymetric multicore. For the initial kdump support the secondary kernel will run at 32M but need access to all of memory so we bump the initial TLB up to 64M. This also matches with the forth coming ePAPR spec. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-01-23 19:34:36 -06:00
Kumar Gala	f98eeb4eb1	[POWERPC] Fix handling of memreserve if the range lands in highmem There were several issues if a memreserve range existed and happened to be in highmem: * The bootmem allocator is only aware of lowmem so calling reserve_bootmem with a highmem address would cause a BUG_ON * All highmem pages were provided to the buddy allocator Added a lmb_is_reserved() api that we now use to determine if a highem page should continue to be PageReserved or provided to the buddy allocator. Also, we incorrectly reported the amount of pages reserved since all highmem pages are initally marked reserved and we clear the PageReserved flag as we "free" up the highmem pages. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-01-23 19:29:08 -06:00

... 15 16 17 18 19 ...

1941 Commits