Commit Graph

4669 Commits

Author SHA1 Message Date
Michael Ellerman
e8a4fd0afe powerpc: Add macros for the ibm_architecture_vec[] lengths
The encoding of the lengths in the ibm_architecture_vec array is
"interesting" to say the least. It's non-obvious how the number of bytes
we provide relates to the length value.

In fact we already got it wrong once, see 11e9ed43ca "Fix up
ibm_architecture_vec definition".

So add some macros to make it (hopefully) clearer. These at least have
the property that the integer present in the code is equal to the number
of bytes that follows it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2015-07-13 15:46:04 +10:00
Benjamin Herrenschmidt
817820b022 powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask
This patch adds the ability to the DMA direct ops to fallback to the IOMMU
ops for coherent alloc/free if the coherent mask of the device isn't
suitable for accessing the direct DMA space and the device also happens
to have an active IOMMU table.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-13 10:10:55 +10:00
Shreyas B. Prabhu
b32aadc1a8 powerpc/powernv: Fix race in updating core_idle_state
core_idle_state is maintained for each core. It uses 0-7 bits to track
whether a thread in the core has entered fastsleep or winkle. 8th bit is
used as a lock bit.
The lock bit is set in these 2 scenarios-
 - The thread is first in subcore to wakeup from sleep/winkle.
 - If its the last thread in the core about to enter sleep/winkle

While the lock bit is set, if any other thread in the core wakes up, it
loops until the lock bit is cleared before proceeding in the wakeup
path. This helps prevent race conditions w.r.t fastsleep workaround and
prevents threads from switching to process context before core/subcore
resources are restored.

But, in the path to sleep/winkle entry, we currently don't check for
lock-bit. This exposes us to following race when running with subcore
on-

First thread in the subcorea		Another thread in the same
waking up		   		core entering sleep/winkle

lwarx   r15,0,r14
ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
stwcx.  r15,0,r14
[Code to restore subcore state]

						lwarx   r15,0,r14
						[clear thread bit]
						stwcx.  r15,0,r14

andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
stw     r15,0(r14)

Here, after the thread entering sleep clears its thread bit in
core_idle_state, the value is overwritten by the thread waking up.
In such cases when the core enters fastsleep, code mistakes an idle
thread as running. Because of this, the first thread waking up from
fastsleep which is supposed to resync timebase skips it. So we can
end up having a core with stale timebase value.

This patch fixes the above race by looping on the lock bit even while
entering the idle states.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-07 10:16:52 +10:00
Daniel Axtens
27ea2c420c powerpc: Set the correct kernel taint on machine check errors.
This means the 'M' flag will work properly when the kernel prints a backtrace.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-06 20:24:35 +10:00
Linus Torvalds
2d4407079c Replace module_init with equivalent device_initcall in non modules.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkO5XAAoJEOvOhAQsB9HWe4cQAJcsmSXIDN2O6oxvgH8Wilof
 EIEMvT13uwBdsjQdYUY6A6B3iUV9wzEEgoosg/JRgpz5/b1FTDMIO4arUPD3Lcak
 5bmyVO2qAT+yaLAWSgn6I8DMplXrKiEuK+TkH/mW3p9TdvElLdG3Vg6UI407hSWv
 W0QbVwkNtv8XmzshV9F2YdmflT8j1PgYxIu/tEkVOWn37DNW+Fp2OVBrdTIYp3AJ
 X6bYZPEcQDCrWWW/s2GmIDrNgryiebasns+CAgGY21262jAYaRcFOJmR47AsTqW7
 DSZXIlLc/gJca++hfxqV15RZ4NRHxrebCypTsPtZUV7ZiYHI726eeUZzxsp/9itu
 mvhmi+aQUTTUP3dDhiv05f4syAKEb4zslT6SLwcna6oi09M97HfCeQsHqhcFq/MG
 KnS2JJoJQToQtJvMUXMQzp5hyHjNlOclIvCxEiL32EZU54PeJOKasy/mptNGEctk
 TxACWvoXBQglRaVN+1wIjjr0BaHJSuJa9CUnIfM4WZdSHiMQMx00XLTkZcTnSM6R
 12pE54vVolrXswGPJhy4W/Sf1yPSW1tkWSVBbkKLyCIrlAWJtu68rXhvwhG/nz6E
 3g6QrDEQGlk6bzUH4CJCEqXLPRN1bNS0XjdkEFh60Lury3Ns5yHKZXPW5vCQ5csr
 FQNUyBs595CWbJNfbn1n
 =0BDx
 -----END PGP SIGNATURE-----

Merge tag 'module_init-device_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull module_init replacement part one from Paul Gortmaker:
 "Replace module_init with equivalent device_initcall in non modules.

  This series of commits converts non-modular code that is using the
  module_init() call to hook itself into the system to instead use
  device_initcall().

  The conversion is a runtime no-op, since module_init actually becomes
  __initcall in the non-modular case, and that in turn gets mapped onto
  device_initcall.  A couple files show a larger negative diffstat,
  representing ones that had a module_exit function that we remove here
  vs previously relying on the linker to dispose of it.

  We make this conversion now, so that we can relocate module_init from
  init.h into module.h in the future.

  The files changed here are just limited to those that would otherwise
  have to add module.h to obviously non-modular code, in order to avoid
  a compile fail, as testing has shown"

* tag 'module_init-device_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  MIPS: don't use module_init in non-modular cobalt/mtd.c file
  drivers/leds: don't use module_init in non-modular leds-cobalt-raq.c
  cris: don't use module_init for non-modular core eeprom.c code
  tty/metag_da: Avoid module_init/module_exit in non-modular code
  drivers/clk: don't use module_init in clk-nomadik.c which is non-modular
  xtensa: don't use module_init for non-modular core network.c code
  sh: don't use module_init in non-modular psw.c code
  mn10300: don't use module_init in non-modular flash.c code
  parisc64: don't use module_init for non-modular core perf code
  parisc: don't use module_init for non-modular core pdc_cons code
  cris: don't use module_init for non-modular core intmem.c code
  ia64: don't use module_init in non-modular sim/simscsi.c code
  ia64: don't use module_init for non-modular core kernel/mca.c code
  arm: don't use module_init in non-modular mach-vexpress/spc.c code
  powerpc: don't use module_init in non-modular 83xx suspend code
  powerpc: use device_initcall for registering rtc devices
  x86: don't use module_init in non-modular devicetree.c code
  x86: don't use module_init in non-modular intel_mid_vrtc.c
2015-07-02 10:30:48 -07:00
Linus Torvalds
4da3064d17 Devicetree changes for v4.2
A whole lot of bug fixes. Nothing stands out here except the ability to
 enable CONFIG_OF on every architecture, and an import of a newer version
 of dtc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVlAkwAAoJEMWQL496c2LNNYMP/23EdDPyRneoaIynd0nNk9SO
 UfhOSJdSo7vMmT9Rea2eBHdn3leJrx9m9JXvIrBwGdcDxMNsS4mS1k9Bj63aqEVn
 kK+IrI1Jbx7F6/AlBh3u4nHixIjoTc3IWlFdxUTBKQ2ATYKmCVhVCsf6UyfSxAj+
 xPL6bmALegEZ2kJzK+qhk6K0j7GeQDnk1SAS3xMvTpJH76Ac2F+Gi9u7J68GqXAS
 d7WBCAjijkqskfAdeP13XasvSdU7ZCOnDjClwJd83ZQGmtp77T8PWF0lzLlnC8Ho
 sMwDhoWHnCtFP0U1hnhUF1pXhhn8W9NlxymtYbxR1tJcku0fSiYlibZ6jnzTRc2m
 TsqzaWDR3U/VX4t5wH5FtXM1Cum/eAfV6HX9fGXeYYP7Einl7Kg6yXYjIY+b7HG9
 R3znQ2TKoYPsUr/WWXrZK52ZTesTe+LG98WYH1YhNbZ5riev9fLZxI2zMl/h83/Z
 LrF0g0MLQobHuBCUSIXSUot6RTQgLzFWHtnSrNOUycMwlRNZHYOY3DSvzLYLw+hJ
 XwV9p2k3DV/l/XnQJPy3y/MA+7jEudzlq7HukmtYVhh9rOy3y+Sq3GMGAiUFjAqj
 YDxBrrIpoPWNp/OJJX2yhnTvnNaV/BjhCB1CiJooFCjHz78I5daqBXO155hn9msY
 7To1PHvyEngabBpdN/MZ
 =tm5y
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux

Pull devicetree updates from Grant Likely:
 "A whole lot of bug fixes.

  Nothing stands out here except the ability to enable CONFIG_OF on
  every architecture, and an import of a newer version of dtc"

* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux: (22 commits)
  of/irq: Rename "intc_desc" to "of_intc_desc" to fix OF on sh
  of/irq: Fix pSeries boot failure
  Documentation: DT: Fix a typo in the filename "lantiq,<chip>-pinumx.txt"
  of: define of_find_node_by_phandle for !CONFIG_OF
  of/address: use atomic allocation in pci_register_io_range()
  of: Add vendor prefix for Zodiac Inflight Innovations
  dt/fdt: add empty versions of early_init_dt_*_memory_arch
  of: clean-up unnecessary libfdt include paths
  of: make unittest select OF_EARLY_FLATTREE instead of depend on it
  of: make CONFIG_OF user selectable
  MIPS: prepare for user enabling of CONFIG_OF
  of/fdt: fix argument name and add comments of unflatten_dt_node()
  of: return NUMA_NO_NODE from fallback of_node_to_nid()
  tps6507x.txt: Remove executable permission
  of/overlay: Grammar s/an negative/a negative/
  of/fdt: Make fdt blob input parameters of unflatten functions const
  of: add helper function to retrive match data
  of: Grammar s/property exist/property exists/
  of: Move OF flags to be visible even when !CONFIG_OF
  scripts/dtc: Update to upstream version 9d3649bd3be245c9
  ...
2015-07-01 19:40:18 -07:00
Grant Likely
becfc3c86d Merge remote-tracking branch 'robh/for-next' into devicetree/next 2015-06-30 14:28:52 +01:00
Akinobu Mita
5935877af4 powerpc: use for_each_sg()
This replaces the plain loop over the sglist array with for_each_sg()
macro which consists of sg_next() function calls.  Since powerpc does
select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order
to loop over each sg element.  This also help find problems with drivers
that do not properly initialize their sg tables when CONFIG_DEBUG_SG is
enabled.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:38 -07:00
Linus Torvalds
e3d8238d7f arm64 updates for 4.2, mostly refactoring/clean-up:
- CPU ops and PSCI (Power State Coordination Interface) refactoring
   following the merging of the arm64 ACPI support, together with
   handling of Trusted (secure) OS instances
 
 - Using fixmap for permanent FDT mapping, removing the initial dtb
   placement requirements (within 512MB from the start of the kernel
   image). This required moving the FDT self reservation out of the
   memreserve processing
 
 - Idmap (1:1 mapping used for MMU on/off) handling clean-up
 
 - Removing flush_cache_all() - not safe on ARM unless the MMU is off.
   Last stages of CPU power down/up are handled by firmware already
 
 - "Alternatives" (run-time code patching) refactoring and support for
   immediate branch patching, GICv3 CPU interface access
 
 - User faults handling clean-up
 
 And some fixes:
 
 - Fix for VDSO building with broken ELF toolchains
 
 - Fixing another case of init_mm.pgd usage for user mappings (during
   ASID roll-over broadcasting)
 
 - Fix for FPSIMD reloading after CPU hotplug
 
 - Fix for missing syscall trace exit
 
 - Workaround for .inst asm bug
 
 - Compat fix for switching the user tls tpidr_el0 register
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVitZgAAoJEGvWsS0AyF7x+ToP/0Yci5bNsYVwVay8N1rK6WHh
 SGzDMzyxcSBjQpz2IhhTJ28eTAEH+a+HWQms+0tBehjqxqkvjuzBN0okDkc/z8NB
 7Z0BV2aLkQcMwTbjgIh5jm25ZpGmvmvbWPD5oBwgmgQ4v4i1OLRKgx7+YQ+z9rWZ
 zC70d0UwyWjs2oxmjd2ZrAkps7v/qozEFhcRHxLzCn8Mbw+3FcTQsqMbfnoWGnH0
 YuGfHQQqBY4/HC7uAslMCy7tXeJXqb+NsgrnAovjfEbHGDjsg0KNl06K++LHwE37
 A5Noa3M0AQEPYqx/sg0Ec8RNUUEMB4RA2DCaibp7XlVGncXOwFfiyk/M5uVrYXIO
 ku5QF0ytUfZKzrMq3yQKbEDuCPOFTqjjdVpkeXKFdW66zYTohKVc3vUBV5xHZ5uO
 7Kr8H0ZnhAv3OxPyKdEwAuQ5sJdWwQSvZyGClxMUO4eC/UzD0Mwwf1Y8WYtiAXx+
 NSTeBKw/m33W3/KhNuNH1+qGEOKhuXuKX7AcYA84Rab8ytxYWcurHCG2bmhzpEse
 2DZtNMybrP/HMQPyJlYgGac8B3QbsAIAkkU1f+dJTAv9otuBDhscaDQyZ9Y6WVht
 /k8zJiZeMEuGAmwgTkzLmWs/8pQq42nW4J4eQdXPZAwp4ghCIypPWfaZASAwee6/
 p+es3v8P4k9wkv2TFZMh
 =YeGl
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Mostly refactoring/clean-up:

   - CPU ops and PSCI (Power State Coordination Interface) refactoring
     following the merging of the arm64 ACPI support, together with
     handling of Trusted (secure) OS instances

   - Using fixmap for permanent FDT mapping, removing the initial dtb
     placement requirements (within 512MB from the start of the kernel
     image).  This required moving the FDT self reservation out of the
     memreserve processing

   - Idmap (1:1 mapping used for MMU on/off) handling clean-up

   - Removing flush_cache_all() - not safe on ARM unless the MMU is off.
     Last stages of CPU power down/up are handled by firmware already

   - "Alternatives" (run-time code patching) refactoring and support for
     immediate branch patching, GICv3 CPU interface access

   - User faults handling clean-up

  And some fixes:

   - Fix for VDSO building with broken ELF toolchains

   - Fix another case of init_mm.pgd usage for user mappings (during
     ASID roll-over broadcasting)

   - Fix for FPSIMD reloading after CPU hotplug

   - Fix for missing syscall trace exit

   - Workaround for .inst asm bug

   - Compat fix for switching the user tls tpidr_el0 register"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
  arm64: use private ratelimit state along with show_unhandled_signals
  arm64: show unhandled SP/PC alignment faults
  arm64: vdso: work-around broken ELF toolchains in Makefile
  arm64: kernel: rename __cpu_suspend to keep it aligned with arm
  arm64: compat: print compat_sp instead of sp
  arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
  arm64: entry: fix context tracking for el0_sp_pc
  arm64: defconfig: enable memtest
  arm64: mm: remove reference to tlb.S from comment block
  arm64: Do not attempt to use init_mm in reset_context()
  arm64: KVM: Switch vgic save/restore to alternative_insn
  arm64: alternative: Introduce feature for GICv3 CPU interface
  arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
  arm64: fix bug for reloading FPSIMD state after CPU hotplug.
  arm64: kernel thread don't need to save fpsimd context.
  arm64: fix missing syscall trace exit
  arm64: alternative: Work around .inst assembler bugs
  arm64: alternative: Merge alternative-asm.h into alternative.h
  arm64: alternative: Allow immediate branch as alternative instruction
  arm64: Rework alternate sequence for ARM erratum 845719
  ...
2015-06-24 10:02:15 -07:00
Linus Torvalds
08d183e3c1 powerpc updates for 4.2
- Disable the 32-bit vdso when building LE, so we can build with a 64-bit only
    toolchain.
  - EEH fixes from Gavin & Richard.
  - Enable the sys_kcmp syscall from Laurent.
  - Sysfs control for fastsleep workaround from Shreyas.
  - Expose OPAL events as an irq chip by Alistair.
  - MSI ops moved to pci_controller_ops by Daniel.
  - Fix for kernel to userspace backtraces for perf from Anton.
  - Merge pseries and pseries_le defconfigs from Cyril.
  - CXL in-kernel API from Mikey.
  - OPAL prd driver from Jeremy.
  - Fix for DSCR handling & tests from Anshuman.
  - Powernv flash mtd driver from Cyril.
  - Dynamic DMA Window support on powernv from Alexey.
  - LLVM clang fixes & workarounds from Anton.
  - Reworked version of the patch to abort syscalls when transactional.
  - Fix the swap encoding to support 4TB, from Aneesh.
  - Various fixes as usual.
  - Freescale updates from Scott: Highlights include more 8xx optimizations, an
    e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and
    various fixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViSZqAAoJEFHr6jzI4aWAA7kQAKq3+pejfo2rY7alpKJyeVao
 vlaIEaDNOTh+ctcmu3MFF9Jy6fai8gNZziRXU5JRmE5RW4GVBN4KZiqXRbkVjdBK
 uG9sCX7Y58VRsS2vnGBYLsamfTMgjaXeDvgunQHVLiechJnrDr0RHEK90F3LSi73
 Axp6l8XIG63a3zFZmkhzANMCme2lm5+MWmGlSjUUNi5F+viQUgJc5iiO8xrVUgM5
 RpNlV2NJSqFiU+gMQWJ226V85UIniouq4j+qtyUcu8/m9BberyolXVU0GPlPFdsx
 r/Qh9uCJyZaUdSB5hzomQZj50IsSz6J6nEuJTeGRoVZOmeI8Dnc2xU9fxQF5fC8H
 lUJw10WPoNOggQZTeSUKn7wTXw3i4p3KsWNUczaW68VJdhqZUVaSp0+I6mnDSqzs
 9iGC+VffLYNa1OHq7mGRFrgDdLBCHes31aZ3CxlQsmyNpAPCwMzsD4TUfVnvOG6E
 oJOeaQ4mZM9PvqxEYJfoIL+vgRxmQ8sdIBtNY4in+C7J6eFnZNFO9xmPnJZuVU31
 PGtx60kjFCOVMXvqn34WkRNbgqGWI91IK0KcRwFO2LXVio1uY77TWL52kNK2IMsp
 Az+VDDvqnT3+BoV1yz0P6SrXAkwTpvFk2y+IdmEiUUN7zZFL5ZSA2epej9AzHTAK
 WID2bc5yVtIL6p6x5ICH
 =d9Wh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - disable the 32-bit vdso when building LE, so we can build with a
   64-bit only toolchain.

 - EEH fixes from Gavin & Richard.

 - enable the sys_kcmp syscall from Laurent.

 - sysfs control for fastsleep workaround from Shreyas.

 - expose OPAL events as an irq chip by Alistair.

 - MSI ops moved to pci_controller_ops by Daniel.

 - fix for kernel to userspace backtraces for perf from Anton.

 - merge pseries and pseries_le defconfigs from Cyril.

 - CXL in-kernel API from Mikey.

 - OPAL prd driver from Jeremy.

 - fix for DSCR handling & tests from Anshuman.

 - Powernv flash mtd driver from Cyril.

 - dynamic DMA Window support on powernv from Alexey.

 - LLVM clang fixes & workarounds from Anton.

 - reworked version of the patch to abort syscalls when transactional.

 - fix the swap encoding to support 4TB, from Aneesh.

 - various fixes as usual.

 - Freescale updates from Scott: Highlights include more 8xx
   optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
   t1024/t1023 support, and various fixes and cleanup.

* tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
  cxl: Fix typo in debug print
  cxl: Add CXL_KERNEL_API config option
  powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
  powerpc/mm: Change the swap encoding in pte.
  powerpc/mm: PTE_RPN_MAX is not used, remove the same
  powerpc/tm: Abort syscalls in active transactions
  powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
  powerpc/include: Add opal-prd to installed uapi headers
  powerpc/powernv: fix construction of opal PRD messages
  powerpc/powernv: Increase opal-irqchip initcall priority
  powerpc: Make doorbell check preemption safe
  powerpc/powernv: pnv_init_idle_states() should only run on powernv
  macintosh/nvram: Remove as unused
  powerpc: Don't use gcc specific options on clang
  powerpc: Don't use -mno-strict-align on clang
  powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
  powerpc: Only use -mabi=altivec if toolchain supports it
  powerpc: Fix duplicate const clang warning in user access code
  vfio: powerpc/spapr: Support Dynamic DMA windows
  vfio: powerpc/spapr: Register memory and define IOMMU v2
  ...
2015-06-24 08:46:32 -07:00
Linus Torvalds
d8133356e9 PCI changes for the v4.2 merge window:
Enumeration
     - Move pci_ari_enabled() to global header (Alex Williamson)
     - Account for ARI in _PRT lookups (Alex Williamson)
     - Remove unused pci_scan_bus_parented() (Yijing Wang)
 
   Resource management
     - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas)
     - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas)
     - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan)
     - Add pci_bus_addr_t (Yinghai Lu)
 
   PCI device hotplug
     - Wait for pciehp command completion where necessary (Alex Williamson)
     - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki)
     - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki)
     - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki)
     - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas)
     - Clean up pciehp debug logging (Bjorn Helgaas)
 
   Power management
     - Remove redundant PCIe port type checking (Yijing Wang)
     - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang)
     - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang)
     - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas)
     - Simplify Clock Power Management setting (Bjorn Helgaas)
 
   Virtualization
     - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson)
     - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus)
 
   MSI
     - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin)
     - Remove unused pci_msi_off() (Bjorn Helgaas)
     - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S.  Tsirkin)
     - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin)
     - Drop pci_msi_off() calls during probe (Michael S. Tsirkin)
 
   APM X-Gene host bridge driver
     - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang)
     - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang)
     - Disable Configuration Request Retry Status for v1 silicon (Duc Dang)
     - Allow config access to Root Port even when link is down (Duc Dang)
 
   Broadcom iProc host bridge driver
     - Allow override of device tree IRQ mapping function (Hauke Mehrtens)
     - Add BCMA PCIe driver (Hauke Mehrtens)
     - Directly add PCI resources (Hauke Mehrtens)
     - Free resource list after registration (Hauke Mehrtens)
 
   Freescale i.MX6 host bridge driver
     - Add speed change timeout message (Troy Kisky)
     - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas)
 
   Freescale Layerscape host bridge driver
     - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
     - Factor out ls_pcie_establish_link() (Bjorn Helgaas)
 
   Marvell MVEBU host bridge driver
     - Remove mvebu_pcie_scan_bus() (Yijing Wang)
 
   NVIDIA Tegra host bridge driver
     - Remove tegra_pcie_scan_bus() (Yijing Wang)
 
   Synopsys DesignWare host bridge driver
     - Consolidate outbound iATU programming functions (Jisheng Zhang)
     - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang)
     - Add support for x8 links (Zhou Wang)
     - Wait for link to come up with consistent style (Bjorn Helgaas)
     - Use pci_scan_root_bus() for simplicity (Yijing Wang)
 
   TI DRA7xx host bridge driver
     - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
 
   Miscellaneous
     - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas)
     - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas)
     - Remove unused pcibios_select_root() (again) (Bjorn Helgaas)
     - Remove unused pci_dma_burst_advice() (Bjorn Helgaas)
     - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViCSWAAoJEFmIoMA60/r8zX8P/1DPNnk+8xSQe3dYjnG8VW3P
 GPxeCqLMkjiF3ffxcLDzsgrHMjZEb8Co67WePs0k5V0lbZevoIwUo48+oO9B5jhc
 H5DuPZHyTHeOvaZv4GUY5vq/1DBh4JXmJc2V/BkaJ6qhXckF+SCam9C+s0p4950o
 QX/ifOjg/VHzmhaiL7wymJOzuniZmIttl+y+nzkl3AUJ+T6ZtQbUhz+8GZ3lj7Ma
 F+7JHhvm9K8Ljajxb6BLWTw4xgHA6ZN5PtYEx+Sl9QBYSsGfL7LnqyYD3KhJ7KV5
 4AHNJGEVhzNwSuyh+VQx1tNm7OHOqkAaTsYdCVUZRow+6CPd8P75QOMtpl+SmPJB
 RV1BAO75OTGqKg0B9IDg855y4Nh+4/dKoZlBPzpp7+cKw3ylaRAsNnaZ9ik5D62v
 RR06CFgWGHwDXSObgbRm4v0HwfAIHWWJzrPqAZmElh2dzb1Lv1I3AbB1SClCN6sl
 fnAu6CAwA47A5GT8xW3L0oQXdcSmdNUdNzZrsfDnOBIQWMsF+zBFKr6sTABVgyxp
 /WEJaNlvx8Zlq0bZlhGDdsGSbFNFzhX4avWZtXhvdcqFzH0KaVghYSayYvJE9Haq
 oakWqS+GZ3x40j+rdrgLg98AWRVraE1MvV1A7N9TIGjuuKqqbZfSP8kvX3QRQQhO
 Z2+X5hMM0s/tdYtADYu/
 =Qw+j
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v4.2 merge window:

  Enumeration
    - Move pci_ari_enabled() to global header (Alex Williamson)
    - Account for ARI in _PRT lookups (Alex Williamson)
    - Remove unused pci_scan_bus_parented() (Yijing Wang)

  Resource management
    - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas)
    - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas)
    - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan)
    - Add pci_bus_addr_t (Yinghai Lu)

  PCI device hotplug
    - Wait for pciehp command completion where necessary (Alex Williamson)
    - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki)
    - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki)
    - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki)
    - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas)
    - Clean up pciehp debug logging (Bjorn Helgaas)

  Power management
    - Remove redundant PCIe port type checking (Yijing Wang)
    - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang)
    - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang)
    - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas)
    - Simplify Clock Power Management setting (Bjorn Helgaas)

  Virtualization
    - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson)
    - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus)

  MSI
    - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin)
    - Remove unused pci_msi_off() (Bjorn Helgaas)
    - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S.  Tsirkin)
    - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin)
    - Drop pci_msi_off() calls during probe (Michael S. Tsirkin)

  APM X-Gene host bridge driver
    - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang)
    - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang)
    - Disable Configuration Request Retry Status for v1 silicon (Duc Dang)
    - Allow config access to Root Port even when link is down (Duc Dang)

  Broadcom iProc host bridge driver
    - Allow override of device tree IRQ mapping function (Hauke Mehrtens)
    - Add BCMA PCIe driver (Hauke Mehrtens)
    - Directly add PCI resources (Hauke Mehrtens)
    - Free resource list after registration (Hauke Mehrtens)

  Freescale i.MX6 host bridge driver
    - Add speed change timeout message (Troy Kisky)
    - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas)

  Freescale Layerscape host bridge driver
    - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
    - Factor out ls_pcie_establish_link() (Bjorn Helgaas)

  Marvell MVEBU host bridge driver
    - Remove mvebu_pcie_scan_bus() (Yijing Wang)

  NVIDIA Tegra host bridge driver
    - Remove tegra_pcie_scan_bus() (Yijing Wang)

  Synopsys DesignWare host bridge driver
    - Consolidate outbound iATU programming functions (Jisheng Zhang)
    - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang)
    - Add support for x8 links (Zhou Wang)
    - Wait for link to come up with consistent style (Bjorn Helgaas)
    - Use pci_scan_root_bus() for simplicity (Yijing Wang)

  TI DRA7xx host bridge driver
    - Use dw_pcie_link_up() consistently (Bjorn Helgaas)

  Miscellaneous
    - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas)
    - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas)
    - Remove unused pcibios_select_root() (again) (Bjorn Helgaas)
    - Remove unused pci_dma_burst_advice() (Bjorn Helgaas)
    - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)"

* tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits)
  PCI: pciehp: Inline the "handle event" functions into the ISR
  PCI: pciehp: Rename queue_interrupt_event() to pciehp_queue_interrupt_event()
  PCI: pciehp: Make queue_interrupt_event() void
  PCI: xgene: Allow config access to Root Port even when link is down
  PCI: xgene: Disable Configuration Request Retry Status for v1 silicon
  PCI: pciehp: Clean up debug logging
  x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing
  PCI: imx6: Add #define PCIE_RC_LCSR
  PCI: imx6: Use "u32", not "uint32_t"
  PCI: Remove unused pci_scan_bus_parented()
  xen/pcifront: Don't use deprecated function pci_scan_bus_parented()
  PCI: imx6: Add speed change timeout message
  PCI/ASPM: Simplify Clock Power Management setting
  PCI: designware: Wait for link to come up with consistent style
  PCI: layerscape: Factor out ls_pcie_establish_link()
  PCI: layerscape: Use dw_pcie_link_up() consistently
  PCI: dra7xx: Use dw_pcie_link_up() consistently
  x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A
  PCI: pciehp: Wait for hotplug command completion where necessary
  PCI: Remove unused pci_dma_burst_advice()
  ...
2015-06-23 13:41:24 -07:00
Linus Torvalds
44d21c3f3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.2:

  API:

   - Convert RNG interface to new style.

   - New AEAD interface with one SG list for AD and plain/cipher text.
     All external AEAD users have been converted.

   - New asymmetric key interface (akcipher).

  Algorithms:

   - Chacha20, Poly1305 and RFC7539 support.

   - New RSA implementation.

   - Jitter RNG.

   - DRBG is now seeded with both /dev/random and Jitter RNG.  If kernel
     pool isn't ready then DRBG will be reseeded when it is.

   - DRBG is now the default crypto API RNG, replacing krng.

   - 842 compression (previously part of powerpc nx driver).

  Drivers:

   - Accelerated SHA-512 for arm64.

   - New Marvell CESA driver that supports DMA and more algorithms.

   - Updated powerpc nx 842 support.

   - Added support for SEC1 hardware to talitos"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits)
  crypto: marvell/cesa - remove COMPILE_TEST dependency
  crypto: algif_aead - Temporarily disable all AEAD algorithms
  crypto: af_alg - Forbid the use internal algorithms
  crypto: echainiv - Only hold RNG during initialisation
  crypto: seqiv - Add compatibility support without RNG
  crypto: eseqiv - Offer normal cipher functionality without RNG
  crypto: chainiv - Offer normal cipher functionality without RNG
  crypto: user - Add CRYPTO_MSG_DELRNG
  crypto: user - Move cryptouser.h to uapi
  crypto: rng - Do not free default RNG when it becomes unused
  crypto: skcipher - Allow givencrypt to be NULL
  crypto: sahara - propagate the error on clk_disable_unprepare() failure
  crypto: rsa - fix invalid select for AKCIPHER
  crypto: picoxcell - Update to the current clk API
  crypto: nx - Check for bogus firmware properties
  crypto: marvell/cesa - add DT bindings documentation
  crypto: marvell/cesa - add support for Kirkwood and Dove SoCs
  crypto: marvell/cesa - add support for Orion SoCs
  crypto: marvell/cesa - add allhwsupport module parameter
  crypto: marvell/cesa - add support for all armada SoCs
  ...
2015-06-22 21:04:48 -07:00
Herbert Xu
c0b59fafe3 Merge branch 'mvebu/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Merge the mvebu/drivers branch of the arm-soc tree which contains
just a single patch bfa1ce5f38 ("bus:
mvebu-mbus: add mv_mbus_dram_info_nooverlap()") that happens to be
a prerequisite of the new marvell/cesa crypto driver.
2015-06-19 22:07:07 +08:00
Michael Ellerman
6096f88451 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include more 8xx optimizations, an e6500 hugetlb optimization,
QMan device tree nodes, t1024/t1023 support, and various fixes and
cleanup."
2015-06-19 17:23:48 +10:00
Sam bobroff
b4b56f9eca powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return very early without
performing the syscall and keeping side effects to a minimum (no CPU
accounting or system call tracing is performed). Also included is a
new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
behaviour to userspace.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported.  It doesn't reduce functionality as
syscalls will still be performed after tsuspend; it just requires that
the transaction be explicitly suspended.  It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
a normal (non-aborted) system call increases by about 0.25%.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-19 17:10:28 +10:00
Paul Gortmaker
8f6b9512ce powerpc: use device_initcall for registering rtc devices
Currently these two RTC devices are in core platform code
where it is not possible for them to be modular.  It will
never be modular, so using module_init as an alias for
__initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- they will remain at level 6 in initcall ordering.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Geoff Levand <geoff@infradead.org>
Acked-by: Geoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:29 -04:00
Alexey Kardashevskiy
15b244a88e powerpc/mmu: Add userspace-to-physical addresses translation cache
We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Having this done,
a hypervisor will not have to pin/unpin pages on every DMA map/unmap
request. This is going to help with multiple pinning of the same memory.

Another use of it is in-kernel real mode (mmu off) acceleration of
DMA requests where real time translation of guest physical to host
physical addresses is non-trivial and may fail as linux ptes may be
temporarily invalid. Also, having cached host physical addresses
(compared to just pinning at the start and then walking the page table
again on every H_PUT_TCE), we can be sure that the addresses which we put
into TCE table are the ones we already pinned.

This adds a list of memory regions to mm_context_t. Each region consists
of a header and a list of physical addresses. This adds API to:
1. register/unregister memory regions;
2. do final cleanup (which puts all pre-registered pages);
3. do userspace to physical address translation;
4. manage usage counters; multiple registration of the same memory
is allowed (once per container).

This implements 2 counters per registered memory region:
- @mapped: incremented on every DMA mapping; decremented on unmapping;
initialized to 1 when a region is just registered; once it becomes zero,
no more mappings allowe;
- @used: incremented on every "register" ioctl; decremented on
"unregister"; unregistration is allowed for DMA mapped regions unless
it is the very last reference. For the very last reference this checks
that the region is still mapped and returns -EBUSY so the userspace
gets to know that memory is still pinned and unregistration needs to
be retried; @used remains 1.

Host physical addresses are stored in vmalloc'ed array. In order to
access these in the real mode (mmu off), there is a real_vmalloc_addr()
helper. In-kernel acceleration patchset will move it from KVM to MMU code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy
05c6cfb9dc powerpc/iommu/powernv: Release replaced TCE
At the moment writing new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However PAPR specification allows
the guest to write new TCE value without clearing it first.

Another problem this patch is addressing is the use of pool locks for
external IOMMU users such as VFIO. The pool locks are to protect
DMA page allocator rather than entries and since the host kernel does
not control what pages are in use, there is no point in pool locks and
exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() plus it returns replaced TCE and DMA direction so
the caller can release the pages afterwards. The exchange() receives
a physical address unlike set() which receives linear mapping address;
and returns a physical address as the clear() does.

This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
for a platform to have exchange() implemented in order to support VFIO.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

This makes sure that TCE permission bits are not set in TCE passed to
IOMMU API as those are to be calculated by platform code from
DMA direction.

This moves SetPageDirty() to the IOMMU code to make it work for both
VFIO ioctl interface in in-kernel TCE acceleration (when it becomes
available later).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:16:49 +10:00
Alexey Kardashevskiy
b82c75bfbe powerpc/iommu: Fix IOMMU ownership control functions
This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().

This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.

This only clears TCE content if there is no page marked busy in it_map.
Clearing must be done outside of the table locks as iommu_clear_tce()
called from iommu_clear_tces_and_put_pages() does this.

In order to use bitmap_empty(), the existing code clears bit#0 which
is set even in an empty table if it is bus-mapped at 0 as
iommu_init_table() reserves page#0 to prevent buggy drivers
from crashing when allocated page is bus-mapped at zero
(which is correct). This restores the bit in the case of failure
to bring the it_map to the state it was in when we called
iommu_take_ownership().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:16:47 +10:00
Alexey Kardashevskiy
f87a88642e vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control
This adds tce_iommu_take_ownership() and tce_iommu_release_ownership
which call in a loop iommu_take_ownership()/iommu_release_ownership()
for every table on the group. As there is just one now, no change in
behaviour is expected.

At the moment the iommu_table struct has a set_bypass() which enables/
disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
which calls this callback when external IOMMU users such as VFIO are
about to get over a PHB.

The set_bypass() callback is not really an iommu_table function but
IOMMU/PE function. This introduces a iommu_table_group_ops struct and
adds take_ownership()/release_ownership() callbacks to it which are
called when an external user takes/releases control over the IOMMU.

This replaces set_bypass() with ownership callbacks as it is not
necessarily just bypass enabling, it can be something else/more
so let's give it more generic name.

The callbacks is implemented for IODA2 only. Other platforms (P5IOC2,
IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
The following patches will replace iommu_take_ownership/
iommu_release_ownership calls in IODA2 with full IOMMU table release/
create.

As we here and touching bypass control, this removes
pnv_pci_ioda2_setup_bypass_pe() as it does not do much
more compared to pnv_pci_ioda2_set_bypass. This moves tce_bypass_base
initialization to pnv_pci_ioda2_setup_dma_pe.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:16:47 +10:00
Alexey Kardashevskiy
0eaf4defc7 powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group
So far one TCE table could only be used by one IOMMU group. However
IODA2 hardware allows programming the same TCE table address to
multiple PE allowing sharing tables.

This replaces a single pointer to a group in a iommu_table struct
with a linked list of groups which provides the way of invalidating
TCE cache for every PE when an actual TCE table is updated. This adds
pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group()
helpers to manage the list. However without VFIO, it is still going
to be a single IOMMU group per iommu_table.

This changes iommu_add_device() to add a device to a first group
from the group list of a table as it is only called from the platform
init code or PCI bus notifier and at these moments there is only
one group per table.

This does not change TCE invalidation code to loop through all
attached groups in order to simplify this patch and because
it is not really needed in most cases. IODA2 is fixed in a later
patch.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:16:15 +10:00
Alexey Kardashevskiy
b348aa6529 powerpc/spapr: vfio: Replace iommu_table with iommu_table_group
Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
for TCE tables. Right now just one table is supported.

This defines iommu_table_group struct which stores pointers to
iommu_group and iommu_table(s). This replaces iommu_table with
iommu_table_group where iommu_table was used to identify a group:
- iommu_register_group();
- iommudata of generic iommu_group;

This removes @data from iommu_table as it_table_group provides
same access to pnv_ioda_pe.

For IODA, instead of embedding iommu_table, the new iommu_table_group
keeps pointers to those. The iommu_table structs are allocated
dynamically.

For P5IOC2, both iommu_table_group and iommu_table are embedded into
PE struct. As there is no EEH and SRIOV support for P5IOC2,
iommu_free_table() should not be called on iommu_table struct pointers
so we can keep it embedded in pnv_phb::p5ioc2.

For pSeries, this replaces multiple calls of kzalloc_node() with a new
iommu_pseries_alloc_group() helper and stores the table group struct
pointer into the pci_dn struct. For release, a iommu_table_free_group()
helper is added.

This moves iommu_table struct allocation from SR-IOV code to
the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and
pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized.
This change is here because those lines had to be changed anyway.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:57 +10:00
Alexey Kardashevskiy
da004c3600 powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table
This adds a iommu_table_ops struct and puts pointer to it into
the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong to.

This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to make sure that we do not leave any IOMMU table
with iommu_table_ops uninitialized. This is not a parameter of
iommu_init_table() though as there will be cases when iommu_init_table()
will not be called on TCE tables, for example - VFIO.

This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_"
redundant prefixes.

This removes tce_xxx_rm handlers from ppc_md but does not add
them to iommu_table_ops as this will be done later if we decide to
support TCE hypercalls in real mode. This removes _vm callbacks as
only virtual mode is supported by now so this also removes @rm parameter.

For pSeries, this always uses tce_buildmulti_pSeriesLP/
tce_buildmulti_pSeriesLP. This changes multi callback to fall back to
tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
present. The reason for this is we still have to support "multitce=off"
boot parameter in disable_multitce() and we do not want to walk through
all IOMMU tables in the system and replace "multi" callbacks with single
ones.

For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2.
This makes the callbacks for them public. Later patches will extend
callbacks for IODA1/2.

No change in behaviour is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy
10b35b2b74 powerpc/powernv: Do not set "read" flag if direction==DMA_NONE
Normally a bitmap from the iommu_table is used to track what TCE entry
is in use. Since we are going to use iommu_table without its locks and
do xchg() instead, it becomes essential not to put bits which are not
implied in the direction flag as the old TCE value (more precisely -
the permission bits) will be used to decide whether to put the page or not.

This adds iommu_direction_to_tce_perm() (its counterpart is there already)
and uses it for powernv's pnv_tce_build().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy
9b14a1ff86 vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver
This moves page pinning (get_user_pages_fast()/put_page()) code out of
the platform IOMMU code and puts it to VFIO IOMMU driver where it belongs
to as the platform code does not deal with page pinning.

This makes iommu_take_ownership()/iommu_release_ownership() deal with
the IOMMU table bitmap only.

This removes page unpinning from iommu_take_ownership() as the actual
TCE table might contain garbage and doing put_page() on it is undefined
behaviour.

Besides the last part, the rest of the patch is mechanical.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy
8aca92d82d powerpc/iommu: Always release iommu_table in iommu_free_table()
At the moment iommu_free_table() only releases memory if
the table was initialized for the platform code use, i.e. it had
it_map initialized (which purpose is to track DMA memory space use).

With dynamic DMA windows, we will need to be able to release
iommu_table even if it was used for VFIO in which case it_map is NULL
so does the patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy
ac9a58891a powerpc/iommu: Put IOMMU group explicitly
So far an iommu_table lifetime was the same as PE. Dynamic DMA windows
will change this and iommu_free_table() will not always require
the group to be released.

This moves iommu_group_put() out of iommu_free_table().

This adds a iommu_pseries_free_table() helper which does
iommu_group_put() and iommu_free_table(). Later it will be
changed to receive a table_group and we will have to change less
lines then.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy
ea30e99e8e powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group
This relies on the fact that a PCI device always has an IOMMU table
which may not be the case when we get dynamic DMA windows so
let's use more reliable check for IOMMU group here.

As we do not rely on the table presence here, remove the workaround
from pnv_pci_ioda2_set_bypass(); also remove the @add_to_iommu_group
parameter from pnv_ioda_setup_bus_dma().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:14:54 +10:00
Bjorn Helgaas
633adc711d PCI: Remove unnecessary #includes of <asm/pci.h>
In include/linux/pci.h, we already #include <asm/pci.h>, so we don't need
to include <asm/pci.h> directly.

Remove the unnecessary includes.  All the files here already include
<linux/pci.h>.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>	# sh
Acked-by: Ralf Baechle <ralf@linux-mips.org>
2015-06-08 07:56:09 -05:00
Anshuman Khandual
d3cb06e0cd powerpc/dscr: Add some in-code documentation
This patch adds some in-code documentation to the DSCR related code to
make it more readable without having any functional change to it.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-07 19:29:15 +10:00
Anshuman Khandual
1db365258a powerpc/kernel: Rename PACA_DSCR to PACA_DSCR_DEFAULT
PACA_DSCR offset macro tracks dscr_default element in the paca
structure. Better change the name of this macro to match that of the
data element it tracks. Makes the code more readable.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-07 19:29:00 +10:00
Anshuman Khandual
280e109992 powerpc/kernel: Remove the unused extern dscr_default
The process context switch code no longer uses dscr_default variable
from the sysfs.c file. The variable became unused when we started
storing the CPU specific DSCR value in the PACA structure instead.
This patch just removes this extern declaration. It was originally
added by the following commit.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-07 19:27:26 +10:00
Anshuman Khandual
c952c1c482 powerpc: Fix handling of DSCR related facility unavailable exception
Currently DSCR (Data Stream Control Register) can be accessed with
mfspr or mtspr instructions inside a thread via two different SPR
numbers. One being the user accessible problem state SPR number 0x03
and the other being the privilege state SPR number 0x11. All access
through the privilege state SPR number get emulated through illegal
instruction exception. Any access through the problem state SPR number
raises one facility unavailable exception which sets the thread based
dscr_inherit bit and enables DSCR facility through FSCR register thus
allowing direct access to DSCR without going through this exception in
the future. We set the thread.dscr_inherit bit whether the access was
with mfspr or mtspr instruction which is neither correct nor does it
match the behaviour through the instruction emulation code path driven
from privilege state SPR number. User currently observes two different
kind of behaviour when accessing the DSCR through these two SPR numbers.
This problem can be observed through these two test cases by replacing
the privilege state SPR number with the problem state SPR number.

	(1) http://ozlabs.org/~anton/junkcode/dscr_default_test.c
	(2) http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c

This patch fixes the problem by making sure that the behaviour visible
to the user remains the same irrespective of which SPR number is being
used. Inside facility unavailable exception, we check whether it was
cuased by a mfspr or a mtspr isntrucction. In case of mfspr instruction,
just emulate the instruction. In case of mtspr instruction, set the
thread based dscr_inherit bit and also enable the facility through FSCR.
All user SPR based mfspr instruction will be emulated till one user SPR
based mtspr has been executed.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-07 19:19:57 +10:00
David Gibson
502f159c02 powerpc/eeh: Fix trivial error in eeh_restore_dev_state()
Commit 28158cd "powerpc/eeh: Enhance pcibios_set_pcie_reset_state()"
introduced a fix for a problem where certain configurations could lead to
pci_reset_function() destroying the state of PCI devices other than the one
specified.

Unfortunately, the fix has a trivial bug - it calls pci_save_state() again,
when it should be calling pci_restore_state().  This corrects the problem.

Cc: Gavin Shan <gwshan@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-07 19:11:49 +10:00
Rob Herring
63a4aea556 of: clean-up unnecessary libfdt include paths
With the libfdt include fixups to use "" instead of <> in the
latest dtc import in commit 4760597 (scripts/dtc: Update to upstream
version 9d3649bd3be245c9), it is no longer necessary to add explicit
include paths to use libfdt. Remove these across the kernel.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
2015-06-04 20:16:47 -05:00
Michael Neuling
abeeed6d3d powerpc/pci: Add pcibios_disable_device() hook
This adds a hook into the powerpc pci code for pci_disable_device() calls.  The
generic code already provides a weak pcibios_disable_device() symbol, so we
just need to provide our own in powerpc and it'll get picked up.

This is passed directly to the phb controller ops, provided one exists.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-03 13:27:16 +10:00
Michael Neuling
10e796309a powerpc/pci: Add release_device() hook to phb ops
Add release_device() hook to phb ops so we can clean up for specific phbs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-03 13:27:15 +10:00
Daniel Axtens
5b64d2cc41 powerpc/pci: Export symbols for CXL
Export pcibios_claim_one_bus, pcibios_scan_phb and pcibios_alloc_controller.

These will be used by the CXL driver.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-03 13:27:15 +10:00
LEROY Christophe
5b2753fc3e powerpc/8xx: Implementation of PAGE_EXEC
This patch implements PAGE_EXEC capability on the 8xx.

All pages PP exec bits are set to 000, which means Execute for
Supervisor and no Execute for User.
Then we use the APG to say whether accesses are according to Page
rules, "all Supervisor" rules (Exec for all) and
"all User" rules (Exec for noone)

Therefore, we define 4 APG groups. msb is _PAGE_EXEC,
lsb is _PAGE_USER. MI_AP is initialised as follows:
GP0 (00) => Not User, no exec => 11 (all accesses performed as user)
GP1 (01) => User but no exec => 11 (all accesses performed as user)
GP2 (10) => Not User, exec => 01 (rights according to page definition)
GP3 (11) => User, exec => 00 (all accesses performed as supervisor)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: comments: s/exec/data/ on data side, and s/pages/pages'/]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:28 -05:00
LEROY Christophe
e0a8e0d90a powerpc/8xx: Handle PAGE_USER via APG bits
Use of APG for handling PAGE_USER.

All pages PP exec bits are set to either 000 or 011, which means
respectively RW for Supervisor and no access for User, or RO for
Supervisor and no access for user.

Then we use the APG to say whether accesses are according to
Page rules or "all Supervisor" rules (Access to all)

Therefore, we define 2 APG groups corresponding to _PAGE_USER.
Mx_AP are initialised as follows:
GP0 => No user => 01 (all accesses performed according
				to page definition)
GP1 => User => 00 (all accesses performed as supervisor
                                according to page definition)

This removes the special 8xx handling in pte_update()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:27 -05:00
LEROY Christophe
eeba1f7c38 powerpc/8xx: Add support for TASK_SIZE greater than 0x80000000
By default, TASK_SIZE is set to 0x80000000 for PPC_8xx, which is most
likely sufficient for most cases. However, kernel configuration allows
to set TASK_SIZE to another value, so the 8xx shall handle it.

This patch also takes into account the case of PAGE_OFFSET lower than
0x80000000, allthought most of the time it is equal to 0xC0000000

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:26 -05:00
LEROY Christophe
b821c5fe84 powerpc/8xx: Use SPRG2 instead of DAR for saving r3
We now have SPRG2 available as in it not used anymore for saving CR, so we don't
need to crash DAR anymore for saving r3 for CPU6 ERRATA handling.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:26 -05:00
LEROY Christophe
2eb2fd9500 powerpc/8xx: dont save CR in SCRATCH registers
CR only needs to be preserved when checking if we are handling a kernel address.
So we can preserve CR in a register:
- In ITLBMiss, check is done only when CONFIG_MODULES is defined. Otherwise we
don't need to do anything at all with CR.
- We use r10, then we reload SRR0/MD_EPN into r10 when CR is restored

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:26 -05:00
LEROY Christophe
d5fd9d7d66 powerpc/8xx: Handle CR out of exception PROLOG/EPILOG
In order to be able to reduce scope during which CR is saved, we take
CR saving/restoring out of exception PROLOG and EPILOG

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:25 -05:00
LEROY Christophe
90883a8255 powerpc/8xx: macro for handling CPU15 errata
Having a macro will help keep clear code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:25 -05:00
Scott Wood
86d63363de powerpc/e500mc: Remove dead L2 flushing code in idle_e500.S
This code can never be executed as it is only built when
CONFIG_PPC_E500MC is unset, but the only CPUs that have CPU_FTR_L2CSR
require CONFIG_PPC_E500MC and do not have the MSR/HID0-based nap
mechanism that this file uses.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:19 -05:00
Ard Biesheuvel
24bbd929e6 of/fdt: split off FDT self reservation from memreserve processing
This splits off the reservation of the memory occupied by the FDT
binary itself from the processing of the memory reservations it
contains. This is necessary because the physical address of the FDT,
which is needed to perform the reservation, may not be known to the
FDT driver core, i.e., it may be mapped outside the linear direct
mapping, in which case __pa() returns a bogus value.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-02 16:31:25 +01:00
Anton Blanchard
d20be433e6 powerpc: Non relocatable system call doesn't need a trampoline
We need to use a trampoline when using LOAD_HANDLER(), because the
destination needs to be in the first 64kB. An absolute branch has
no such limitations, so just jump there.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-02 13:26:47 +10:00
Anton Blanchard
05b05f28fb powerpc: Relocatable system call no longer uses the LR
We had some code to restore the LR in the relocatable system call path
back when we used the LR to do an indirect branch.

Commit 6a404806df ("powerpc: Avoid link stack corruption in MMU
on syscall entry path") changed this to use the CTR which is volatile
across system calls so does not need restoring.

Remove the stale comment and the restore of the LR.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-02 13:26:47 +10:00
Daniel Axtens
3405c2570f powerpc/pci: add dma_set_mask to pci_controller_ops
Some systems only need to deal with DMA masks for PCI devices.
For these systems, we can avoid the need for a platform hook and
instead use a pci controller based hook.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-02 13:18:49 +10:00
Daniel Axtens
1f88d5860e powerpc: Remove MSI-related PCI controller ops from ppc_md
Remove unneeded ppc_md functions. Patch callsites to use pci_controller_ops
functions exclusively.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-02 11:47:45 +10:00
Daniel Axtens
e059b105d1 powerpc: Add MSI operations to pci_controller_ops struct
Add MSI setup and teardown functions to pci_controller_ops.

Patch the callsites (arch_{setup,teardown}_msi_irqs) to prefer the
controller ops version if it's available.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-22 15:50:55 +10:00
Anton Blanchard
5e95235ccd powerpc: Align TOC to 256 bytes
Recent toolchains force the TOC to be 256 byte aligned. We need
to enforce this alignment in our linker script, otherwise pointers
to our TOC variables (__toc_start, __prom_init_toc_start) could
be incorrect.

If they are bad, we die a few hundred instructions into boot.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-14 16:59:21 +10:00
Wei Yang
f77ceb717d powerpc/eeh: remove unused macro IS_BRIDGE
Currently, the macro IS_BRIDGE is not used any where.
This patch just removes it.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Wei Yang
2ac3990cc3 powerpc/eeh: fix comment for wait_state()
To retrieve the PCI slot state, EEH driver would set a timeout for that.
While current comment is not aligned to what the code does.

This patch fixes those comments according to the code.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Wei Yang
3721352990 powerpc/eeh: fix start/end/flags type in struct pci_io_addr_range{}
struct pci_io_addr_range{} stores the information of pci resources. It
would be better to keep these related fields have the same type as in
struct resource{}.

This patch fixes the start/end/flags type in struct pci_io_addr_range{} to
have the same type as in struct resource{}.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Gavin Shan
ec33d36e5a powerpc/eeh: Introduce eeh_pe_inject_err()
The patch defines PCI error types and functions in uapi/asm/eeh.h
and exports function eeh_pe_inject_err(), which will be called by
VFIO driver to inject the specified PCI error to the indicated
PE for testing purpose.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12 20:33:35 +10:00
Daniel Axtens
ffb2d78eca powerpc/mce: fix off by one errors in mce event handling
Before 69111bac42 ("powerpc: Replace __get_cpu_var uses"), in
save_mce_event, index got the value of mce_nest_count, and
mce_nest_count was incremented *after* index was set.

However, that patch changed the behaviour so that mce_nest count was
incremented *before* setting index.

This causes an off-by-one error, as get_mce_event sets index as
mce_nest_count - 1 before reading mce_event.  Thus get_mce_event reads
bogus data, causing warnings like
"Machine Check Exception, Unknown event version 0 !"
and breaking MCEs handling.

Restore the old behaviour and unbreak MCE handling by subtracting one
from the newly incremented value.

The same broken change occured in machine_check_queue_event (which set
a queue read by machine_check_process_queued_event).  Fix that too,
unbreaking printing of MCE information.

Fixes: 69111bac42 ("powerpc: Replace __get_cpu_var uses")
CC: stable@vger.kernel.org
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: Christoph Lameter <cl@linux.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12 19:44:01 +10:00
Michael Ellerman
e0d0059169 powerpc/vdso: Disable building the 32-bit VDSO on little endian
The only little endian configuration we support is ppc64le. As such if
we're building little endian we don't need a 32-bit VDSO, because there
is no 32-bit userspace.

This patch is a fairly ugly mess of #ifdefs, but is the minimal logic
required to disable the 32-bit VDSO. We can hopefully clean up the
result in future with some further refactoring.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 20:01:02 +10:00
Michael Ellerman
6e5c077519 powerpc/vdso: Combine start/size variables
In vdso_fixup_features() we have start64/start32 and size64/size32, but
they have the same types, ie. void * and unsigned long.

They're only used to save the return value from find_sectionXX() for the
subsequent call to do_feature_fixups(), so there's no overlap in their
usage either.

So we can just consolidate them into start/size and avoid the
duplication.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 20:00:32 +10:00
Michael Ellerman
63da88dd48 powerpc/vdso: Remove unused debug code
It's in the git history if we ever need it back.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 20:00:32 +10:00
Michael Ellerman
5c0aebf6e1 powerpc: Show utsname->machine in boot-up banner
Currently we print "Starting Linux PPC64" at boot. But we don't mention
anywhere whether the kernel is big or little endian.

If we print the utsname->machine value instead we get either "ppc64" or
"ppc64le" which is much more informative, eg:

  Starting Linux ppc64le #1 SMP Wed Apr 15 12:12:20 AEST 2015

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 19:55:54 +10:00
Dan Streetman
b130e7c04f powerpc: export of_get_ibm_chip_id function
Export the of_get_ibm_chip_id() function.  This will be used by the
PowerNV NX-842 driver.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-11 15:06:37 +08:00
Sam Bobroff
0aab374709 powerpc/powernv: Restore non-volatile CRs after nap
Patches 7cba160ad "powernv/cpuidle: Redesign idle states management"
and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus"
use non-volatile condition registers (cr2, cr3 and cr4) early in the system
reset interrupt handler (system_reset_pSeries()) before it has been determined
if state loss has occurred. If state loss has not occurred, control returns via
the power7_wakeup_noloss() path which does not restore those condition
registers, leaving them corrupted.

Fix this by restoring the condition registers in the power7_wakeup_noloss()
case.

This is apparent when running a KVM guest on hardware that does not
support winkle or sleep and the guest makes use of secondary threads. In
practice this means Power7 machines, though some early unreleased Power8
machines may also be susceptible.

The secondary CPUs are taken off line before the guest is started and
they call pnv_smp_cpu_kill_self(). This checks support for sleep
states (in this case there is no support) and power7_nap() is called.

When the CPU is woken, power7_nap() returns and because the CPU is
still off line, the main while loop executes again. The sleep states
support test is executed again, but because the tested values cannot
have changed, the compiler has optimized the test away and instead we
rely on the result of the first test, which has been left in cr3
and/or cr4. With the result overwritten, the wrong branch is taken and
power7_winkle() is called on a CPU that does not support it, leading
to it stalling.

Fixes: 7cba160ad7 ("powernv/cpuidle: Redesign idle states management")
Fixes: 77b54e9f21 ("powernv/powerpc: Add winkle support for offline cpus")
[mpe: Massage change log a bit more]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 16:55:11 +10:00
Gavin Shan
d91dafc02f powerpc/eeh: Delay probing EEH device during hotplug
Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
devices in early stage, which is reasonable to pSeries platform.
However, it's wrong for PowerNV platform because the PE# isn't
determined until the resources (IO and MMIO) are assigned to
PE in hotplug case. So we have to delay probing EEH devices
for PowerNV platform until the PE# is assigned.

Fixes: ff57b454dd ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 13:52:32 +10:00
Gavin Shan
1ae79b78bc powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
When asserting reset in pcibios_set_pcie_reset_state(), the PE
is enforced to (hardware) frozen state in order to drop unexpected
PCI transactions (except PCI config read/write) automatically by
hardware during reset, which would cause recursive EEH error.
However, the (software) frozen state EEH_PE_ISOLATED is missed.
When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED
is set in PE state retrival backend. Unfortunately, nobody (the
reset handler or the EEH recovery functinality in host) will clear
EEH_PE_ISOLATED when the PE has been passed through to guest.

The patch sets and clears EEH_PE_ISOLATED properly during reset
in function pcibios_set_pcie_reset_state() to fix the issue.

Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()")
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 13:52:09 +10:00
Michael Ellerman
68fc378ce3 Revert "powerpc/tm: Abort syscalls in active transactions"
This reverts commit feba40362b.

Although the principle of this change is good, the implementation has a
few issues.

Firstly we can sometimes fail to abort a syscall because r12 may have
been clobbered by C code if we went down the virtual CPU accounting
path, or if syscall tracing was enabled.

Secondly we have decided that it is safer to abort the syscall even
earlier in the syscall entry path, so that we avoid the syscall tracing
path when we are transactional.

So that we have time to thoroughly test those changes we have decided to
revert this for this merge window and will merge the fixed version in
the next window.

NB. Rather than reverting the selftest we just drop tm-syscall from
TEST_PROGS so that it's not run by default.

Fixes: feba40362b ("powerpc/tm: Abort syscalls in active transactions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-30 15:24:58 +10:00
Linus Torvalds
63905bba5b powerpc fixes for 4.1
- Fix for mm_dec_nr_pmds() from Scott.
 - Fixes for oopses seen with KVM + THP from Aneesh.
 - Build fixes from Aneesh & Shreyas.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVOsk5AAoJEFHr6jzI4aWACWMP/3EaNoeA1g8VbZWZEdRaoLvX
 W7D08DI3Dt8HLxyn2JR08jYZF0gr68XrF6OiscVVki7wVXT8fbH4jSNmBkbzNH95
 d9taScJyR1CUavkhsXivnR1qEE1Fi2KA2OW9RaNfoSt1MVtdsvOK6xXklUGksuJQ
 XygzyrRr4Dj82kuMUAMO0YDMvknMlzi3a8dzyrWZBXBZOOTWavGB6bQKtCTaOQ99
 3OFGLQ10uY7lmdHDi0t0tQ99FuYfLiJpg5fTLoUni4J5tFp8JlZ+x0Gwc0apN0cy
 Ym8EO6++qWDv8FXvYEPfVUEjbF1fyPiawUgpkMnyvXgd8K5G85SIrtkGW0Ml+6sX
 GfJH8w9hpDbF5EnWlC9bn/jT7sHBHFdrxZuQUc0L4M2OtM73R2a0Xr3b7ZxFCD1q
 7RpYu8MKKcyvaIXNg7VBJjj8zL+WmUJKF6J5uX5bGU2xH0khmp0vTknyyjbwrlcF
 uHidv5ZhMt3aAI70v14jA5BTEmLyOYRu58Ei6cT/VT/DjdbpEApdK8BMAvKSEeib
 +hzh6oDFT92AM0tbg15bNmqGbGfgqtVKe4GDS2QyGaHGAFOGs1nPuSa9se1xYDcM
 CCtRyABwpzJsrCfwra2fsTU6FxlatK4ONViyWFBXa6mEjBNSZ4XmyZvdWUqlwpSC
 F5jNGppm5Ama6xxcLphA
 =6yQx
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

 - fix for mm_dec_nr_pmds() from Scott.

 - fixes for oopses seen with KVM + THP from Aneesh.

 - build fixes from Aneesh & Shreyas.

* tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled
  powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error
  powerpc/mm/thp: Return pte address if we find trans_splitting.
  powerpc/mm/thp: Make page table walk safe against thp split/collapse
  KVM: PPC: Remove page table walk helpers
  KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
  powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
2015-04-26 13:23:15 -07:00
Linus Torvalds
eadf16a912 This mostly includes the PPC changes for 4.1, which this time cover
Book3S HV only (debugging aids, minor performance improvements and some
 cleanups).  But there are also bug fixes and small cleanups for ARM,
 x86 and s390.
 
 The task_migration_notifier revert and real fix is still pending review,
 but I'll send it as soon as possible after -rc1.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVOONLAAoJEL/70l94x66DbsMIAIpZPsaqgXOC1sDEiZuYay+6
 rD4n4id7j8hIAzcf3AlZdyf5XgLlr6I1Zyt62s1WcoRq/CCnL7k9EljzSmw31WFX
 P2y7/J0iBdkn0et+PpoNThfL2GsgTqNRCLOOQlKgEQwMP9Dlw5fnUbtC1UchOzTg
 eAMeBIpYwufkWkXhdMw4PAD4lJ9WxUZ1eXHEBRzJb0o0ZxAATJ1tPZGrFJzoUOSM
 WsVNTuBsNd7upT02kQdvA1TUo/OPjseTOEoksHHwfcORt6bc5qvpctL3jYfcr7sk
 /L6sIhYGVNkjkuredjlKGLfT2DDJjSEdJb1k2pWrDRsY76dmottQubAE9J9cDTk=
 =OAi2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of KVM changes from Paolo Bonzini:
 "This mostly includes the PPC changes for 4.1, which this time cover
  Book3S HV only (debugging aids, minor performance improvements and
  some cleanups).  But there are also bug fixes and small cleanups for
  ARM, x86 and s390.

  The task_migration_notifier revert and real fix is still pending
  review, but I'll send it as soon as possible after -rc1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
  KVM: arm/arm64: check IRQ number on userland injection
  KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
  KVM: VMX: Preserve host CR4.MCE value while in guest mode.
  KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
  KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
  KVM: PPC: Book3S HV: Streamline guest entry and exit
  KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
  KVM: PPC: Book3S HV: Use decrementer to wake napping threads
  KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
  KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
  KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
  KVM: PPC: Book3S HV: Minor cleanups
  KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
  KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
  KVM: PPC: Book3S HV: Add ICP real mode counters
  KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
  KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
  KVM: PPC: Book3S HV: Add guest->host real mode completion counters
  KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
  ...
2015-04-26 13:06:22 -07:00
Paul Mackerras
66feed61cd KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
  rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr:     1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry:     3060.1ns (212 - 65138, 953873 samples)
  rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
  rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:34 +02:00
Paul Mackerras
7d6c40da19 KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:33 +02:00
Paul Mackerras
5d5b99cd68 KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:32 +02:00
Paul Mackerras
1f09c3ed86 KVM: PPC: Book3S HV: Minor cleanups
* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:32 +02:00
Paul Mackerras
b6c295df31 KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spend in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:31 +02:00
Aneesh Kumar K.V
691e95fd73 powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensure we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t to THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are
few steps we can take to avoid a hugepage collapse.One way is to take page
reference inside the irq disable region. Other option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifer update in between
using mmu_notifier_retry().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:39 +10:00
Linus Torvalds
d19d5efd8c powerpc updates for 4.1
- Numerous minor fixes, cleanups etc.
 - More EEH work from Gavin to remove its dependency on device_nodes.
 - Memory hotplug implemented entirely in the kernel from Nathan Fontenot.
 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.
 - Rewrite of VPHN parsing logic & tests from Greg Kurz.
 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.
 - Support for pstore on powernv from Hari Bathini.
 - Removal of old powerpc specific byte swap routines by David Gibson.
 - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing
   your firmware when it wasn't.
 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.
 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek.
 - Some fixes for migration from Tyrel Datwyler.
 - A new syscall to switch the cpu endian by Michael Ellerman.
 - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn.
 - A fix for the OPAL sensor driver from Cédric Le Goater.
 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.
 - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per
   machine.
 - Small patch from Sam Bobroff to explicitly abort non-suspended transactions
   on syscalls, plus a test to exercise it.
 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.
 - Small patch to enable the hard lockup detector from Anton Blanchard.
 - Fix from Dave Olson for missing L2 cache information on some CPUs.
 - Some fixes from Michael Ellerman to get Cell machines booting again.
 - Freescale updates from Scott: Highlights include BMan device tree nodes, an
   MSI erratum workaround, a couple minor performance improvements, config
   updates, and misc fixes/cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVL2cxAAoJEFHr6jzI4aWAR8cP/19VTo/CzCE4ffPSx7qR464n
 F+WFZcbNjIMXu6+B0YLuJZEsuWtKKrCit/MCg3+mSgE4iqvxmtI+HDD0445Buszj
 UD4E4HMdPrXQ+KUSUDORvRjv/FFUXIa94LSv/0g2UeMsPz/HeZlhMxEu7AkXw9Nf
 rTxsmRTsOWME85Y/c9ss7XHuWKXT3DJV7fOoK9roSaN3dJAuWTtG3WaKS0nUu0ok
 0M81D6ZczoD6ybwh2DUMPD9K6SGxLdQ4OzQwtW6vWzcQIBDfy5Pdeo0iAFhGPvXf
 T4LLPkv4cF4AwHsAC4rKDPHQNa+oZBoLlScrHClaebAlDiv+XYKNdMogawUObvSh
 h7avKmQr0Ygp1OvvZAaXLhuDJI9FJJ8lf6AOIeULgHsDR9SyKMjZWxRzPe11uarO
 Fyi0qj3oJaQu6LjazZraApu8mo+JBtQuD3z3o5GhLxeFtBBF60JXj6zAXJikufnl
 kk1/BUF10nKUhtKcDX767AMUCtMH3fp5hx8K/z9T5v+pobJB26Wup1bbdT68pNBT
 NjdKUppV6QTjZvCsA6U2/ECu6E9KeIaFtFSL2IRRoiI0dWBN5/5eYn3RGkO2ZFoL
 1NdwKA2XJcchwTPkpSRrUG70sYH0uM2AldNYyaLfjzrQqza7Y6lF699ilxWmCN/H
 OplzJAE5cQ8Am078veTW
 =03Yh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Numerous minor fixes, cleanups etc.

 - More EEH work from Gavin to remove its dependency on device_nodes.

 - Memory hotplug implemented entirely in the kernel from Nathan
   Fontenot.

 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

 - Rewrite of VPHN parsing logic & tests from Greg Kurz.

 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.

 - Support for pstore on powernv from Hari Bathini.

 - Removal of old powerpc specific byte swap routines by David Gibson.

 - Fix from Vasant Hegde to prevent the flash driver telling you it was
   flashing your firmware when it wasn't.

 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
   Stancek.

 - Some fixes for migration from Tyrel Datwyler.

 - A new syscall to switch the cpu endian by Michael Ellerman.

 - Large series from Wei Yang to implement SRIOV, reviewed and acked by
   Bjorn.

 - A fix for the OPAL sensor driver from Cédric Le Goater.

 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

 - Large series from Daniel Axtens to make our PCI hooks per PHB rather
   than per machine.

 - Small patch from Sam Bobroff to explicitly abort non-suspended
   transactions on syscalls, plus a test to exercise it.

 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

 - Small patch to enable the hard lockup detector from Anton Blanchard.

 - Fix from Dave Olson for missing L2 cache information on some CPUs.

 - Some fixes from Michael Ellerman to get Cell machines booting again.

 - Freescale updates from Scott: Highlights include BMan device tree
   nodes, an MSI erratum workaround, a couple minor performance
   improvements, config updates, and misc fixes/cleanup.

* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
  powerpc/powermac: Fix build error seen with powermac smp builds
  powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
  powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
  powerpc/cell: Fix iommu breakage caused by controller_ops change
  powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
  powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
  powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
  powerpc/pseries: Correct memory hotplug locking
  powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
  powerpc: Add ppc64 hard lockup detector support
  oprofile: Disable oprofile NMI timer on ppc64
  powerpc/perf/hv-24x7: Add missing put_cpu_var()
  powerpc/perf/hv-24x7: Break up single_24x7_request
  powerpc/perf/hv-24x7: Define update_event_count()
  powerpc/perf/hv-24x7: Whitespace cleanup
  powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
  powerpc/perf/hv-24x7: Rename hv_24x7_event_update
  powerpc/perf/hv-24x7: Move debug prints to separate function
  powerpc/perf/hv-24x7: Drop event_24x7_request()
  powerpc/perf/hv-24x7: Use pr_devel() to log message
  ...

Conflicts:
	tools/testing/selftests/powerpc/Makefile
	tools/testing/selftests/powerpc/tm/Makefile
2015-04-16 13:53:32 -05:00
Linus Torvalds
d0bbe0dd35 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Usual trivial tree updates.  Nothing outstanding -- mostly printk()
  and comment fixes and unused identifier removals"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  goldfish: goldfish_tty_probe() is not using 'i' any more
  powerpc: Fix comment in smu.h
  qla2xxx: Fix printks in ql_log message
  lib: correct link to the original source for div64_u64
  si2168, tda10071, m88ds3103: Fix firmware wording
  usb: storage: Fix printk in isd200_log_config()
  qla2xxx: Fix printk in qla25xx_setup_mode
  init/main: fix reset_device comment
  ipwireless: missing assignment
  goldfish: remove unreachable line of code
  coredump: Fix do_coredump() comment
  stacktrace.h: remove duplicate declaration task_struct
  smpboot.h: Remove unused function prototype
  treewide: Fix typo in printk messages
  treewide: Fix typo in printk messages
  mod_devicetable: fix comment for match_flags
2015-04-14 09:50:27 -07:00
Michael Ellerman
89a51df5ab powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
The recent change to the EEH probing causes a crash on Cell because
eeh_ops is NULL.

Check if EEH is enabled and if not bail out.

Fixes: ff57b454dd ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-14 17:13:31 +10:00
Michael Ellerman
ad30cb9946 Merge branch 'next-sriov' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into next
Merge Richard's work to support SR-IOV on PowerNV. All generic PCI
patches acked by Bjorn.

Some minor conflicts with Daniel's pci_controller_ops work.

Conflicts:
	arch/powerpc/include/asm/machdep.h
	arch/powerpc/platforms/powernv/pci-ioda.c
2015-04-14 09:29:23 +10:00
Dave Olson
f7e9e35836 powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
This problem appears to have been introduced in 2.6.29 by commit
93197a36a9 "Rewrite sysfs processor cache info code".

This caused lscpu to error out on at least e500v2 devices, eg:

  error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory

Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.

Fixes: 93197a36a9 ("powerpc: Rewrite sysfs processor cache info code")
Signed-off-by: Dave Olson <olson@cumulusnetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:28 +10:00
Anton Blanchard
c54b2bf1b5 powerpc: Add ppc64 hard lockup detector support
The hard lockup detector uses a PMU event as a periodic NMI to
detect if we are stuck (where stuck means no timer interrupts have
occurred).

Ben's rework of the ppc64 soft disable code has made ppc64 PMU
exceptions a partial NMI. They can get disabled if an external
interrupt comes in, but otherwise PMU interrupts will fire in
interrupt disabled regions.

We disable the hard lockup detector by default for a few reasons:

- It breaks userspace event based branches on POWER8.
- It is likely to produce false positives on KVM guests.
- Since PMCs can only count to 2^31, counting cycles means we might
  take multiple PMU exceptions per second per hardware thread even
  if our hard lockup timeout is 10 seconds.

It can be enabled via a boot option, or via procfs.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:27 +10:00
Sam bobroff
feba40362b powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return immediately without
performing the syscall.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality because
syscalls will still be performed after tsuspend. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c
indicate the cost of a system call increases by about 0.5%.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:19 +10:00
Daniel Axtens
467efc2e4f powerpc: Remove shims for pci_controller_ops operations
Remove shims, patch callsites to use pci_controller_ops
versions instead.

Also move back the probe mode defines, as explained in the patch
for pci_probe_mode.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:18 +10:00
Daniel Axtens
97884e00e2 powerpc: fsl_pci, swiotlb: Move controller ops from ppc_md to controller_ops
Move the installation of DMA operations out of swiotlb's subsys
initcall, and into the generic PCI controller operations struct.

These ops are installed conditionally, based on the ppc_swiotlb_enable
global. The global can be set in two places:
 - swiotlb_detect_4g, which is always called at the arch initcall level
 - setup_pci_atmu, which is called as part of the fsl_add_bridge and
fsl_pci_syscore_do_resume.

fsl_pci_syscore_do_resume is called late enough that any changes as a
result of that call will have no effect.

As such, if we test the global and set the operations as part of
fsl_add_bridge, after the call to setup_pci_atmu, we can be confident
that it will cover all the PCI implementations affected by the changes
to dma-swiotlb.c.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:17 +10:00
Daniel Axtens
cd16c7ba0c powerpc: Create pci_controller_ops.reset_secondary_bus and shim
Add pci_controller_ops.reset_secondary_bus,
shadowing ppc_md.pcibios_reset_secondary_bus.
Add a shim, and changes the callsites to use the shim.

Use pcibios_reset_secondary_bus_shim, as both
pcibios_reset_secondary_bus and pci_reset_secondary_bus
are already taken.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:14 +10:00
Daniel Axtens
542070baf4 powerpc: Create pci_controller_ops.window_alignment and shim
Add pci_controller_ops.window_alignment,
shadowing ppc_md.pcibios_window_alignment.
Add a shim, and changes the callsites to use the shim.

Here, we use pci_window_alignment, as pcibios_window_alignment is
already taken.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:13 +10:00
Daniel Axtens
b31e79f8d9 powerpc: Create pci_controller_ops.enable_device_hook and shim
Add pci_controller_ops.enable_device_hook,
shadowing ppc_md.pcibios_enable_device_hook.
Add a shim, and changes the callsites to use the shim.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:13 +10:00
Daniel Axtens
ff9df8c87d powerpc: Create pci_controller_ops.probe_mode and shim
Add pci_controller_ops.probe_mode, shadowing ppc_md.pci_probe_mode.
Add a shim, and changes the callsites to use the shim.

We also need to move the probe mode defines to pci-bridge.h from pci.h.
They are required by the shim in order to return a sensible default.
Previously, the were defined in pci.h, but pci.h includes pci-bridge.h
before the relevant #defines. This means the definitions are absent
if pci.h is included before pci-bridge.h. This occurs in some drivers.
So, move the definitons now, and move them back when we remove the shim.

Anything that wants the defines would have had to include pci.h, and
since pci.h includes pci-bridge.h, nothing will lose access to the
defines.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:12 +10:00
Daniel Axtens
b122c95494 powerpc: Create pci_controller_ops.dma_bus_setup and shim
Add pci_controller_ops.dma_bus_setup, shadowing ppc_md.pci_dma_bus_setup.
Add a shim, and changes the callsites to use the shim.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:11 +10:00
Daniel Axtens
e02def5bce powerpc: Create pci_controller_ops.dma_dev_setup and shim
Introduces the pci_controller_ops structure.
Add pci_controller_ops.dma_dev_setup, shadowing ppc_md.pci_dma_dev_setup.
Add a shim, and change the callsites to use the shim.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:11 +10:00
Daniel Axtens
c88c2a1889 powerpc: pcibios_enable_device_hook: return bool rather than int
pcibios_enable_device_hook returned an int. Every implementation
returned either -EINVAL or 0. The return value wasn't propagated by
the caller: any non-zero return value caused pcibios_enable_device
to return -EINVAL itself. Therefore, make the hook return a bool.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:10 +10:00
Daniel Axtens
bdc728a849 powerpc: move find_and_init_phbs() to pSeries specific code
Previously, find_and_init_phbs() was used in both PowerNV and pSeries
setup. However, since RTAS support has been dropped from PowerNV, we
can move it into a platform-specific file.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:09 +10:00
Michael Ellerman
7e862d7e7d powerpc: Reword the "returning from prom_init" message
We get way too many bug reports that say "the kernel is hung in
prom_init", which stems from the fact that the last piece of output
people see is "returning from prom_init".

The kernel is almost never hung in prom_init(), it's just that it's
crashed somewhere after prom_init() but prior to the console coming up.

The existing message should give a clue to that, ie. "returning from"
indicates that prom_init() has finished, but it doesn't seem to work.
Let's try something different.

This prints:

  Quiescing Open Firmware ...
  Booting Linux via __start() ...

Which hopefully makes it clear that prom_init() is not the problem, and
although __start() probably isn't either, it's at least the right place
to begin looking.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wistfully-Acked-by: Jeremy Kerr <jk@ozlabs.org>
2015-04-10 20:02:48 +10:00
Michael Ellerman
f691fa1080 powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:48 +10:00
Michael Ellerman
bf4981a006 powerpc: Remove the celleb support
The celleb code has seen no actual development for ~7 years.

We (maintainers) have no access to test hardware, and it is highly
likely the code has bit-rotted.

As far as we're aware the hardware was never widely available, and is
certainly no longer available, and no one on the list has shown any
interest in it over the years.

So remove it. If anyone has one and cares please speak up.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
2015-04-07 17:15:13 +10:00
Michael Ellerman
428d4d6520 Merge branch 'next-eeh' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into next 2015-04-07 13:24:55 +10:00
Benjamin Herrenschmidt
d4ed11aa48 Merge branch 'next-eeh' into next-sriov
Merge in Gavin EEH fixes
2015-03-31 13:11:17 +11:00
Gavin Shan
433185d2b4 powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()
The function eeh_add_parent_pe() is used to create a PE or add one
edev to its parent PE. Current code checks if PE#0 is valid for the
later case. Actually, we should validate PE#0 for both cases when
EEH core regards PE#0 as invalid one (without flag EEH_VALID_PE_ZERO).
Otherwise, not all EEH devices can be added to its parent PE#0 for
EEH on P7IOC.

The patch fixes the issue by validating PE#0 for the two cases. So far,
we don't have PE#0 for EEH on P7IOC, but it will show up when we enable
M64 for P7IOC. The patch also makes the error message more meaningful.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:10:39 +11:00
Wei Yang
781a868f31 powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR .

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.

Note:

    After doing so, there would be a "hole" in the /proc/iomem when offset
    is a positive value. It looks like the device return some mmio back to
    the system, which actually no one could use it.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang
5350ab3fd7 powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handle these three cases respectively.

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00