Commit Graph

4044 Commits

Author SHA1 Message Date
Linus Torvalds
1b17366d69 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "So here's my next branch for powerpc.  A bit late as I was on vacation
  last week.  It's mostly the same stuff that was in next already, I
  just added two patches today which are the wiring up of lockref for
  powerpc, which for some reason fell through the cracks last time and
  is trivial.

  The highlights are, in addition to a bunch of bug fixes:

   - Reworked Machine Check handling on kernels running without a
     hypervisor (or acting as a hypervisor).  Provides hooks to handle
     some errors in real mode such as TLB errors, handle SLB errors,
     etc...

   - Support for retrieving memory error information from the service
     processor on IBM servers running without a hypervisor and routing
     them to the memory poison infrastructure.

   - _PAGE_NUMA support on server processors

   - 32-bit BookE relocatable kernel support

   - FSL e6500 hardware tablewalk support

   - A bunch of new/revived board support

   - FSL e6500 deeper idle states and altivec powerdown support

  You'll notice a generic mm change here, it has been acked by the
  relevant authorities and is a pre-req for our _PAGE_NUMA support"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits)
  powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()
  powerpc: Add support for the optimised lockref implementation
  powerpc/powernv: Call OPAL sync before kexec'ing
  powerpc/eeh: Escalate error on non-existing PE
  powerpc/eeh: Handle multiple EEH errors
  powerpc: Fix transactional FP/VMX/VSX unavailable handlers
  powerpc: Don't corrupt transactional state when using FP/VMX in kernel
  powerpc: Reclaim two unused thread_info flag bits
  powerpc: Fix races with irq_work
  Move processing of MCE queued event out from syscall exit path.
  pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
  powerpc: Make add_system_ram_resources() __init
  powerpc: add SATA_MV to ppc64_defconfig
  powerpc/powernv: Increase candidate fw image size
  powerpc: Add debug checks to catch invalid cpu-to-node mappings
  powerpc: Fix the setup of CPU-to-Node mappings during CPU online
  powerpc/iommu: Don't detach device without IOMMU group
  powerpc/eeh: Hotplug improvement
  powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space
  powerpc/eeh: Add restore_config operation
  ...
2014-01-27 21:11:26 -08:00
Linus Torvalds
e1ba84597c PCI changes for the v3.14 merge window:

Merge tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v3.14 merge window:

  Resource management
    - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas)
    - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu)
    - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas)
    - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas)
    - Enforce bus address limits in resource allocation (Yinghai Lu)
    - Allocate 64-bit BARs above 4G when possible (Yinghai Lu)
    - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu)

  PCI device hotplug
    - Major rescan/remove locking update (Rafael J. Wysocki)
    - Make ioapic builtin only (not modular) (Yinghai Lu)
    - Fix release/free issues (Yinghai Lu)
    - Clean up pciehp (Bjorn Helgaas)
    - Announce pciehp slot info during enumeration (Bjorn Helgaas)

  MSI
    - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev)
    - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev)
    - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev)
    - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman)
    - Drop "irq" param from *_restore_msi_irqs() (DuanZhenzhong)

  SR-IOV
    - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao)

  Virtualization
    - Add support for save/restore of extended capabilities (Alex Williamson)
    - Add Virtual Channel to save/restore support (Alex Williamson)
    - Never treat a VF as a multifunction device (Alex Williamson)
    - Add pci_try_reset_function(), et al (Alex Williamson)

  AER
    - Ignore non-PCIe error sources (Betty Dall)
    - Support ACPI HEST error sources for domains other than 0 (Betty Dall)
    - Consolidate HEST error source parsers (Bjorn Helgaas)
    - Add a TLP header print helper (Borislav Petkov)

  Freescale i.MX6
    - Remove unnecessary code (Fabio Estevam)
    - Make reset-gpio optional (Marek Vasut)
    - Report "link up" only after link training completes (Marek Vasut)
    - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut)
    - Fix PCIe startup code (Richard Zhu)

  Marvell MVEBU
    - Remove duplicate of_clk_get_by_name() call (Andrew Lunn)
    - Drop writes to bridge Secondary Status register (Jason Gunthorpe)
    - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe)
    - Support a bridge with no IO port window (Jason Gunthorpe)
    - Use max_t() instead of max(resource_size_t,) (Jingoo Han)
    - Remove redundant of_match_ptr (Sachin Kamat)
    - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni)

  NVIDIA Tegra
    - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower)

  Renesas R-Car
    - Add runtime PM support (Valentine Barshak)
    - Fix rcar_pci_probe() return value check (Wei Yongjun)

  Synopsys DesignWare
    - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen)
    - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen)
    - Fix missing MSI IRQs (Harro Haan)
    - Add dw_pcie prefix before cfg_read/write (Pratyush Anand)
    - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand)
    - Whitespace cleanup (Jingoo Han)

  EISA
    - Call put_device() if device_register() fails (Levente Kurusa)
    - Revert EISA initialization breakage (Bjorn Helgaas)

  Miscellaneous
    - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger)
    - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas)
    - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas)
    - Use dev_is_pci() to identify PCI devices (Yijing Wang)
    - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches)
    - Update documentation 00-INDEX (Erik Ekman)"

* tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (119 commits)
  Revert "EISA: Initialize device before its resources"
  Revert "EISA: Log device resources in dmesg"
  vfio-pci: Use pci "try" reset interface
  PCI: Check parent kobject in pci_destroy_dev()
  xen/pcifront: Use global PCI rescan-remove locking
  powerpc/eeh: Use global PCI rescan-remove locking
  PCI: Fix pci_check_and_unmask_intx() comment typos
  PCI: Add pci_try_reset_function(), pci_try_reset_slot(), pci_try_reset_bus()
  MPT / PCI: Use pci_stop_and_remove_bus_device_locked()
  platform / x86: Use global PCI rescan-remove locking
  PCI: hotplug: Use global PCI rescan-remove locking
  pcmcia: Use global PCI rescan-remove locking
  ACPI / hotplug / PCI: Use global PCI rescan-remove locking
  ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug
  PCI: Add global pci_lock_rescan_remove()
  PCI: Cleanup pci.h whitespace
  PCI: Reorder so actual code comes before stubs
  PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0
  ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
  PCI: Make local functions static
  ...
2014-01-22 16:39:28 -08:00
Linus Torvalds
9326657abe Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add Intel RAPL energy counter support (Stephane Eranian)
   - Clean up uprobes (Oleg Nesterov)
   - Optimize ring-buffer writes (Peter Zijlstra)

  Tooling side changes, user visible:

   - 'perf diff':
     - Add column colouring improvements (Ramkumar Ramachandra)

   - 'perf kvm':
     - Add guest related improvements, including allowing to specify a
       directory with guest specific /proc information (Dongsheng Yang)
     - Add shell completion support (Ramkumar Ramachandra)
     - Add '-v' option (Dongsheng Yang)
     - Support --guestmount (Dongsheng Yang)

   - 'perf probe':
     - Support showing source code, asking for variables to be collected
       at probe time and other 'perf probe' operations that use DWARF
       information.

       This supports only binaries with debugging information at this
       time, detached debuginfo (aka debuginfo packages) support should
       come in later patches (Masami Hiramatsu)

   - 'perf record':
     - Rename --no-delay option to --no-buffering, better reflecting its
       purpose and freeing up '--delay' to take the place of
       '--initial-delay', so that 'record' and 'stat' are consistent
       (Arnaldo Carvalho de Melo)
     - Default the -t/--thread option to no inheritance (Adrian Hunter)
     - Make per-cpu mmaps the default (Adrian Hunter)

   - 'perf report':
     - Improve callchain processing performance (Frederic Weisbecker)
     - Retain bfd reference to lookup source line numbers, greatly
       optimizing, among other use cases, 'perf report -s srcline'
       (Adrian Hunter)
     - Improve callchain processing performance even more (Namhyung Kim)
     - Add a perf.data file header window in the 'perf report' TUI,
       associated with the 'i' hotkey, providing a counterpart to the
       --header option in the stdio UI (Namhyung Kim)

   - 'perf script':
     - Add an option in 'perf script' to print the source line number
       (Adrian Hunter)
     - Add --header/--header-only options to 'script' and 'report', the
       default is not to show the header info, but as this has been the
       default for some time, leave a single line explaining how to
       obtain that information (Jiri Olsa)
     - Add options to show comm, fork, exit and mmap PERF_RECORD_ events
       (Namhyung Kim)
     - Print callchains and symbols if they exist (David Ahern)

   - 'perf timechart':
     - Add backtrace support to CPU info
     - Print pid along the name
     - Add support for CPU topology
     - Add new option --highlight'ing threads, be it by name or, if a
       numeric value is provided, that run more than given duration
       (Stanislav Fomichev)

   - 'perf top':
     - Make 'perf top -g' refer to callchains, for consistency with
       other tools (David Ahern)

   - 'perf trace':
     - Handle old kernels where the "raw_syscalls" tracepoints were
       called plain "syscalls" (David Ahern)
     - Remove thread summary coloring (Pekka Enberg)
     - Honour -m option in 'trace', the tool was offering the option to
       set the mmap size, but wasn't using it when doing the actual mmap
       on the events file descriptors (Jiri Olsa)

   - generic:
     - Backport libtraceevent plugin support from the trace-cmd
       repository, with plugins for jbd2, hrtimer, kmem, kvm, mac80211,
       sched_switch, function, xen, scsi, cfg80211 (Jiri Olsa)
     - Print session information only if --stdio is given (Namhyung Kim)

  Tooling side changes, developer visible (plumbing):

   - Improve 'perf probe' exit path, release resources (Masami
     Hiramatsu)
   - Improve libtraceevent plugins exit path, allowing the registering
     of an unregister handler to be called at exit time (Namhyung Kim)
   - Add an alias to the build test makefile (make -C tools/perf
     build-test) (Namhyung Kim)
   - Get rid of die() and friends (good riddance!) in libtraceevent
     (Namhyung Kim)
   - Fix cross build problems related to pkgconfig and CROSS_COMPILE not
     being propagated to the feature tests, leading to features being
     tested in the host and then being enabled on the target (Mark
     Rutland)
   - Improve forked workload error reporting by sending the errno in the
     signal data queueing integer field, using sigqueue and by doing the
     signal setup in the evlist methods, removing open coded equivalents
     in various tools (Arnaldo Carvalho de Melo)
   - Do more auto exit cleanup chores in the 'evlist' destructor, so
     that the tools don't have to all do that sequence (Arnaldo Carvalho
     de Melo)
   - Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho
     de Melo)
   - Add test for building detached source tarballs (Arnaldo Carvalho de
     Melo)
   - Move some header files from tools/perf/ to tools/include/ to make
     them available to other tools/ dwelling codebases (Namhyung Kim)
   - Move logic to warn about kptr_restrict'ed kernels to separate
     function in 'report' (Arnaldo Carvalho de Melo)
   - Move hist browser selection code to separate function (Arnaldo
     Carvalho de Melo)
   - Move histogram entries collapsing to separate function (Arnaldo
     Carvalho de Melo)
   - Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo)
   - Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri
     Olsa)
   - Move arch setup into separate Makefile (Jiri Olsa)
   - Make libtraceevent install target quieter (Jiri Olsa)
   - Make tests/make output more compact (Jiri Olsa)
   - Ignore generated files in feature-checks (Chunwei Chen)
   - Introduce pevent_filter_strerror() in libtraceevent, similar in
     purpose to libc's strerror() function (Namhyung Kim)
   - Use perf_data_file methods to write output file in 'record' and
     'inject' (Jiri Olsa)
   - Use pr_*() functions where applicable in 'report' (Namhyung Kim)
   - Add 'machine' 'addr_location' struct to have full picture (machine,
     thread, map, symbol, addr) for a (partially) resolved address,
     reducing function signatures (Arnaldo Carvalho de Melo)
   - Reduce code duplication in the histogram entry creation/insertion
     (Arnaldo Carvalho de Melo)
   - Auto allocate annotation histogram data structures (Arnaldo
     Carvalho de Melo)
   - No need to test against NULL before calling free, also set freed
     memory in struct pointers to NULL, to help fixing use after free
     bugs (Arnaldo Carvalho de Melo)
   - Rename some struct DSO binary_type related members and methods, to
     clarify their purpose and the need to differentiate them from
     symtab_type, i.e. one is about the file's .text, CFI, etc. (its
     binary contents), and the other is about where the symbol table
     came from (Arnaldo Carvalho de Melo)
   - Convert to new topic libraries, starting with an API one (sysfs,
     debugfs, etc), renaming liblk in the process (Borislav Petkov)
   - Get rid of some more panic() like error handling in libtraceevent.
     (Namhyung Kim)
   - Get rid of panic() like calls in libtraceevent (Namhyung Kim)
   - Start carving out symbol parsing routines from perf, just moving
     routines to topic files in tools/lib/symbol/; tools that want to
     use it need to integrate it directly, i.e. no
     tools/lib/symbol/Makefile is provided (Arnaldo Carvalho de Melo)
   - Assorted refactoring patches, moving code around and adding utility
     evlist methods that will be used in the IPT patchset (Adrian
     Hunter)
   - Assorted mmap_pages handling fixes (Adrian Hunter)
   - Several man pages typo fixes (Dongsheng Yang)
   - Get rid of several die() calls in libtraceevent (Namhyung Kim)
   - Use basename() in a more robust way, to avoid problems related to
     different system library implementations for that function
     (Stephane Eranian)
   - Remove open coded management of short_name_allocated member (Adrian
     Hunter)
   - Several cleanups in the "dso" methods, constifying some parameters
     and renaming some fields to clarify its purpose (Arnaldo Carvalho
     de Melo)
   - Add per-feature check flags, fixing libunwind related build
     problems on some architectures (Jean Pihet)
   - Do not disable source line lookup just because of one failure.
     (Adrian Hunter)
   - Several 'perf kvm' man page corrections (Dongsheng Yang)
   - Correct the message in feature-libnuma checking, showing the right
     devel package names for various distros (Dongsheng Yang)
   - Polish 'readn()' function and introduce its counterpart,
     'writen()' (Jiri Olsa)
   - Start moving timechart state from global variables to a 'perf_tool'
     derived 'timechart' struct (Arnaldo Carvalho de Melo)

  ... and lots of fixes and improvements I forgot to list"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits)
  perf tools: Remove unnecessary callchain cursor state restore on unmatch
  perf callchain: Spare double comparison of callchain first entry
  perf tools: Do proper comm override error handling
  perf symbols: Export elf_section_by_name and reuse
  perf probe: Release all dynamically allocated parameters
  perf probe: Release allocated probe_trace_event if failed
  perf tools: Add 'build-test' make target
  tools lib traceevent: Unregister handler when xen plugin is unloaded
  tools lib traceevent: Unregister handler when scsi plugin is unloaded
  tools lib traceevent: Unregister handler when jbd2 plugin is unloaded
  tools lib traceevent: Unregister handler when cfg80211 plugin is unloaded
  tools lib traceevent: Unregister handler when mac80211 plugin is unloaded
  tools lib traceevent: Unregister handler when sched_switch plugin is unloaded
  tools lib traceevent: Unregister handler when kvm plugin is unloaded
  tools lib traceevent: Unregister handler when kmem plugin is unloaded
  tools lib traceevent: Unregister handler when hrtimer plugin is unloaded
  tools lib traceevent: Unregister handler when function plugin is unloaded
  tools lib traceevent: Add pevent_unregister_print_function()
  tools lib traceevent: Add pevent_unregister_event_handler()
  tools lib traceevent: fix pointer-integer size mismatch
  ...
2014-01-20 10:28:30 -08:00
Linus Torvalds
897aea303f Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "Currently there are two methods to set the panic_timeout: via
  'panic=X' boot commandline option, or via /proc/sys/kernel/panic.

  This tree adds a third panic_timeout configuration method:
  configuration via Kconfig, via CONFIG_PANIC_TIMEOUT=X - useful to
  distros that generally want their kernel defaults to come with the
  .config.

  CONFIG_PANIC_TIMEOUT defaults to 0, which was the previous default
  value of panic_timeout.

  Doing that unearthed a few arch trickeries regarding arch-special
  panic_timeout values and related complications - hopefully all
  resolved to the satisfaction of everyone"

* 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Clean up panic_timeout usage
  MIPS: Remove panic_timeout settings
  panic: Make panic_timeout configurable
2014-01-20 10:22:12 -08:00
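
The three configuration layers described in the core-debug pull above (Kconfig default, boot option, runtime setting) can be modeled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code: `CONFIG_PANIC_TIMEOUT` and `panic_timeout` are the names from the message, while `parse_panic_param()` is a hypothetical stand-in for whatever handles the 'panic=X' boot option.

```c
#include <stdlib.h>

/* Build-time default; the message states that it defaults to 0. */
#ifndef CONFIG_PANIC_TIMEOUT
#define CONFIG_PANIC_TIMEOUT 0
#endif

/* Starts at the Kconfig value; later layers may override it. */
int panic_timeout = CONFIG_PANIC_TIMEOUT;

/* Hypothetical stand-in for the 'panic=X' boot option handler. */
int parse_panic_param(const char *arg)
{
	char *end;
	long val = strtol(arg, &end, 10);

	if (end == arg || *end != '\0')
		return -1;	/* not a number: keep the current value */
	panic_timeout = (int)val;
	return 0;
}
```

Building the sketch with `-DCONFIG_PANIC_TIMEOUT=30` changes the starting value without touching the source, which is the property the message says distros want from the .config.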
Ingo Molnar
860fc2f264 Merge branch 'perf/urgent' into perf/core
Pick up the latest fixes, refresh the development tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:33:30 +01:00
Rafael J. Wysocki
1c2042c83a powerpc/eeh: Use global PCI rescan-remove locking
Race conditions are theoretically possible between the PCI device addition
and removal in the PPC64 PCI error recovery driver and the generic PCI bus
rescan and device removal that can be triggered via sysfs.

To avoid those race conditions make PPC64 PCI error recovery driver use
global PCI rescan-remove locking around PCI device addition and removal.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-15 10:31:29 -07:00
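
The idea behind the rescan-remove locking commit above, that every path adding or removing devices takes the same global lock, can be sketched in user space with a pthread mutex. All names here are hypothetical; the kernel's own interface is the `pci_lock_rescan_remove()` family listed in the PCI pull, which this sketch does not call.

```c
#include <pthread.h>

/* One global lock shared by every path that adds or removes devices. */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;
static int device_count;	/* stand-in for the PCI device list */

/* Models the sysfs-triggered rescan path adding a device. */
int rescan_add_device(void)
{
	int n;

	pthread_mutex_lock(&rescan_remove_lock);
	n = ++device_count;
	pthread_mutex_unlock(&rescan_remove_lock);
	return n;
}

/* Models the error-recovery path removing a device. */
int recovery_remove_device(void)
{
	int n;

	pthread_mutex_lock(&rescan_remove_lock);
	if (device_count > 0)
		device_count--;
	n = device_count;
	pthread_mutex_unlock(&rescan_remove_lock);
	return n;
}
```

Because both paths serialize on one lock, neither can observe the device list while the other is halfway through changing it.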
Gavin Shan
7e4e7867b1 powerpc/eeh: Handle multiple EEH errors
A single OPAL event for a PCI error may correspond to multiple EEH
errors, for example multiple frozen PEs detected on different PHBs.
Previously we didn't cover this case. The patch enumerates the return
values of eeh_ops::next_error() and changes eeh_handle_special_event()
and eeh_ops::next_error() to handle all existing EEH errors.

As Ben pointed out, we needn't list_for_each_entry_safe() since we
are not deleting any PHB from the hose_list and the EEH serialized
lock should be held while purging EEH events. The patch covers those
suggestions as well.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 17:18:58 +11:00
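
The change described in the commit above amounts to draining every pending error rather than stopping after the first. A minimal sketch of that loop follows; the enum values, `struct err_queue`, and both function names are hypothetical and only illustrate the control flow, not the actual EEH code.

```c
/* Hypothetical return codes; the real enumeration lives in the EEH code. */
enum next_err {
	NEXT_ERR_NONE = 0,
	NEXT_ERR_FROZEN_PE,
	NEXT_ERR_FENCED_PHB,
};

/* Toy event source: a fixed queue of pending errors. */
struct err_queue {
	const enum next_err *errs;
	int len;
	int pos;
};

/* Return the next pending error, or NEXT_ERR_NONE when the queue is empty. */
static enum next_err next_error(struct err_queue *q)
{
	if (q->pos >= q->len)
		return NEXT_ERR_NONE;
	return q->errs[q->pos++];
}

/* Drain every pending error instead of stopping after the first one. */
int handle_special_event(struct err_queue *q)
{
	int handled = 0;
	enum next_err rc;

	while ((rc = next_error(q)) != NEXT_ERR_NONE)
		handled++;	/* real code would dispatch on rc here */
	return handled;
}
```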
Benjamin Herrenschmidt
fac515db45 Merge remote-tracking branch 'scott/next' into next
Freescale updates from Scott:

<<
Highlights include 32-bit booke relocatable support, e6500 hardware
tablewalk support, various e500 SPE fixes, some new/revived boards, and
e6500 deeper idle and altivec powerdown modes.
>>
2014-01-15 14:22:35 +11:00
Paul Mackerras
3ac8ff1c47 powerpc: Fix transactional FP/VMX/VSX unavailable handlers
Currently, if a process starts a transaction and then takes an
exception because the FPU, VMX or VSX unit is unavailable to it,
we end up corrupting any FP/VMX/VSX state that was valid before
the interrupt.  For example, if the process starts a transaction
with the FPU available to it but VMX unavailable, and then does
a VMX instruction inside the transaction, the FP state gets
corrupted.

Loading up the desired state generally involves doing a reclaim
and a recheckpoint.  To avoid corrupting already-valid state, we have
to be careful not to reload that state from the thread_struct
between the reclaim and the recheckpoint (since the thread_struct
values are stale by now), and we have to reload that state from
the transact_fp/vr arrays after the recheckpoint to get back the
current transactional values saved there by the reclaim.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:59:14 +11:00
Paul Mackerras
d31626f70b powerpc: Don't corrupt transactional state when using FP/VMX in kernel
Currently, when we have a process using the transactional memory
facilities on POWER8 (that is, the processor is in transactional
or suspended state), and the process enters the kernel and the
kernel then uses the floating-point or vector (VMX/Altivec) facility,
we end up corrupting the user-visible FP/VMX/VSX state.  This
happens, for example, if a page fault causes a copy-on-write
operation, because the copy_page function will use VMX to do the
copy on POWER8.  The test program below demonstrates the bug.

The bug happens because when FP/VMX state for a transactional process
is stored in the thread_struct, we store the checkpointed state in
.fp_state/.vr_state and the transactional (current) state in
.transact_fp/.transact_vr.  However, when the kernel wants to use
FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
which saves the current state in .fp_state/.vr_state.  Furthermore,
when we return to the user process we return with FP/VMX/VSX
disabled.  The next time the process uses FP/VMX/VSX, we don't know
which set of state (the current register values, .fp_state/.vr_state,
or .transact_fp/.transact_vr) we should be using, since we have no
way to tell if we are still in the same transaction, and if not,
whether the previous transaction succeeded or failed.

Thus it is necessary to strictly adhere to the rule that if FP has
been enabled at any point in a transaction, we must keep FP enabled
for the user process with the current transactional state in the
FP registers, until we detect that it is no longer in a transaction.
Similarly for VMX; once enabled it must stay enabled until the
process is no longer transactional.

In order to keep this rule, we add a new thread_info flag which we
test when returning from the kernel to userspace, called TIF_RESTORE_TM.
This flag indicates that there is FP/VMX/VSX state to be restored
before entering userspace, and when it is set the .tm_orig_msr field
in the thread_struct indicates what state needs to be restored.
The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
which are called from enable_kernel_fp/altivec, giveup_vsx, and
flush_fp/altivec_to_thread instead of giveup_fpu/altivec.

The other thing to be done is to get the transactional FP/VMX/VSX
state from .fp_state/.vr_state when doing reclaim, if that state
has been saved there by giveup_fpu/altivec_maybe_transactional.
Having done this, we set the FP/VMX bit in the thread's MSR after
reclaim to indicate that that part of the state is now valid
(having been reclaimed from the processor's checkpointed state).

Finally, in the signal handling code, we move the clearing of the
transactional state bits in the thread's MSR a bit earlier, before
calling flush_fp_to_thread(), so that we don't unnecessarily set
the TIF_RESTORE_TM bit.

This is the test program:

/* Michael Neuling 4/12/2013
 *
 * See if the altivec state is leaked out of an aborted transaction due to
 * kernel vmx copy loops.
 *
 *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
 *
 */

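#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* We don't use all of these, but for reference: */

/*
 * The original #include and #define lines are missing from this copy of
 * the message. The macros below are an assumed reconstruction using the
 * POWER HTM mnemonics; they need an assembler with HTM support.
 */
#define TBEGIN		"tbegin. ;"
#define TEND		"tend. ;"
#define TSUSPEND	"tsuspend. ;"
#define TRESUME		"tresume. ;"
#define TABORT		"tabort. 0 ;"
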
int main(int argc, char *argv[])
{
	long double vecin = 1.3;
	long double vecout;
	unsigned long pgsize = getpagesize();
	int i;
	int fd;
	int size = pgsize*16;
	char tmpfile[] = "/tmp/page_faultXXXXXX";
	char buf[pgsize];
	char *a;
	uint64_t aborted = 0;

	fd = mkstemp(tmpfile);
	assert(fd >= 0);

	memset(buf, 0, pgsize);
	for (i = 0; i < size; i += pgsize)
		assert(write(fd, buf, pgsize) == pgsize);

	unlink(tmpfile);

	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
	assert(a != MAP_FAILED);

	asm __volatile__(
		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
		TBEGIN
		"beq	3f ;"
		TSUSPEND
		"xxlxor 40,40,40 ; " // set 40 to 0
		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
		TABORT
		TRESUME
		TEND
		"li	%[res], 0 ;"
		"b	5f ;"
		"3: ;" // Abort handler
		"li	%[res], 1 ;"
		"5: ;"
		"stxvd2x 40,0,%[vecoutptr] ; "
		: [res]"=r"(aborted)
		: [vecinptr]"r"(&vecin),
		  [vecoutptr]"r"(&vecout),
		  [map]"r"(a)
		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");

	if (aborted && (vecin != vecout)){
		printf("FAILED: vector state leaked on abort %f != %f\n",
		       (double)vecin, (double)vecout);
		exit(1);
	}

	munmap(a, size);

	close(fd);

	printf("PASSED!\n");
	return 0;
}

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:59:11 +11:00
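
The TIF_RESTORE_TM rule from the commit above can be illustrated with a small state model: remember which facilities a transactional process had enabled before the kernel borrows them, then re-enable them on the way back to userspace. This is a simplified sketch, not the kernel implementation. The bit values and `struct thread_model` are invented for illustration, the function names loosely follow the ones in the message, and the model tracks only the enable bits, not the register contents.

```c
/* Illustrative bit values only; the real MSR layout differs. */
#define MSR_FP			0x1UL
#define MSR_VEC			0x2UL
#define MSR_VSX			0x4UL
#define MSR_TM_ACTIVE_BIT	0x8UL	/* stands in for "in a transaction" */

struct thread_model {
	unsigned long msr;		/* MSR the user process will run with */
	unsigned long tm_orig_msr;	/* MSR before the kernel took a facility */
	int restore_tm;			/* models the TIF_RESTORE_TM flag */
};

/* Kernel wants FP/VMX: record the process's MSR once, then take the facility. */
void giveup_maybe_transactional(struct thread_model *t, unsigned long facility)
{
	if ((t->msr & MSR_TM_ACTIVE_BIT) && !t->restore_tm) {
		t->tm_orig_msr = t->msr;
		t->restore_tm = 1;
	}
	t->msr &= ~facility;
}

/* On return to userspace: re-enable whatever the transaction was using. */
void restore_tm_state(struct thread_model *t)
{
	unsigned long msr_diff;

	t->restore_tm = 0;
	if (!(t->msr & MSR_TM_ACTIVE_BIT))
		return;
	msr_diff = t->tm_orig_msr & ~t->msr;
	msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
	t->msr |= msr_diff;	/* real code also reloads the register contents */
}
```

Recording the MSR only while the flag is clear means a second giveup call in the same kernel entry cannot overwrite the saved value with an already-reduced one.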
Benjamin Herrenschmidt
0215f7d8c5 powerpc: Fix races with irq_work
If we set irq_work on a processor and immediately afterward, before the
irq work has a chance to be processed, we change the decrementer value,
we can seriously delay the handling of that irq_work.

Fix it by checking in a few places for pending irq work, first before
changing the decrementer in decrementer_set_next_event() and after
changing it in the same function and in timer_interrupt().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:59:03 +11:00
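
The race in the irq_work commit above is that reprogramming the decrementer can overwrite the short timeout requested for pending irq work. A toy model of the fix follows. The variables and function names are hypothetical, and it models only the re-check after the decrementer write, not every check site the message lists.

```c
#include <stdbool.h>

static unsigned long decrementer;	/* ticks until the next timer interrupt */
static bool irq_work_pending;		/* set when irq work has been queued */

/* Queue irq work and ask for an interrupt as soon as possible. */
void set_irq_work_pending(void)
{
	irq_work_pending = true;
	decrementer = 1;
}

/* Program the next timer event without losing already-queued irq work. */
unsigned long set_next_event(unsigned long evt)
{
	decrementer = evt;
	/* Re-check after the write: work may have been queued just before it. */
	if (irq_work_pending)
		decrementer = 1;
	return decrementer;
}

/* The interrupt handler runs the queued work and clears the flag. */
void timer_interrupt(void)
{
	irq_work_pending = false;
}
```

Without the re-check, the second call in a sequence like "queue work, then program a long timeout" would leave the work waiting for the full timeout.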
Mahesh Salgaonkar
30c826358d Move processing of MCE queued event out from syscall exit path.
Hugh Dickins reported that b5ff4211a8
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the MCE event processing
away from the syscall exit path, which was the wrong place for it to
begin with, and by using the irq work framework to delay processing of
MCE events.

Reported-by: Hugh Dickins <hughd@google.com
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:58:59 +11:00
Gavin Shan
0c4b9e27b0 powerpc/iommu: Don't detach device without IOMMU group
Some devices, for example a PCI root port, have no IOMMU table or
group, so we must not try to detach them from an IOMMU group. Doing so
can crash the kernel by dereferencing a NULL IOMMU group, as the
following backtrace shows:

  .iommu_group_remove_device+0x74/0x1b0
  .iommu_bus_notifier+0x94/0xb4
  .notifier_call_chain+0x78/0xe8
  .__blocking_notifier_call_chain+0x7c/0xbc
  .blocking_notifier_call_chain+0x38/0x48
  .device_del+0x50/0x234
  .pci_remove_bus_device+0x88/0x138
  .pci_stop_and_remove_bus_device+0x2c/0x40
  .pcibios_remove_pci_devices+0xcc/0xfc
  .pcibios_remove_pci_devices+0x3c/0xfc

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:58:33 +11:00
Gavin Shan
f26c7a035b powerpc/eeh: Hotplug improvement
When an EEH error hits a PCI device before its driver is loaded, we
use hotplug to recover from the error. During the plug phase, the PCI
device is probed and its driver is loaded. We then wrongly call the
error handlers if the driver supports EEH explicitly.

Fix this by introducing the flag EEH_DEV_NO_HANDLER and setting it
before we remove the PCI device. That way we avoid wrongly calling the
error handlers of the PCI device after its driver is loaded.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:58:29 +11:00
Gavin Shan
1d350544d5 powerpc/eeh: Add restore_config operation
After a reset of a specific PE or PHB, we never configure AER
correctly on the PowerNV platform. This is not a concern on the
pSeries platform. The patch introduces an additional EEH operation,
eeh_ops::restore_config(), so that we have a chance to configure AER
correctly on PowerNV.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:46 +11:00
Paul Gortmaker
c141611fb1 powerpc: Delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

The one instance where we add an include for init.h covers off
a case where that file was implicitly getting it from another
header which itself didn't need it.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:44 +11:00
Linus Torvalds
a6da83f982 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix from Ben Herrenschmidt:
 "Here's one regression fix for 3.13 that I would appreciate if you
  could still pull in.  It was an "interesting" one to debug, basically
  it's an old bug that got somewhat "exposed" by new code breaking the
  boot on PA Semi boards (yes, it does appear that some people are still
  using these!)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Check return value of instance-to-package OF call
2014-01-13 10:59:05 +07:00
Benjamin Herrenschmidt
10348f5976 powerpc: Check return value of instance-to-package OF call
On PA-Semi firmware, the instance-to-package callback doesn't seem
to be implemented. We didn't check for the error, however, and so
subsequently passed the -1 value returned into stdout_node to
things like prom_getprop etc...

This caused the firmware to load values around 0 (physical) internally
as node structures. It somewhat "worked": as long as we had a NULL in
the right place (address 8) at the beginning of the kernel, we didn't
"see" the bug. But commit 5c0484e25e
"powerpc: Endian safe trampoline" changed the kernel entry point, causing
that old bug to now crash early during boot.

This fixes booting on PA-Semi boards by properly checking the return
value from instance-to-package.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Olof Johansson <olof@lixom.net>
---
2014-01-13 09:49:17 +11:00
Diana Craciun
ed2ddc56e7 powerpc: Replaced tlbilx with tlbwe in the initialization code
On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
state can execute TLB management instructions. If EPCR[DGTMI]=0
tlbwe and tlbilx are allowed to execute normally in the guest state.

A hypervisor may choose to virtualize TLB1, and for this purpose it
may use IPROT to protect the entries from being invalidated by the
guest. However, because execution of tlbwe and tlbilx in guest state
is controlled by the same bit, it is not possible to have a scenario
where tlbwe is allowed to execute in guest state and tlbilx traps. When
guest TLB management instructions are allowed to execute in guest
state, the guest cannot use tlbilx to invalidate TLB1 guest entries.

Linux uses tlbilx in the boot code to invalidate the temporary
entries it creates when initializing the MMU. The patch replaces
the use of tlbilx in the initialization code with tlbwe with the VALID
bit cleared.

Linux is also using tlbilx in other contexts (like huge pages or
indirect entries) but removing the tlbilx from the initialization code
offers the possibility to have scenarios under hypervisor which are
not using huge pages or indirect entries.

Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-10 17:34:04 -06:00
Scott Wood
28efc35fe6 powerpc/e6500: TLB miss handler with hardware tablewalk support
There are a few things that make the existing hw tablewalk handlers
unsuitable for e6500:

 - Indirect entries go in TLB1 (though the resulting direct entries go in
   TLB0).

 - It has threads, but no "tlbsrx." -- so we need a spinlock and
   a normal "tlbsx".  Because we need this lock, hardware tablewalk
   is mandatory on e6500 unless we want to add spinlock+tlbsx to
   the normal bolted TLB miss handler.

 - TLB1 has no HES (nor next-victim hint) so we need software round robin
   (TODO: integrate this round robin data with hugetlb/KVM)

 - The existing tablewalk handlers map half of a page table at a time,
   because IBM hardware has a fixed 1MiB indirect page size.  e6500
   has variable size indirect entries, with a minimum of 2MiB.
   So we can't do the half-page indirect mapping, and even if we
   could it would be less efficient than mapping the full page.

 - Like on e5500, the linear mapping is bolted, so we don't need the
   overhead of supporting nested tlb misses.

Note that hardware tablewalk does not work in rev1 of e6500.
We do not expect to support e6500 rev1 in mainline Linux.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
2014-01-09 17:52:19 -06:00
Kevin Hao
0be7d969b0 powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M
When booting a secondary cpu above 64M, we face the same issue as on
the boot cpu: PAGE_OFFSET maps to two different physical addresses in
the init tlb and the final map. So we have to use
switch_to_as1/restore_to_as0 when converting between these two
maps. When restoring to as0 for a secondary cpu, we only need to
return to the caller. So add a new parameter to function
restore_to_as0 for this purpose.

Use LOAD_REG_ADDR_PIC to get the address of variables which may
be used before we set the final map in cams for the secondary cpu.
Move the setting of cams a bit earlier in order to avoid
unnecessary use of LOAD_REG_ADDR_PIC.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:18 -06:00
Kevin Hao
7d2471f9fa powerpc/fsl_booke: make sure PAGE_OFFSET map to memstart_addr for relocatable kernel
This is always true for a non-relocatable kernel. Otherwise the kernel
would get stuck. But for a relocatable kernel, it is a little more
complicated. When booting a relocatable kernel, we just align the
kernel start addr to 64M and map PAGE_OFFSET from there. The
relocation is based on this virtual address. But if this address
is not the same as memstart_addr, we have to change the
map of PAGE_OFFSET to the real memstart_addr and do the relocation
again.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
[scottwood@freescale.com: make offset long and non-negative in simple case]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:17 -06:00
Kevin Hao
b27652dd21 powerpc: introduce early_get_first_memblock_info
Since a relocatable kernel can be loaded anywhere, there is no
relation between the kernel start addr and memstart_addr, so we
can't calculate memstart_addr from the kernel start addr. We also
can't wait to do the relocation until after we get the real
memstart_addr from the device tree, because that is too late. So
introduce a new function we can use to get the first memblock address
and size at a very early stage (before machine_init).

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:17 -06:00
Kevin Hao
78a235efdc powerpc/fsl_booke: set the tlb entry for the kernel address in AS1
We use the tlb1 entries to map low mem to the kernel space. The
current code assumes that the first tlb entry covers the
kernel image. But this is not true in some special cases, such as
when we run a relocatable kernel above 64M or set
CONFIG_KERNEL_START above 64M. So we choose to switch to address
space 1 before setting these tlb entries.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:16 -06:00
Kevin Hao
dd189692d4 powerpc: enable the relocatable support for the fsl booke 32bit kernel
This is based on the code in head_44x.S. The difference is that
the init tlb size we use is 64M. With this patch we can only load the
kernel at an address between memstart_addr and memstart_addr + 64M.
We will fix this restriction in the following patches.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:16 -06:00
Kevin Hao
99739611e8 powerpc/fsl_booke: introduce get_phys_addr function
Move the code which translates an effective address to a physical
address into a separate function so it can be reused by other code.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:15 -06:00
Kevin Hao
7c732cba3d powerpc/fsl_booke: protect the access to MAS7
The e500v1 doesn't implement MAS7, so we should avoid accessing
this register on that implementation. In the current kernel, accesses
to MAS7 are protected by either CONFIG_PHYS_64BIT or
MMU_FTR_BIG_PHYS. Since some code is executed before the code
patching, we have to use CONFIG_PHYS_64BIT in these cases.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:52:14 -06:00
Wang Dongsheng
a7189483f0 powerpc/85xx: add sysfs for pw20 state and altivec idle
Add a sysfs interface to enable/disable the pw20 state or altivec idle,
and to control the wait entry time.

Enable/Disable interface:
    0, disable. 1, enable.
    /sys/devices/system/cpu/cpuX/pw20_state
    /sys/devices/system/cpu/cpuX/altivec_idle

Set wait time interface (nanoseconds):
    /sys/devices/system/cpu/cpuX/pw20_wait_time
    /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
Example: based on a TB frequency of 41 MHz.
    1~48(ns): TB[63]
    49~97(ns): TB[62]
    98~195(ns): TB[61]
    196~390(ns): TB[60]
    391~780(ns): TB[59]
    781~1560(ns): TB[58]
    ...
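
The table above is consistent with choosing the timebase bit from the
base-2 logarithm of the wait time expressed in timebase ticks. A minimal
sketch of that arithmetic, assuming 41 ticks per microsecond; the helper
name wait_ns_to_tb_bit is hypothetical and is not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a wait time in nanoseconds to a timebase
 * bit number, where bit 63 is the least significant bit.
 * Assumes ns * tb_ticks_per_usec does not overflow 64 bits.
 */
static unsigned int wait_ns_to_tb_bit(uint64_t ns, uint64_t tb_ticks_per_usec)
{
	uint64_t ticks;
	unsigned int shift = 0;

	assert(tb_ticks_per_usec > 0);

	/* Convert nanoseconds to whole timebase ticks. */
	ticks = ns * tb_ticks_per_usec / 1000;

	/* Integer log2 of the tick count; 0 and 1 tick both give 0. */
	while (ticks > 1) {
		ticks >>= 1;
		shift++;
	}

	return 63 - shift;
}
```

With 41 ticks per microsecond, 48 ns is still one whole tick and maps to
TB[63], while 49 ns reaches two ticks and maps to TB[62].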

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
[scottwood@freescale.com: change ifdef]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09 17:51:38 -06:00
Wang Dongsheng
1d47ddf7c3 powerpc/85xx: add hardware automatically enter pw20 state
Use a hardware feature to make the core enter the PW20 state
automatically. Set a TB count in hardware; the effective count begins
when PW10 is entered. When the effective period has expired, the core
will proceed from PW10 to PW20 if no exit conditions have occurred
during the period.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07 19:40:28 -06:00
Wang Dongsheng
202e059ce3 powerpc/85xx: add hardware automatically enter altivec idle state
Each core's AltiVec unit may be placed into a power savings mode
by turning off power to the unit. Core hardware will automatically
power down the AltiVec unit after no AltiVec instructions have
executed in N cycles. The AltiVec power-control is triggered by hardware.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07 19:39:48 -06:00
Scott Wood
b58a7bd6df powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg
This fixes a build break that was probably introduced with the removal
of -Wa,-me500 (commit f49596a4cf), where
the assembler refuses to recognize SPRG4-7 with a generic PPC target.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Dongsheng Wang <dongsheng.wang@freescale.com>
Cc: Anton Vorontsov <avorontsov@mvista.com>
Reviewed-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Tested-by: Wang Dongsheng <dongsheng.wang@freescale.com>
2014-01-07 19:06:03 -06:00
Joseph Myers
640e922501 powerpc: fix exception clearing in e500 SPE float emulation
The e500 SPE floating-point emulation code clears existing exceptions
(__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the
emulated operation.  However, these exception bits are the "sticky",
cumulative exception bits, and should only be cleared by the user
program setting SPEFSCR, not implicitly by any floating-point
instruction (whether executed purely by the hardware or emulated).
The spurious clearing of these bits shows up as missing exceptions in
glibc testing.
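
The symptom described above can be sketched with plain bit masks. The
bit values and function names below are hypothetical and chosen only for
illustration; they are not the real SPEFSCR layout or the kernel's code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the real SPEFSCR). */
#define EX_INVALID   0x1u
#define EX_UNDERFLOW 0x2u
#define EX_INEXACT   0x4u
#define EX_MASK      (EX_INVALID | EX_UNDERFLOW | EX_INEXACT)

/* Old behaviour: clear every sticky bit, then OR in the new exceptions. */
static uint32_t emulate_clearing(uint32_t status, uint32_t raised)
{
	assert((raised & ~EX_MASK) == 0);
	status &= ~EX_MASK;
	return status | raised;
}

/* Sticky behaviour: earlier exceptions stay set until the program clears them. */
static uint32_t emulate_sticky(uint32_t status, uint32_t raised)
{
	assert((raised & ~EX_MASK) == 0);
	return status | raised;
}
```

Starting from a status with EX_INVALID set, an emulated operation that
raises only EX_INEXACT loses EX_INVALID in the first variant and keeps it
in the second. This shows only the missing-exception symptom, not the
complete fix.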

Fixing this, however, is not as simple as just not clearing the bits,
because while the bits may be from previous floating-point operations
(in which case they should not be cleared), the processor can also set
the sticky bits itself before the interrupt for an exception occurs,
and this can happen in cases when IEEE 754 semantics are that the
sticky bit should not be set.  Specifically, the "invalid" sticky bit
is set in various cases with non-finite operands, where IEEE 754
semantics do not involve raising such an exception, and the
"underflow" sticky bit is set in cases of exact underflow, whereas
IEEE 754 semantics are that this flag is set only for inexact
underflow.  Thus, for correct emulation the kernel needs to know the
setting of these two sticky bits before the instruction being
emulated.

When a floating-point operation raises an exception, the kernel can
note the state of the sticky bits immediately afterwards.  Some
<fenv.h> functions that affect the state of these bits, such as
fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and
PR_SET_FPEXC anyway, and so it is natural to record the state of those
bits during that call into the kernel and so avoid any need for a
separate call into the kernel to inform it of a change to those bits.
Thus, the interface I chose to use (in this patch and the glibc port)
is that one of those prctl calls must be made after any userspace
change to those sticky bits, other than through a floating-point
operation that traps into the kernel anyway.  feclearexcept and
fesetexceptflag duly make those calls, which would not be required
were it not for this issue.

The previous EGLIBC port, and the uClibc code copied from it, is
fundamentally broken as regards any use of prctl for floating-point
exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its
prctl calls (and did various worse things, such as passing a pointer
when prctl expected an integer).  If you avoid anything where prctl is
used, the clearing of sticky bits still means it will never give
anything approximating correct exception semantics with existing
kernels.  I don't believe the patch makes things any worse for
existing code that doesn't try to inform the kernel of changes to
sticky bits - such code may get incorrect exceptions in some cases,
but it would have done so anyway in other cases.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07 18:32:21 -06:00
Mihai Caraman
228b1a4730 powerpc/booke64: Add LRAT error exception handler
LRAT (Logical to Real Address Translation) present in MMU v2 provides hardware
translation from a logical page number (LPN) to a real page number (RPN) when
tlbwe is executed by a guest or when a page table translation occurs from a
guest virtual address.

Add LRAT error exception handler to Booke3E 64-bit kernel and the basic KVM
handler to avoid build breakage. This is a prerequisite for KVM LRAT support
that will follow.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07 18:15:29 -06:00
Linus Torvalds
6e4c61968b Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
 "A bit more endian problems found during testing of 3.13 and a few
  other simple fixes and regressions fixes"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix alignment of secondary cpu spin vars
  powerpc: Align p_end
  powernv/eeh: Add buffer for P7IOC hub error data
  powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag()
  powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
  powerpc: Make unaligned accesses endian-safe for powerpc
  powerpc: Fix bad stack check in exception entry
  powerpc/512x: dts: disable MPC5125 usb module
  powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
2013-12-30 10:22:57 -08:00
Benjamin Herrenschmidt
dece8ada99 Merge branch 'merge' into next
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton's little-endian fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 15:19:31 +11:00
Anton Blanchard
a68c33f359 powerpc: Fix endian issues in power7/8 machine check handler
The SLB save area is shared with the hypervisor and is defined
as big endian, so we need to byte swap on little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:51:09 +11:00
Alistair Popple
d084775738 powerpc/iommu: Update the generic code to use dynamic iommu page sizes
This patch updates the generic iommu backend code to use the
it_page_shift field to determine the iommu page size instead of
using hardcoded values.
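
Deriving sizes from a per-table shift rather than a constant might look
roughly like the sketch below. The struct is a trimmed-down, hypothetical
stand-in for struct iommu_table, and the helper names are assumptions,
not the macros the patch touches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the field discussed here. */
struct iommu_table_sketch {
	unsigned int it_page_shift;	/* log2 of the IOMMU page size */
};

/* IOMMU page size in bytes for this table. */
static uint64_t iommu_page_size(const struct iommu_table_sketch *tbl)
{
	assert(tbl->it_page_shift < 64);
	return (uint64_t)1 << tbl->it_page_shift;
}

/* Round addr up to the next IOMMU page boundary. */
static uint64_t iommu_page_align(const struct iommu_table_sketch *tbl,
				 uint64_t addr)
{
	uint64_t mask = iommu_page_size(tbl) - 1;

	return (addr + mask) & ~mask;
}

/* Number of IOMMU pages needed to cover len bytes starting at addr. */
static uint64_t iommu_num_pages(const struct iommu_table_sketch *tbl,
				uint64_t addr, uint64_t len)
{
	uint64_t mask = iommu_page_size(tbl) - 1;
	uint64_t start = addr & ~mask;
	uint64_t end = iommu_page_align(tbl, addr + len);

	return (end - start) >> tbl->it_page_shift;
}
```

With a shift of 12 (4K pages), a 2-byte buffer at address 4095 spans two
IOMMU pages; with a shift of 16 the same buffer fits in one.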

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:17:19 +11:00
Alistair Popple
3a553170d3 powerpc/iommu: Add it_page_shift field to determine iommu page size
This patch adds an it_page_shift field to struct iommu_table and
initialises it to 4K for all platforms.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:17:13 +11:00
Alistair Popple
e589a4404f powerpc/iommu: Update constant names to reflect their hardcoded page size
The powerpc iommu uses a hardcoded page size of 4K. This patch changes
the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A
future patch will use the existing names to support dynamic page
sizes.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:17:06 +11:00
Mahesh Salgaonkar
4e243b79b0 powerpc: Fix "attempt to move .org backwards" error
With the recent machine check patch series changes, the exception vectors
starting from 0x4300 are now overflowing with allyesconfig. Fix that by
moving the machine_check_common and machine_check_handle_early code out
of that region to make enough room for the exception vector area.

Fixes this build error reported by Stephen:

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:958: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:959: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:983: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:984: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1003: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1013: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1014: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1015: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1016: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1017: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1018: Error: attempt to move .org backwards

[Moved the code further down as it introduced link errors due to too long
 relative branches to the masked interrupts handlers from the exception
 prologs. Also removed the useless feature section --BenH
]

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:16:30 +11:00
Olof Johansson
7d4151b509 powerpc: Fix alignment of secondary cpu spin vars
Commit 5c0484e25e ('powerpc: Endian safe trampoline') resulted in
losing proper alignment of the spinlock variables used when booting
secondary CPUs, causing some quite odd issues with failing to boot on
PA Semi-based systems.

This showed itself on ppc64_defconfig, but not on pasemi_defconfig,
so it had gone unnoticed when I initially tested the LE patch set.

Fix is to add explicit alignment instead of relying on good luck. :)

[ It appears that there is a different issue with PA Semi systems
  however this fix is definitely correct so applying anyway -- BenH
]

Fixes: 5c0484e25e ('powerpc: Endian safe trampoline')
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:02:34 +11:00
Anton Blanchard
286e4f90a7 powerpc: Align p_end
p_end is an 8 byte value embedded in the text section. This means it
is only 4 byte aligned when it should be 8 byte aligned. Fix this
by adding an explicit alignment.

This fixes an issue where POWER7 little endian builds with
CONFIG_RELOCATABLE=y fail to boot.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:02:33 +11:00
Yinghai Lu
fc2798502f PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
These interfaces:

  pcibios_resource_to_bus(struct pci_dev *dev, *bus_region, *resource)
  pcibios_bus_to_resource(struct pci_dev *dev, *resource, *bus_region)

took a pci_dev, but they really depend only on the pci_bus.  And we want to
use them in resource allocation paths where we have the bus but not a
device, so this patch converts them to take the pci_bus instead of the
pci_dev:

  pcibios_resource_to_bus(struct pci_bus *bus, *bus_region, *resource)
  pcibios_bus_to_resource(struct pci_bus *bus, *resource, *bus_region)

In fact, with standard PCI-PCI bridges, they only depend on the host
bridge, because that's the only place address translation occurs, but
we aren't going that far yet.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-12-21 10:06:10 -07:00
Linus Torvalds
46dd0835ca The PPC folks had a large amount of changes queued for 3.13, and now they
are fixing the bugs.

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "The PPC folks had a large amount of changes queued for 3.13, and now
  they are fixing the bugs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Don't drop low-order page address bits
  powerpc: book3s: kvm: Don't abuse host r2 in exit path
  powerpc/kvm/booke: Fix build break due to stack frame size warning
  KVM: PPC: Book3S: PR: Enable interrupts earlier
  KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
  KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
  KVM: PPC: Book3S: PR: Don't clobber our exit handler id
  powerpc: kvm: fix rare but potential deadlock scene
  KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
  KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
  KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
  KVM: PPC: Book3S HV: Fix physical address calculations
2013-12-20 12:26:54 -08:00
Paolo Bonzini
5e6d26cf48 Patch queue for 3.13 - 2013-12-18
This fixes some grave issues we've only found after 3.13-rc1:
 
   - Make the modularized HV/PR book3s kvm work well as modules
   - Fix some race conditions
   - Fix compilation with certain compilers (booke)
   - Fix THP for book3s_hv
   - Fix preemption for book3s_pr
 
 Alexander Graf (4):
       KVM: PPC: Book3S: PR: Don't clobber our exit handler id
       KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
       KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
       KVM: PPC: Book3S: PR: Enable interrupts earlier
 
 Aneesh Kumar K.V (1):
       powerpc: book3s: kvm: Don't abuse host r2 in exit path
 
 Paul Mackerras (5):
       KVM: PPC: Book3S HV: Fix physical address calculations
       KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
       KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
       KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
       KVM: PPC: Book3S HV: Don't drop low-order page address bits
 
 Scott Wood (1):
       powerpc/kvm/booke: Fix build break due to stack frame size warning
 
 pingfan liu (1):
       powerpc: kvm: fix rare but potential deadlock scene

Merge tag 'signed-for-3.13' of git://github.com/agraf/linux-2.6 into kvm-master

2013-12-20 19:13:58 +01:00
Aneesh Kumar K.V
36e7bb3802 powerpc: book3s: kvm: Don't abuse host r2 in exit path
We don't use PACATOC for PR. Avoid updating HOST_R2 with PR
KVM mode when both HV and PR are enabled in the kernel. Without this we
get the below crash

(qemu)
Unable to handle kernel paging request for data at address 0xffffffffffff8310
Faulting instruction address: 0xc00000000001d5a4
cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0]
    pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0
    lr: c00000000001d760: .vtime_account_system+0x20/0x60
    sp: c0000001dc53b170
   msr: 8000000000009032
   dar: ffffffffffff8310
 dsisr: 40000000
  current = 0xc0000001d76c62d0
  paca    = 0xc00000000fef1100   softe: 0        irq_happened: 0x01
    pid   = 4472, comm = qemu-system-ppc
enter ? for help
[c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60
[c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50
[c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4
[c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0
[c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40
[c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730
[c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770
[c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0
[c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-18 11:29:31 +01:00
Ingo Molnar
fe361cfcf4 Linux 3.13-rc4

Merge tag 'v3.13-rc4' into perf/core

Merge Linux 3.13-rc4, to refresh this branch with the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16 14:51:32 +01:00
Anton Blanchard
a29e30efa3 powerpc: Fix endian issues in crash dump code
A couple more device tree properties that need byte swapping.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13 15:48:39 +11:00
Anton Blanchard
f8a1883a83 powerpc: Fix topology core_id endian issue on LE builds
cpu_to_core_id() is missing a byteswap:

cat /sys/devices/system/cpu/cpu63/topology/core_id
201326592
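
The printed value is 0x0c000000, which is a small core number with its
bytes reversed. A minimal sketch of the arithmetic; swab32_sketch is a
hypothetical name for illustration, not the helper the fix uses:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32_sketch(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}

/* 201326592 is 0x0c000000; with the bytes swapped it becomes 12. */
static void check_core_id_example(void)
{
	assert(swab32_sketch(201326592u) == 12u);
}
```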

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13 15:48:34 +11:00
Anton Blanchard
01666c8ee2 powerpc: Fix endian issue in setup-common.c
During LE boot we see:

    Partition configured for 1073741824 cpus, operating system maximum is 2048.

Clearly missing a byteswap here.
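
One way to see where such a number comes from is to decode the same four
bytes in both byte orders. This is a sketch under the assumption that the
property cell holds the big-endian value 64; the function names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Decode a big-endian 32-bit cell byte by byte; works on any host. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* What a little-endian load of the same bytes yields without a swap. */
static uint32_t read_le32(const unsigned char *p)
{
	return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Assumed cell contents: 64 stored big-endian. */
static void check_cpu_count_example(void)
{
	const unsigned char cell[4] = { 0x00, 0x00, 0x00, 0x40 };

	assert(read_be32(cell) == 64u);
	assert(read_le32(cell) == 1073741824u);
}
```

Decoding byte by byte avoids depending on the host's endianness, at the
cost of a few shifts per cell.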

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13 15:48:34 +11:00