Commit Graph

26457 Commits

Author SHA1 Message Date
Linus Torvalds
3dee9fb2a4 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

   - add the 'Corrected Errors Collector' kernel feature which collect
     and monitor correctable errors statistics and will preemptively
     (soft-)offline physical pages that have a suspiciously high error
     count.

   - handle MCE errors during kexec() more gracefully

   - factor out and deprecate the /dev/mcelog driver

   - ... plus misc fixes and cleanpus"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only
  ACPI/APEI: Use setup_deferrable_timer()
  x86/mce: Update notifier priority check
  x86/mce: Enable PPIN for Knights Landing/Mill
  x86/mce: Do not register notifiers with invalid prio
  x86/mce: Factor out and deprecate the /dev/mcelog driver
  RAS: Add a Corrected Errors Collector
  x86/mce: Rename mce_log to mce_log_buffer
  x86/mce: Rename mce_log()'s argument
  x86/mce: Init some CPU features early
  x86/mce: Handle broadcasted MCE gracefully with kexec
2017-05-01 20:48:33 -07:00
Linus Torvalds
7c8c03bfc7 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - Kprobes and uprobes changes:
      - Make their trampolines read-only while they are used
      - Make UPROBES_EVENTS default-y which is the distro practice
      - Apply misc fixes and robustization to probe point insertion.

   - add support for AMD IOMMU events

   - extend hw events on Intel Goldmont CPUs

   - ... plus misc fixes and updates.

  Tooling side changes:

   - support s390 jump instructions in perf annotate (Christian
     Borntraeger)

   - vendor hardware events updates (Andi Kleen)

   - add argument support for SDT events in powerpc (Ravi Bangoria)

   - beautify the statx syscall arguments in 'perf trace' (Arnaldo
     Carvalho de Melo)

   - handle inline functions in callchains (Jin Yao)

   - enable sorting by srcline as key (Milian Wolff)

   - add 'brstackinsn' field in 'perf script' to reuse the x86
     instruction decoder used in the Intel PT code to study hot paths to
     samples (Andi Kleen)

   - add PERF_RECORD_NAMESPACES so that the kernel can record
     information required to associate samples to namespaces, helping in
     container problem characterization. (Hari Bathini)

   - allow sorting by symbol_size in 'perf report' and 'perf top'
     (Charles Baylis)

   - in perf stat, make system wide (-a) the default option if no target
     was specified and one of following conditions is met:
      - no workload specified (current behaviour)
      - a workload is specified but all requested events are system wide
        ones, like uncore ones. (Jiri Olsa)

   - ... plus lots of other updates, enhancements, cleanups and fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (235 commits)
  perf tools: Fix the code to strip command name
  tools arch x86: Sync cpufeatures.h
  tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel
  tools: Update asm-generic/mman-common.h copy from the kernel
  perf tools: Use just forward declarations for struct thread where possible
  perf tools: Add the right header to obtain PERF_ALIGN()
  perf tools: Remove poll.h and wait.h from util.h
  perf tools: Remove string.h, unistd.h and sys/stat.h from util.h
  perf tools: Remove stale prototypes from builtin.h
  perf tools: Remove string.h from util.h
  perf tools: Remove sys/ioctl.h from util.h
  perf tools: Remove a few more needless includes from util.h
  perf tools: Include sys/param.h where needed
  perf callchain: Move callchain specific routines from util.[ch]
  perf tools: Add compress.h for the *_decompress_to_file() headers
  perf mem: Fix display of data source snoop indication
  perf debug: Move dump_stack() and sighandler_dump_stack() to debug.h
  perf kvm: Make function only used by 'perf kvm' static
  perf tools: Move timestamp routines from util.h to time-utils.h
  perf tools: Move units conversion/formatting routines to separate object
  ...
2017-05-01 20:23:17 -07:00
Linus Torvalds
6dc2cce932 Merge branch 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pul x86/process updates from Ingo Molnar:
 "The main change in this cycle was to add the ARCH_[GET|SET]_CPUID
  prctl() ABI extension to control the availability of the CPUID
  instruction, analogously to the existing PR_GET|SET_TSC ABI that
  controls RDTSC.

  Motivation: the 'rr' user-space record-and-replay execution debugger
  would like to trap and emulate the CPUID instruction - which
  instruction is normally unprivileged.

  Trapping CPUID is possible on IvyBridge and later Intel CPUs - expose
  this hardware capability"

* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/syscalls/32: Ignore arch_prctl for other architectures
  um/arch_prctl: Fix fallout from x86 arch_prctl() rework
  x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  x86/cpufeature: Detect CPUID faulting support
  x86/syscalls/32: Wire up arch_prctl on x86-32
  x86/arch_prctl: Add do_arch_prctl_common()
  x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64()
  x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl()
  x86/arch_prctl: Rename 'code' argument to 'option'
  x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES
  x86/process: Optimize TIF_NOTSC switch
  x86/process: Correct and optimize TIF_BLOCKSTEP switch
  x86/process: Optimize TIF checks in __switch_to_xtra()
2017-05-01 19:57:58 -07:00
Linus Torvalds
207fb8c304 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - a big round of FUTEX_UNLOCK_PI improvements, fixes, cleanups and
     general restructuring

   - lockdep updates such as new checks for lock_downgrade()

   - introduce the new atomic_try_cmpxchg() locking API and use it to
     optimize refcount code generation

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  MAINTAINERS: Add FUTEX SUBSYSTEM
  futex: Clarify mark_wake_futex memory barrier usage
  futex: Fix small (and harmless looking) inconsistencies
  futex: Avoid freeing an active timer
  rtmutex: Plug preempt count leak in rt_mutex_futex_unlock()
  rtmutex: Fix more prio comparisons
  rtmutex: Fix PI chain order integrity
  sched,tracing: Update trace_sched_pi_setprio()
  sched/rtmutex: Refactor rt_mutex_setprio()
  rtmutex: Clean up
  sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update
  sched/rtmutex/deadline: Fix a PI crash for deadline tasks
  rtmutex: Deboost before waking up the top waiter
  locking/ww-mutex: Limit stress test to 2 seconds
  locking/atomic: Fix atomic_try_cmpxchg() semantics
  lockdep: Fix per-cpu static objects
  futex: Drop hb->lock before enqueueing on the rtmutex
  futex: Futex_unlock_pi() determinism
  futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
  futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
  ...
2017-05-01 19:36:00 -07:00
Linus Torvalds
3527d3e951 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - another round of rq-clock handling debugging, robustization and
     fixes

   - PELT accounting improvements

   - CPU hotplug related ->cpus_allowed affinity handling fixes all
     around the tree

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  sched/x86: Update reschedule warning text
  crypto: N2 - Replace racy task affinity logic
  cpufreq/sparc-us2e: Replace racy task affinity logic
  cpufreq/sparc-us3: Replace racy task affinity logic
  cpufreq/sh: Replace racy task affinity logic
  cpufreq/ia64: Replace racy task affinity logic
  ACPI/processor: Replace racy task affinity logic
  ACPI/processor: Fix error handling in __acpi_processor_start()
  sparc/sysfs: Replace racy task affinity logic
  powerpc/smp: Replace open coded task affinity logic
  ia64/sn/hwperf: Replace racy task affinity logic
  ia64/salinfo: Replace racy task affinity logic
  workqueue: Provide work_on_cpu_safe()
  ia64/topology: Remove cpus_allowed manipulation
  sched/fair: Move the PELT constants into a generated header
  sched/fair: Increase PELT accuracy for small tasks
  sched/fair: Fix comments
  sched/Documentation: Add 'sched-pelt' tool
  sched/fair: Fix corner case in __accumulate_sum()
  sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
  ...
2017-05-01 19:12:53 -07:00
Linus Torvalds
3711c94fd6 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - move BGRT handling to drivers/acpi so it can be shared between x86
     and ARM

   - bring the EFI stub's initrd and FDT allocation logic in line with
     the latest changes to the arm64 boot protocol

   - improvements and fixes to the EFI stub's command line parsing
     routines

   - randomize the virtual mapping of the UEFI runtime services on
     ARM/arm64

   - ... and other misc enhancements, cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub/arm: Don't use TASK_SIZE when randomizing the RT space
  ef/libstub/arm/arm64: Randomize the base of the UEFI rt services region
  efi/libstub/arm/arm64: Disable debug prints on 'quiet' cmdline arg
  efi/libstub: Unify command line param parsing
  efi/libstub: Fix harmless command line parsing bug
  efi/arm32-stub: Allow boot-time allocations in the vmlinux region
  x86/efi: Clean up a minor mistake in comment
  efi/pstore: Return error code (if any) from efi_pstore_write()
  efi/bgrt: Enable ACPI BGRT handling on arm64
  x86/efi/bgrt: Move efi-bgrt handling out of arch/x86
  efi/arm-stub: Round up FDT allocation to mapping size
  efi/arm-stub: Correct FDT and initrd allocation rules for arm64
2017-05-01 18:20:03 -07:00
Linus Torvalds
174ddfd5df Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer departement delivers:

   - more year 2038 rework

   - a massive rework of the arm achitected timer

   - preparatory patches to allow NTP correction of clock event devices
     to avoid early expiry

   - the usual pile of fixes and enhancements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
  timer/sysclt: Restrict timer migration sysctl values to 0 and 1
  arm64/arch_timer: Mark errata handlers as __maybe_unused
  Clocksource/mips-gic: Remove redundant non devicetree init
  MIPS/Malta: Probe gic-timer via devicetree
  clocksource: Use GENMASK_ULL in definition of CLOCKSOURCE_MASK
  acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver
  clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
  acpi/arm64: Add memory-mapped timer support in GTDT driver
  clocksource: arm_arch_timer: simplify ACPI support code.
  acpi/arm64: Add GTDT table parse driver
  clocksource: arm_arch_timer: split MMIO timer probing.
  clocksource: arm_arch_timer: add structs to describe MMIO timer
  clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init call
  clocksource: arm_arch_timer: refactor arch_timer_needs_probing
  clocksource: arm_arch_timer: split dt-only rate handling
  x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks
  unicore32/time: Set ->min_delta_ticks and ->max_delta_ticks
  um/time: Set ->min_delta_ticks and ->max_delta_ticks
  tile/time: Set ->min_delta_ticks and ->max_delta_ticks
  score/time: Set ->min_delta_ticks and ->max_delta_ticks
  ...
2017-05-01 16:15:18 -07:00
Linus Torvalds
5db6db0d40 Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess unification updates from Al Viro:
 "This is the uaccess unification pile. It's _not_ the end of uaccess
  work, but the next batch of that will go into the next cycle. This one
  mostly takes copy_from_user() and friends out of arch/* and gets the
  zero-padding behaviour in sync for all architectures.

  Dealing with the nocache/writethrough mess is for the next cycle;
  fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
  sold on access_ok() in there, BTW; just not in this pile), same for
  reducing __copy_... callsites, strn*... stuff, etc. - there will be a
  pile about as large as this one in the next merge window.

  This one sat in -next for weeks. -3KLoC"

* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
  HAVE_ARCH_HARDENED_USERCOPY is unconditional now
  CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
  m32r: switch to RAW_COPY_USER
  hexagon: switch to RAW_COPY_USER
  microblaze: switch to RAW_COPY_USER
  get rid of padding, switch to RAW_COPY_USER
  ia64: get rid of copy_in_user()
  ia64: sanitize __access_ok()
  ia64: get rid of 'segment' argument of __do_{get,put}_user()
  ia64: get rid of 'segment' argument of __{get,put}_user_check()
  ia64: add extable.h
  powerpc: get rid of zeroing, switch to RAW_COPY_USER
  esas2r: don't open-code memdup_user()
  alpha: fix stack smashing in old_adjtimex(2)
  don't open-code kernel_setsockopt()
  mips: switch to RAW_COPY_USER
  mips: get rid of tail-zeroing in primitives
  mips: make copy_from_user() zero tail explicitly
  mips: clean and reorder the forest of macros...
  mips: consolidate __invoke_... wrappers
  ...
2017-05-01 14:41:04 -07:00
Linus Torvalds
89d1cf89c8 * An EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov)
* Removal of DRAM error reporting through PCI SERR NMI (Borislav Petkov)
 
 * Misc small fixes (Jan Glauber, Thor Thayer)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlkGtO0ACgkQEsHwGGHe
 VUoXfA/+JLgcHpI04KcvJtTMpNWE3p04xLdzw7hvgvWPLg+JDHF1jXxA4HRy7usI
 BAsEZIcpyk/9tzYjKm4zc/8nhlrjx/ic9cU+hZa8zCy/47uArX9HlrsxAUpgVxcx
 YmWzZ2gyo9Jsi/44wZwnp4dNWibvyG5ECrgis7AFOihT1qyi74YajNfqJWWUbG/H
 W3DkCVs2JVzelue3rI9J8f9MSZk5sL3C9vfFWxk6ifiqr+rlUphoSNFdF+mRnBdr
 dvk555G4Xmmz97ZiBAOM12M1trn+4lCkyfuQuMw0cZYt7F/nS7ZdLqAKK8H1KIoE
 mGl29p85svZRhIM25Cd759LSharAetqpNyxicjAwONwLcKiXVf2UuR5NohVj3y1f
 Dbrh4zRx0OVJctaAKzLEHhW3Re/VA6lU8JUuvjBytKV5fr64jBpqSXFDL8J4y7p2
 RJnKNbPkoXB75LukNqxDgpL+YEnJjzlslqxLqgPVgHFtrsUjpNHAJ9rKDeJQoW3b
 wC2wVBZmwx+4ShyHjJePJC7C6a/gDktbDos2/XW11DHa4w8ZbZ2Q4ep9oYegBKcd
 szliytm0LWlUTUDVNoc9DW/ka0NAh43kjvCqcmUcfC+4lhMO28eajvj35PP7fcic
 hmCAQnJz6M8t1VgxO7xvWi4jAwhvbzXM5IV1O3tIDMYHJQhrLBw=
 =vGf1
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - an EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov)

 - removal of DRAM error reporting through PCI SERR NMI (Borislav
   Petkov)

 - misc small fixes (Jan Glauber, Thor Thayer)

* tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, ghes: Do not enable it by default
  EDAC: Rename report status accessors
  EDAC: Delete edac_stub.c
  EDAC: Update Kconfig help text
  EDAC: Remove EDAC_MM_EDAC
  EDAC: Issue tracepoint only when it is defined
  ACPI/extlog: Add EDAC dependency
  EDAC: Move edac_op_state to edac_mc.c
  EDAC: Remove edac_err_assert
  EDAC: Get rid of edac_handlers
  x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
  EDAC, highbank: Align Makefile directives
  EDAC, thunderx: Remove unused code
  EDAC, thunderx: Change LMC index calculation
  EDAC, altera: Fix peripheral warnings for Cyclone5
  EDAC, thunderx: Fix L2C MCI interrupt disable
  EDAC, thunderx: Add Cavium ThunderX EDAC driver
2017-05-01 11:36:00 -07:00
Linus Torvalds
97ce89f8a4 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "The final fixes for 4.11:

   - prevent a triple fault with function graph tracing triggered via
     suspend to ram

   - prevent optimizing for size when function graph tracing is enabled
     and the compiler does not support -mfentry

   - prevent mwaitx() being called with a zero timeout as mwaitx() might
     never return. Observed on the new Ryzen CPUs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Prevent timer value 0 for MWAITX
  x86/build: convert function graph '-Os' error to warning
  ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
2017-04-30 11:44:18 -07:00
Janakarajan Natarajan
88d879d29f Prevent timer value 0 for MWAITX
Newer hardware has uncovered a bug in the software implementation of
using MWAITX for the delay function. A value of 0 for the timer is meant
to indicate that a timeout will not be used to exit MWAITX. On newer
hardware this can result in MWAITX never returning, resulting in NMI
soft lockup messages being printed. On older hardware, some of the other
conditions under which MWAITX can exit masked this issue. The AMD APM
does not currently document this and will be updated.

Please refer to http://marc.info/?l=kvm&m=148950623231140 for
information regarding NMI soft lockup messages on an AMD Ryzen 1800X.
This has been root-caused as a 0 passed to MWAITX causing it to wait
indefinitely.

This change has the added benefit of avoiding the unnecessary setup of
MONITORX/MWAITX when the delay value is zero.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Link: http://lkml.kernel.org/r/1493156643-29366-1-git-send-email-Janakarajan.Natarajan@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-30 13:35:11 +02:00
Al Viro
2fefc97b21 HAVE_ARCH_HARDENED_USERCOPY is unconditional now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 12:11:06 -04:00
Al Viro
701cac61d0 CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
all architectures converted

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 12:11:01 -04:00
Al Viro
eea86b637a Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess 2017-04-26 12:06:59 -04:00
Prarit Bhargava
21173d0b4d sched/x86: Update reschedule warning text
Modify the reschedule warning to output the offline CPU number and
use a better debug message.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1492518305-3808-1-git-send-email-prarit@redhat.com
[ Tweaked the warning message. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-20 10:14:30 +02:00
Ingo Molnar
afa7a17f3a Merge branch 'WIP.x86/process' into perf/core 2017-04-20 10:07:12 +02:00
Borislav Petkov
c6a9583fb4 x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only
mce_usable_address() does a bunch of basic sanity checks to verify
whether the address reported with the error is usable for further
processing. However, we do check MCi_STATUS[MISCV] and that is not
needed on AMD as that bit says that there's additional information about
the logged error in the MCi_MISCj banks.

But we don't need that to know whether the address is usable - we only
need to know whether the physical address is valid - i.e., ADDRV.

On Intel the MISCV bit is needed to perform additional checks to determine
whether the reported address is a physical one, etc.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170418183924.6agjkebilwqj26or@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-19 12:04:46 +02:00
Josh Poimboeuf
a5859c6d7b x86/build: convert function graph '-Os' error to warning
For pre-4.6.0 versions of GCC, which don't have '-mfentry', the
'-maccumulate-outgoing-args' option is required for function graph
tracing in order to avoid GCC bug 42109.

However, GCC ignores '-maccumulate-outgoing-args' when '-Os' is
also set.

Currently we force a build error to prevent that scenario, but that
breaks randconfigs.  So change the error to a warning which also
disables CONFIG_CC_OPTIMIZE_FOR_SIZE.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Link: http://lkml.kernel.org/r/20170418214429.o7fbwbmf4nqosezy@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19 09:57:23 +02:00
Vishal Verma
0dc9c639e6 x86/mce: Make the MCE notifier a blocking one
The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
  in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
  [..]
  Call Trace:
   dump_stack
   ___might_sleep
   __might_sleep
   mutex_lock_nested
   ? __lock_acquire
   nfit_handle_mce
   notifier_call_chain
   atomic_notifier_call_chain
   ? atomic_notifier_call_chain
   mce_gen_pool_process

Convert the notifier to a blocking one which gets to run only in process
context.

Boris: remove the notifier call in atomic context in print_mce(). For
now, let's print the MCE on the atomic path so that we can make sure
they go out and get logged at least.

Fixes: 6839a6d96f ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-18 22:23:48 +02:00
Borislav Petkov
415601b191 x86/mce: Update notifier priority check
Update the check which enforces the registration of MCE decoder notifier
callbacks with valid priority only, to include mcelog's priority.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20170418073820.i6kl5tggcntwlisa@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18 10:27:52 +02:00
Linus Torvalds
d5ff0814fd Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm fixes from Dan Williams:
 "A small crop of lockdep, sleeping while atomic, and other fixes /
  band-aids in advance of the full-blown reworks targeting the next
  merge window. The largest change here is "libnvdimm: fix blk free
  space accounting" which deletes a pile of buggy code that better
  testing would have caught before merging. The next change that is
  borderline too big for a late rc is switching the device-dax locking
  from rcu to srcu, I couldn't think of a smaller way to make that fix.

  The __copy_user_nocache fix will have a full replacement in 4.12 to
  move those pmem special case considerations into the pmem driver. The
  "libnvdimm: band aid btt vs clear poison locking" commit admits that
  our error clearing support for btt went in broken, so we just disable
  it in 4.11 and -stable. A replacement / full fix is in the pipeline
  for 4.12

  Some of these would have been caught earlier had DEBUG_ATOMIC_SLEEP
  been enabled on my development station. I wonder if we should have:

      config DEBUG_ATOMIC_SLEEP
        default PROVE_LOCKING

  ...since I mistakenly thought I got both with PROVE_LOCKING=y.

  These have received a build success notification from the 0day robot,
  and some have appeared in a -next release with no reported issues"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
  libnvdimm: band aid btt vs clear poison locking
  libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
  libnvdimm: fix blk free space accounting
  acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
2017-04-15 14:07:03 -07:00
Linus Torvalds
91174391bf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes for x86:

   - fix locking in RDT to prevent memory leaks and freeing in use
     memory

   - prevent setting invalid values for vdso32_enabled which cause
     inconsistencies for user space resulting in application crashes.

   - plug a race in the vdso32 code between fork and sysctl which causes
     inconsistencies for user space resulting in application crashes.

   - make MPX signal delivery work in compat mode

   - make the dmesg output of traps and faults readable again"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
  x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
  x86/vdso: Plug race between mapping and ELF header setup
  x86/vdso: Ensure vdso32_enabled gets set to valid values only
  x86/signals: Fix lower/upper bound reporting in compat siginfo
2017-04-14 17:00:01 -07:00
Linus Torvalds
07c7016de7 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two small fixes for perf:

   - the move to support cross arch annotation introduced per arch
     initialization requirements, fullfill them for s/390 (Christian
     Borntraeger)

   - add the missing initialization to the LBR entries to avoid exposing
     random or stale data"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
  perf annotate s390: Fix perf annotate error -95 (4.10 regression)
2017-04-14 16:58:38 -07:00
Linus Torvalds
f399ecb4b4 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:
 "Three fixes from EFI land:

   - prevent accessing a Graphic Output Device (GOP) which the kernel
     does not know to handle

   - prevent PCI reconfiguration to modify a BAR which covers the
     framebuffer because that's already in use through the EFI GOP
     interface

   - avoid reserving EFI runtime regions as this results in bogus memory
     mappings"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Don't try to reserve runtime regions
  efi/fb: Avoid reconfiguration of BAR that covers the framebuffer
  efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
2017-04-14 16:55:33 -07:00
Nicolai Stange
6fc46497a9 x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Currently, the x86's uv rtc clockevent device is initialized as follows:

  clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
                                 sn_rtc_cycles_per_second;
  clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
                                 (NSEC_PER_SEC / sn_rtc_cycles_per_second);

This translates to a ->min_delta_ticks value of 1 and a ->max_delta_ticks
value of clocksource_uv.mask.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values
respectively.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Mike Travis <travis@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14 13:11:23 -07:00
Nicolai Stange
747d04b30e x86/apic/timer: Set ->min_delta_ticks and ->max_delta_ticks
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the x86 arch's apic clockevent driver initialize these fields
properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
CC: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14 13:11:17 -07:00
Nicolai Stange
b77f41618c x86/lguest/timer: Set ->min_delta_ticks and ->max_delta_ticks
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the x86 arch's lguest clockevent driver initialize these fields
properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: lguest@lists.ozlabs.org
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14 13:11:12 -07:00
Nicolai Stange
3d18d661aa x86/xen/time: Set ->min_delta_ticks and ->max_delta_ticks
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the x86 arch's xen clockevent driver initialize these fields properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: xen-devel@lists.xenproject.org
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14 13:11:03 -07:00
Josh Poimboeuf
34a477e529 ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function
graph tracing and then suspend to RAM, it will triple fault and reboot when
it resumes.

The first fault happens when booting a secondary CPU:

startup_32_smp()
  load_ucode_ap()
    prepare_ftrace_return()
      ftrace_graph_is_dead()
        (accesses 'kill_ftrace_graph')

The early head_32.S code calls into load_ucode_ap(), which has an an
ftrace hook, so it calls prepare_ftrace_return(), which calls
ftrace_graph_is_dead(), which tries to access the global
'kill_ftrace_graph' variable with a virtual address, causing a fault
because the CPU is still in real mode.

The fix is to add a check in prepare_ftrace_return() to make sure it's
running in protected mode before continuing.  The check makes sure the
stack pointer is a virtual kernel address.  It's a bit of a hack, but
it's not very intrusive and it works well enough.

For reference, here are a few other (more difficult) ways this could
have potentially been fixed:

- Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
  is enabled.  (No idea what that would break.)

- Track down load_ucode_ap()'s entire callee tree and mark all the
  functions 'notrace'.  (Probably not realistic.)

- Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
  or __cpu_up(), and ensure that the pause facility can be queried from
  real mode.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@kernel.org
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 11:48:51 +02:00
Piotr Luc
9ea74f7c70 x86/mce: Enable PPIN for Knights Landing/Mill
Intel Xeon Phi processors (KNL and KNM) support PPIN as well, so add their
CPUIDs to the whitelist of supported processors.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170408172004.8463-1-piotr.luc@intel.com
Link: http://lkml.kernel.org/r/20170413201056.10525-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 10:46:12 +02:00
Kan Liang
fd583ad156 perf/x86: Fix spurious NMI with PEBS Load Latency event
Spurious NMIs will be observed with the following command:

  while :; do
    perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp"
                  -e "cpu/umask=0x03,event=0x0/"
                  -e "cpu/umask=0x02,event=0x0/"
                  -e cycles,branches,cache-misses
                  -e cache-references -- sleep 10
  done

The bug was introduced by commit:

  8077eca079 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")

That commit clears the status bits for the counters used for PEBS
events, by masking the whole 64 bits pebs_enabled. However, only the
low 32 bits of both status and pebs_enabled are reserved for PEBS-able
counters.

For status bits 32-34 are fixed counter overflow bits. For
pebs_enabled bits 32-34 are for PEBS Load Latency.

In the test case, the PEBS Load Latency event and fixed counter event
could overflow at the same time. The fixed counter overflow bit will
be cleared by mistake. Once it is cleared, the fixed counter overflow
never be processed, which finally trigger spurious NMI.

Correct the PEBS enabled mask by ignoring the non-PEBS bits.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 8077eca079 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:31:39 +02:00
Ingo Molnar
18c5c7c618 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:30:28 +02:00
Ingo Molnar
0ba78a95a6 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:29:40 +02:00
Peter Zijlstra
f2200ac311 perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
When the perf_branch_entry::{in_tx,abort,cycles} fields were added,
intel_pmu_lbr_read_32() wasn't updated to initialize them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 135c5612c4 ("perf/x86/intel: Support Haswell/v4 LBR format")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-14 10:18:00 +02:00
Omar Sandoval
6f6266a561 x86/efi: Don't try to reserve runtime regions
Reserving a runtime region results in splitting the EFI memory
descriptors for the runtime region. This results in runtime region
descriptors with bogus memory mappings, leading to interesting crashes
like the following during a kexec:

  general protection fault: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
  Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05   09/30/2016
  RIP: 0010:virt_efi_set_variable()
  ...
  Call Trace:
   efi_delete_dummy_variable()
   efi_enter_virtual_mode()
   start_kernel()
   ? set_init_arg()
   x86_64_start_reservations()
   x86_64_start_kernel()
   start_cpu()
  ...
  Kernel panic - not syncing: Fatal exception

Runtime regions will not be freed and do not need to be reserved, so
skip the memmap modification in this case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-13 08:09:27 +02:00
Dan Williams
11e63f6d92 x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.

Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.

Fixes: 5de490daec ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-12 13:45:18 -07:00
Kees Cook
a4866aa812 mm: Tighten x86 /dev/mem with zeroing reads
Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:

usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)

This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.

Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-12 11:40:23 -07:00
Masami Hiramatsu
a8d11cd071 kprobes/x86: Consolidate insn decoder users for copying code
Consolidate x86 instruction decoder users on the path of
copying original code for kprobes.

Kprobes decodes the same instruction a maximum of 3 times when
preparing the instruction buffer:

 - The first time for getting the length of the instruction,
 - the 2nd for adjusting displacement,
 - and the 3rd for checking whether the instruction is boostable or not.

For each time, the actual decoding target address is slightly
different (1st is original address or recovered instruction buffer,
2nd and 3rd are pointing to the copied buffer), but all have
the same instruction.

Thus, this patch also changes the target address to the copied
buffer at first and reuses the decoded "insn" for displacement
adjusting and checking boostability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076389643.22469.13151892839998777373.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:47 +02:00
Masami Hiramatsu
ea1e34fc36 kprobes/x86: Use probe_kernel_read() instead of memcpy()
Use probe_kernel_read() for avoiding unexpected faults while
copying kernel text in __recover_probed_insn(),
__recover_optprobed_insn() and __copy_instruction().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076382624.22469.10091613887942958518.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:47 +02:00
Masami Hiramatsu
d0381c81c2 kprobes/x86: Set kprobes pages read-only
Set the pages which is used for kprobes' singlestep buffer
and optprobe's trampoline instruction buffer to readonly.
This can prevent unexpected (or unintended) instruction
modification.

This also passes rodata_test as below.

Without this patch, rodata_test shows a warning:

  WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x7a9/0xa20
  x86/mm: Found insecure W+X mapping at address ffffffffa0000000/0xffffffffa0000000

With this fix, no W+X pages are found:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.
  rodata_test: all tests were successful

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076375592.22469.14174394514338612247.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:46 +02:00
Masami Hiramatsu
490154bc68 kprobes/x86: Make boostable flag boolean
Make arch_specific_insn.boostable to boolean, since it has
only 2 states, boostable or not. So it is better to use
boolean from the viewpoint of code readability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076368566.22469.6322906866458231844.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:46 +02:00
Masami Hiramatsu
804dec5bda kprobes/x86: Do not modify singlestep buffer while resuming
Do not modify singlestep execution buffer (kprobe.ainsn.insn)
while resuming from single-stepping, instead, modifies
the buffer to add a jump back instruction at preparing
buffer.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076361560.22469.1610155860343077495.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:45 +02:00
Masami Hiramatsu
17880e4d57 kprobes/x86: Use instruction decoder for booster
Use x86 instruction decoder for checking whether the probed
instruction is able to boost or not, instead of hand-written
code.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076354563.22469.13379472209338986858.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:45 +02:00
Masami Hiramatsu
129d17e8e8 kprobes/x86: Fix the description of __copy_instruction()
Fix the description comment of __copy_instruction() function
since it has already been changed to return the length of the
copied instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076347582.22469.3775133607244923462.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:45 +02:00
Masami Hiramatsu
bd0b90676c kprobes/x86: Fix kprobe-booster not to boost far call instructions
Fix the kprobe-booster not to boost far call instruction,
because a call may store the address in the single-step
execution buffer to the stack, which should be modified
after single stepping.

Currently, this instruction will be filtered as not
boostable in resume_execution(), so this is not a
critical issue.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076340615.22469.14066273186134229909.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:44 +02:00
Jiri Olsa
7f00f38871 x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
The schemata lock is released before freeing the resource's temporary
tmp_cbms allocation. That's racy versus another write which allocates and
uses new temporary storage, resulting in memory leaks, freeing in use
memory, double a free or any combination of those.

Move the unlock after the release code.

Fixes: 60ec2440c6 ("x86/intel_rdt: Add schemata file")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170411071446.15241-1-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-11 09:48:12 +02:00
Markus Trippelsdorf
1c99a68741 x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
Since commit:

  4bcc595ccd "printk: reinstate KERN_CONT for printing"

... the debug output of signal_fault(), do_trap() and do_general_protection()
looks garbled, e.g.:

 traps: conftest[9335] trap invalid opcode ip:400428 sp:7ffeaba1b0d8 error:0
  in conftest[400000+1000]

(note the unintended line break.)

Fix the bug by adding KERN_CONTs.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 09:11:13 +02:00
Borislav Petkov
9df9078ef2 perf/amd/uncore: Fix pr_fmt() prefix
Make it "perf/amd/uncore: ", i.e., something more specific than "perf: ".

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-4-bp@alien8.de
[ Changed it to perf/amd/uncore/ ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:44:59 +02:00
Borislav Petkov
68e8038048 perf/amd/uncore: Clean up per-family setup
Fam16h is the same as the default one, remove it. Turn the switch-case
into a simple if-else.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:44:59 +02:00
Borislav Petkov
c2628f90c9 perf/amd/uncore: Do feature check first, before assignments
... and save some unnecessary work. Remove now unused label while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:44:59 +02:00