Commit Graph

10378 Commits

Author SHA1 Message Date
Sean Christopherson
985ab27801 KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only
Make 'struct kvm_mmu_page' MMU-only, nothing outside of the MMU should
be poking into the gory details of shadow pages.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:54 -04:00
Uros Bizjak
4cb5b77eec KVM: x86: Use VMCALL and VMMCALL mnemonics in kvm_para.h
Current minimum required version of binutils is 2.23,
which supports VMCALL and VMMCALL instruction mnemonics.

Replace the byte-wise specification of VMCALL and
VMMCALL with these proper mnemonics.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200623183439.5526-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:49 -04:00
Jim Mattson
8a14fe4f0c kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu
Both the vcpu_vmx structure and the vcpu_svm structure have a
'last_cpu' field. Move the common field into the kvm_vcpu_arch
structure. For clarity, rename it to 'last_vmentry_cpu.'

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-6-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:45 -04:00
Sean Christopherson
02f5fb2e69 KVM: x86/mmu: Make .write_log_dirty a nested operation
Move .write_log_dirty() into kvm_x86_nested_ops to help differentiate it
from the non-nested dirty log hooks.  And because it's a nested-only
operation.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622215832.22090-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:38 -04:00
Paolo Bonzini
26d05b368a Merge branch 'kvm-async-pf-int' into HEAD 2020-07-08 16:20:30 -04:00
Peter Zijlstra
faa2fd7cba Merge branch 'sched/urgent' 2020-07-08 11:38:59 +02:00
Kan Liang
ce711ea3ca perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch
In the LBR call stack mode, LBR information is used to reconstruct a
call stack. To get the complete call stack, perf has to save/restore
all LBR registers during a context switch. Due to a large number of the
LBR registers, this process causes a high CPU overhead. To reduce the
CPU overhead during a context switch, use the XSAVES/XRSTORS
instructions.

Every XSAVE area must follow a canonical format: the legacy region, an
XSAVE header and the extended region. Although the LBR information is
only kept in the extended region, a space for the legacy region and
XSAVE header is still required. Add a new dedicated structure for LBR
XSAVES support.

Before enabling XSAVES support, the size of the LBR state has to be
sanity checked, because:
- the size of the software structure is calculated from the max number
of the LBR depth, which is enumerated by the CPUID leaf for Arch LBR.
The size of the LBR state is enumerated by the CPUID leaf for XSAVE
support of Arch LBR. If the values from the two CPUID leaves are not
consistent, it may trigger a buffer overflow. For example, a hypervisor
may unconsciously set inconsistent values for the two emulated CPUID.
- unlike other state components, the size of an LBR state depends on the
max number of LBRs, which may vary from generation to generation.

Expose the function xfeature_size() for the sanity check.
The LBR XSAVES support will be disabled if the size of the LBR state
enumerated by CPUID doesn't match with the size of the software
structure.

The XSAVE instruction requires 64-byte alignment for state buffers. A
new macro is added to reflect the alignment requirement. A 64-byte
aligned kmem_cache is created for architecture LBR.

Currently, the structure for each state component is maintained in
fpu/types.h. The structure for the new LBR state component should be
maintained in the same place. Move structure lbr_entry to fpu/types.h as
well for broader sharing.

Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support,
which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR
information at the context switch when the call stack mode is enabled.
Since the XSAVES/XRSTORS instructions will be eventually invoked, the
dedicated functions is named with '_xsaves'/'_xrstors' postfix.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
50f408d96d x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature
The perf subsystem will only need to save/restore the LBR state.
However, the existing helpers save all supported supervisor states to a
kernel buffer, which will be unnecessary. Two helpers are introduced to
only save/restore requested dynamic supervisor states. The supervisor
features in XFEATURE_MASK_SUPERVISOR_SUPPORTED and
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED mask cannot be saved/restored using
these helpers.

The helpers will be used in the following patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-22-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
f0dccc9da4 x86/fpu/xstate: Support dynamic supervisor feature for LBR
Last Branch Records (LBR) registers are used to log taken branches and
other control flows. In perf with call stack mode, LBR information is
used to reconstruct a call stack. To get the complete call stack, perf
has to save/restore all LBR registers during a context switch. Due to
the large number of the LBR registers, e.g., the current platform has
96 LBR registers, this process causes a high CPU overhead. To reduce
the CPU overhead during a context switch, an LBR state component that
contains all the LBR related registers is introduced in hardware. All
LBR registers can be saved/restored together using one XSAVES/XRSTORS
instruction.

However, the kernel should not save/restore the LBR state component at
each context switch, like other state components, because of the
following unique features of LBR:
- The LBR state component only contains valuable information when LBR
  is enabled in the perf subsystem, but for most of the time, LBR is
  disabled.
- The size of the LBR state component is huge. For the current
  platform, it's 808 bytes.
If the kernel saves/restores the LBR state at each context switch, for
most of the time, it is just a waste of space and cycles.

To efficiently support the LBR state component, it is desired to have:
- only context-switch the LBR when the LBR feature is enabled in perf.
- only allocate an LBR-specific XSAVE buffer on demand.
  (Besides the LBR state, a legacy region and an XSAVE header have to be
   included in the buffer as well. There is a total of (808+576) byte
   overhead for the LBR-specific XSAVE buffer. The overhead only happens
   when the perf is actively using LBRs. There is still a space-saving,
   on average, when it replaces the constant 808 bytes of overhead for
   every task, all the time on the systems that support architectural
   LBR.)
- be able to use XSAVES/XRSTORS for accessing LBR at run time.
  However, the IA32_XSS should not be adjusted at run time.
  (The XCR0 | IA32_XSS are used to determine the requested-feature
  bitmap (RFBM) of XSAVES.)

A solution, called dynamic supervisor feature, is introduced to address
this issue, which
- does not allocate a buffer in each task->fpu;
- does not save/restore a state component at each context switch;
- sets the bit corresponding to the dynamic supervisor feature in
  IA32_XSS at boot time, and avoids setting it at run time.
- dynamically allocates a specific buffer for a state component
  on demand, e.g. only allocates LBR-specific XSAVE buffer when LBR is
  enabled in perf. (Note: The buffer has to include the LBR state
  component, a legacy region and a XSAVE header space.)
  (Implemented in a later patch)
- saves/restores a state component on demand, e.g. manually invokes
  the XSAVES/XRSTORS instruction to save/restore the LBR state
  to/from the buffer when perf is active and a call stack is required.
  (Implemented in a later patch)

A new mask XFEATURE_MASK_DYNAMIC and a helper xfeatures_mask_dynamic()
are introduced to indicate the dynamic supervisor feature. For the
systems which support the Architecture LBR, LBR is the only dynamic
supervisor feature for now. For the previous systems, there is no
dynamic supervisor feature available.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-21-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
a063bf249b x86/fpu: Use proper mask to replace full instruction mask
When saving xstate to a kernel/user XSAVE area with the XSAVE family of
instructions, the current code applies the 'full' instruction mask (-1),
which tries to XSAVE all possible features. This method relies on
hardware to trim 'all possible' down to what is enabled in the
hardware. The code works well for now. However, there will be a
problem, if some features are enabled in hardware, but are not suitable
to be saved into all kernel XSAVE buffers, like task->fpu, due to
performance consideration.

One such example is the Last Branch Records (LBR) state. The LBR state
only contains valuable information when LBR is explicitly enabled by
the perf subsystem, and the size of an LBR state is large (808 bytes
for now). To avoid both CPU overhead and space overhead at each context
switch, the LBR state should not be saved into task->fpu like other
state components. It should be saved/restored on demand when LBR is
enabled in the perf subsystem. Current copy_xregs_to_* will trigger a
buffer overflow for such cases.

Three sites use the '-1' instruction mask which must be updated.

Two are saving/restoring the xstate to/from a kernel-allocated XSAVE
buffer and can use 'xfeatures_mask_all', which will save/restore all of
the features present in a normal task FPU buffer.

The last one saves the register state directly to a user buffer. It
could
also use 'xfeatures_mask_all'. Just as it was with the '-1' argument,
any supervisor states in the mask will be filtered out by the hardware
and not saved to the buffer.  But, to be more explicit about what is
expected to be saved, use xfeatures_mask_user() for the instruction
mask.

KVM includes the header file fpu/internal.h. To avoid 'undefined
xfeatures_mask_all' compiling issue, move copy_fpregs_to_fpstate() to
fpu/core.c and export it, because:
- The xfeatures_mask_all is indirectly used via copy_fpregs_to_fpstate()
  by KVM. The function which is directly used by other modules should be
  exported.
- The copy_fpregs_to_fpstate() is a function, while xfeatures_mask_all
  is a variable for the "internal" FPU state. It's safer to export a
  function than a variable, which may be implicitly changed by others.
- The copy_fpregs_to_fpstate() is a big function with many checks. The
  removal of the inline keyword should not impact the performance.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-20-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
5624986dc6 perf/x86/intel/lbr: Unify the stored format of LBR information
Current LBR information in the structure x86_perf_task_context is stored
in a different format from the PEBS LBR record and Architecture LBR,
which prevents the sharing of the common codes.

Use the format of the PEBS LBR record as a unified format. Use a generic
name lbr_entry to replace pebs_lbr_entry.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593780569-62993-11-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:53 +02:00
Kan Liang
af6cf12970 perf/x86: Expose CPUID enumeration bits for arch LBR
The LBR capabilities of Architecture LBR are retrieved from the CPUID
enumeration once at boot time. The capabilities have to be saved for
future usage.

Several new fields are added into structure x86_pmu to indicate the
capabilities. The fields will be used in the following patches.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593780569-62993-9-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:53 +02:00
Kan Liang
d6a162a41b x86/msr-index: Add bunch of MSRs for Arch LBR
Add Arch LBR related MSRs and the new LBR INFO bits in MSR-index.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593780569-62993-8-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:52 +02:00
Kan Liang
bd657aa3dd x86/cpufeatures: Add Architectural LBRs feature bit
CPUID.(EAX=07H, ECX=0):EDX[19] indicates whether an Intel CPU supports
Architectural LBRs.

The "X86_FEATURE_..., word 18" is already mirrored from CPUID
"0x00000007:0 (EDX)". Add X86_FEATURE_ARCH_LBR under the "word 18"
section.

The feature will appear as "arch_lbr" in /proc/cpuinfo.

The Architectural Last Branch Records (LBR) feature enables recording
of software path history by logging taken branches and other control
flows. The feature will be supported in the perf_events subsystem.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-2-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:51 +02:00
Andy Lutomirski
b037b09b90 x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit()
They were originally called _cond_rcu because they were special versions
with conditional RCU handling.  Now they're the standard entry and exit
path, so the _cond_rcu part is just confusing.  Drop it.

Also change the signature to make them more extensible and more foolproof.

No functional change -- it's pure refactoring.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/247fc67685263e0b673e1d7f808182d28ff80359.1593795633.git.luto@kernel.org
2020-07-06 21:15:52 +02:00
Ingo Molnar
a4c0e91d1d x86/entry/32: Fix XEN_PV build dependency
xenpv_exc_nmi() and xenpv_exc_debug() are only defined on 64-bit kernels,
but they snuck into the 32-bit build via <asm/identry.h>, causing the link
to fail:

  ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_nmi':
  (.entry.text+0x817): undefined reference to `xenpv_exc_nmi'

  ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_debug':
  (.entry.text+0x827): undefined reference to `xenpv_exc_debug'

Only use them on 64-bit kernels.

Fixes: f41f082422: ("x86/entry/xen: Route #DB correctly on Xen PV")
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-05 21:39:23 +02:00
Linus Torvalds
72674d4800 A series of fixes for x86:
- Reset MXCSR in kernel_fpu_begin() to prevent using a stale user space
    value.
 
  - Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
    whitelisted for split lock detection. Some CPUs which do not support
    it crash even when the MSR is written to 0 which is the default value.
 
  - Fix the XEN PV fallout of the entry code rework
 
  - Fix the 32bit fallout of the entry code rework
 
  - Add more selftests to ensure that these entry problems don't come back.
 
  - Disable 16 bit segments on XEN PV. It's not supported because XEN PV
    does not implement ESPFIX64
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8B9JoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoV8LEAC6QJPDvqYUl4r0rNIRG+S6D99lQOse
 1smxvgXX4UaRz5Tgz6kvYUcucqmmnTfvnO8cg82LASeFw1xfVPPAtl3GZjoClwhv
 0NJkKYcMm5QUOSVjJmjkcbAld//FyRfxHuJ8HMEtrbvkys2qWBmLzMaUNhFDNhcc
 73UMmyuyL4kef9v/iAeR5WXG5+b+j9lZDiC1lTWuEKs10d1EdTwt2O/wtSRRPpMn
 kL1qGTJAL+iRyRe7weLOkC2KZ9+Gq2NtyJQutkthZtGe5+pLT3AT6AlWxeg1HU8q
 pxaQP25oe8/8naIoOmwiuwAP2qmm5eHedzXoN0h7i2XmofYOJaWeF95K7oDro8Nj
 2deCx1bk0wr/RUxbYlfUacs8S+wmMWe7+BPnHXZphkSq5Vx+oXIw6mJOqmNb7Yiv
 7ld1QwSD5dyWCEk1af16XKsFvSIRiGh8FypfTiTxyk+z7HIWBNXlu8OWHn1A7Sra
 iaolCZfXtTJzm4w5+VVT2FX3s7jJrmMM4iSLtM2ISo2k+1HMlTbgLE6/yGjQ3ZaY
 U298W7Pm8CwBRgzyKBvZVfncm0U/B0FNo/8C0jsJKPIOdpoLhs+u7sjpyaNC+toz
 GE0skoWZxMhga4xPF84ua/l1VGncVUN1d5/dmnXz8xdyxFlktUtkt2iPE4G0rt3S
 Xgh2uLHOgST6Kw==
 =lI9c
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A series of fixes for x86:

   - Reset MXCSR in kernel_fpu_begin() to prevent using a stale user
     space value.

   - Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
     whitelisted for split lock detection. Some CPUs which do not
     support it crash even when the MSR is written to 0 which is the
     default value.

   - Fix the XEN PV fallout of the entry code rework

   - Fix the 32bit fallout of the entry code rework

   - Add more selftests to ensure that these entry problems don't come
     back.

   - Disable 16 bit segments on XEN PV. It's not supported because XEN
     PV does not implement ESPFIX64"

* tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Disable 16-bit segments on Xen PV
  x86/entry/32: Fix #MC and #DB wiring on x86_32
  x86/entry/xen: Route #DB correctly on Xen PV
  x86/entry, selftests: Further improve user entry sanity checks
  x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
  selftests/x86: Consolidate and fix get/set_eflags() helpers
  selftests/x86/syscall_nt: Clear weird flags after each test
  selftests/x86/syscall_nt: Add more flag combinations
  x86/entry/64/compat: Fix Xen PV SYSENTER frame setup
  x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C
  x86/entry: Assert that syscalls are on the right stack
  x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
2020-07-05 12:23:49 -07:00
Andy Lutomirski
13cbc0cd4a x86/entry/32: Fix #MC and #DB wiring on x86_32
DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW
on x86_32, but the code expected them to be RAW.

Get rid of all the macro indirection for them on 32-bit and just use
DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly.

Also add a warning to make sure that we only hit the _kernel paths
in kernel mode.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e90a7ee8e72fd757db6d92e1e5ff16339c1ecf9.1593795633.git.luto@kernel.org
2020-07-04 19:47:26 +02:00
Andy Lutomirski
f41f082422 x86/entry/xen: Route #DB correctly on Xen PV
On Xen PV, #DB doesn't use IST. It still needs to be correctly routed
depending on whether it came from user or kernel mode.

Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the
logic.  Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on
Xen, and do the right thing for #DB.  Also add more warnings to the
exc_debug* handlers to make this type of failure more obvious.

This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.

Fixes: 4c0dcd8350 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org
2020-07-04 19:47:25 +02:00
Peter Zijlstra
78c2141b65 Merge branch 'perf/vlbr' 2020-07-02 15:51:48 +02:00
Like Xu
097e4311cd perf/x86: Add constraint to create guest LBR event without hw counter
The hypervisor may request the perf subsystem to schedule a time window
to directly access the LBR records msrs for its own use. Normally, it would
create a guest LBR event with callstack mode enabled, which is scheduled
along with other ordinary LBR events on the host but in an exclusive way.

To avoid wasting a counter for the guest LBR event, the perf tracks its
hw->idx via INTEL_PMC_IDX_FIXED_VLBR and assigns it with a fake VLBR
counter with the help of new vlbr_constraint. As with the BTS event,
there is actually no hardware counter assigned for the guest LBR event.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200514083054.62538-5-like.xu@linux.intel.com
2020-07-02 15:51:46 +02:00
Like Xu
b2d6504761 perf/x86/lbr: Add interface to get LBR information
The LBR records msrs are model specific. The perf subsystem has already
obtained the base addresses of LBR records based on the cpu model.

Therefore, an interface is added to allow callers outside the perf
subsystem to obtain these LBR information. It's useful for hypervisors
to emulate the LBR feature for guests with less code.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200613080958.132489-4-like.xu@linux.intel.com
2020-07-02 15:51:46 +02:00
Srinivas Pandruvada
ed7bde7a6d cpufreq: intel_pstate: Allow enable/disable energy efficiency
By default intel_pstate the driver disables energy efficiency by setting
MSR_IA32_POWER_CTL bit 19 for Kaby Lake desktop CPU model in HWP mode.
This CPU model is also shared by Coffee Lake desktop CPUs. This allows
these systems to reach maximum possible frequency. But this adds power
penalty, which some customers don't want. They want some way to enable/
disable dynamically.

So, add an additional attribute "energy_efficiency" under
/sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows
to read and write bit 19 ("Disable Energy Efficiency Optimization") in
the MSR IA32_POWER_CTL.

This attribute is present in both HWP and non-HWP mode as this has an
effect in both modes. Refer to Intel Software Developer's manual for
details.

The scope of this bit is package wide. Also these systems are single
package systems. So read/write MSR on the current CPU is enough.

The energy efficiency (EE) bit setting needs to be preserved during
suspend/resume and CPU offline/online operation. To do this:
- Restoring the EE setting from the cpufreq resume() callback, if there
is change from the system default.
- By default, don't disable EE from cpufreq init() callback for matching
CPU models. Since the scope is package wide and is a single package
system, move the disable EE calls from init() callback to
intel_pstate_init() function, which is called only once.

Suggested-by: Len Brown <lenb@kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-02 13:02:46 +02:00
Andy Lutomirski
40c45904f8 x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
Debuggers expect that doing PTRACE_GETREGS, then poking at a tracee
and maybe letting it run for a while, then doing PTRACE_SETREGS will
put the tracee back where it was.  In the specific case of a 32-bit
tracer and tracee, the PTRACE_GETREGS/SETREGS data structure doesn't
have fs_base or gs_base fields, so FSBASE and GSBASE fields are
never stored anywhere.  Everything used to still work because
nonzero FS or GS would result full reloads of the segment registers
when the tracee resumes, and the bases associated with FS==0 or
GS==0 are irrelevant to 32-bit code.

Adding FSGSBASE support broke this: when FSGSBASE is enabled, FSBASE
and GSBASE are now restored independently of FS and GS for all tasks
when context-switched in.  This means that, if a 32-bit tracer
restores a previous state using PTRACE_SETREGS but the tracee's
pre-restore and post-restore bases don't match, then the tracee is
resumed with the wrong base.

Fix it by explicitly loading the base when a 32-bit tracer pokes FS
or GS on a 64-bit kernel.

Also add a test case.

Fixes: 673903495c ("x86/process/64: Use FSBSBASE in switch_to() if available")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/229cc6a50ecbb701abd50fe4ddaf0eda888898cd.1593192140.git.luto@kernel.org
2020-07-01 15:27:20 +02:00
Joerg Roedel
ad962d864c x86: Remove dev->archdata.iommu pointer
There are no users left, all drivers have been converted to use the
per-device private pointer offered by IOMMU core.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200625130836.1916-10-joro@8bytes.org
2020-06-30 11:59:48 +02:00
Petteri Aimonen
7ad816762f x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
Previously, kernel floating point code would run with the MXCSR control
register value last set by userland code by the thread that was active
on the CPU core just before kernel call. This could affect calculation
results if rounding mode was changed, or a crash if a FPU/SIMD exception
was unmasked.

Restore MXCSR to the kernel's default value.

 [ bp: Carve out from a bigger patch by Petteri, add feature check, add
   FNINIT call too (amluto). ]

Signed-off-by: Petteri Aimonen <jpa@git.mail.kapsi.fi>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207979
Link: https://lkml.kernel.org/r/20200624114646.28953-2-bp@alien8.de
2020-06-29 10:02:00 +02:00
Linus Torvalds
098c793821 * AMD Memory bandwidth counter width fix, by Babu Moger.
* Use the proper length type in the 32-bit truncate() syscall variant,
 by Jiri Slaby.
 
 * Reinit IA32_FEAT_CTL during wakeup to fix the case where after
 resume, VMXON would #GP due to VMX not being properly enabled, by Sean
 Christopherson.
 
 * Fix a static checker warning in the resctrl code, by Dan Carpenter.
 
 * Add a CR4 pinning mask for bits which cannot change after boot, by
 Kees Cook.
 
 * Align the start of the loop of __clear_user() to 16 bytes, to improve
 performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl74q8kACgkQEsHwGGHe
 VUqYig/8CRyHBweLnR9naD6uZ+rF83LXiTKOGLt60WRzNPCLpkwGD5aRiUwzRmFL
 FOn9g2YLDY32+SzPRkqwJioodfxXRhvjKMnEChgnDcWAtTkWfMXWQfj2w5E8sTLE
 /9cpc9rmfCQJmZFDPkL88lfH38t+Uye4Ydcur/HMetkoR4C8hGrUOGZpkG3nR8EJ
 PGmmQ1VpMmwKMUsdD+GgKC+wgyrHbhFcrr+ZH5quU3XIzuvxXsHBiK2MlqVnN1a/
 1xKglMHfQQ1MI7tmJth8s1xLQ1/Mr+ctxhC5nyyMpheDU9/257bVNKE1uF+yz7or
 KylFUcvYje49mm7fxyEDrX+NMJGT7ZBBK/Xn7Fw5sLSsGGNY2/2HwYRbnzMSTjNO
 JzY7HDkZuQgzLxlKSIKgRvz5f1j1m8D0UaG/q+JuJ6mJoPDS5qiPyshv4cW8v8iD
 t5mzEuj++dWfiyPR4sWruP36jNKqPnbe8bUGe4j+QJ+TZL0SsSlopCFxo3TEJ4Bo
 dlHUxXZcYE2/48wlP15X+jFultKcqi0HwO+rQm8uPN7O7X1xsWcO4PbTl/lngvg6
 HxClDwmfDjoCmEXij3U9gqWvXmy++C5ljWCwhYNM60Fc1yIChfnwJHZBUvx3XGui
 DZqimVa+QIRNFwWqMVF1RmE1ZuyCMYGZulZPo68gEXNeeNZ0R6g=
 =hxkd
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - AMD Memory bandwidth counter width fix, by Babu Moger.

 - Use the proper length type in the 32-bit truncate() syscall variant,
   by Jiri Slaby.

 - Reinit IA32_FEAT_CTL during wakeup to fix the case where after
   resume, VMXON would #GP due to VMX not being properly enabled, by
   Sean Christopherson.

 - Fix a static checker warning in the resctrl code, by Dan Carpenter.

 - Add a CR4 pinning mask for bits which cannot change after boot, by
   Kees Cook.

 - Align the start of the loop of __clear_user() to 16 bytes, to improve
   performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.

* tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/64: Align start of __clear_user() loop to 16-bytes
  x86/cpu: Use pinning mask for CR4 bits needing to be 0
  x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
  x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
  syscalls: Fix offset type of ksys_ftruncate()
  x86/resctrl: Fix memory bandwidth counter width for AMD
2020-06-28 10:35:01 -07:00
Linus Torvalds
a358505d8a Peter Zijlstra says:
These patches address a number of instrumentation issues that were found after
 the x86/entry overhaul. When combined with rcu/urgent and objtool/urgent, these
 patches make UBSAN/KASAN/KCSAN happy again.
 
 Part of making this all work is bumping the minimum GCC version for KASAN
 builds to gcc-8.3, the reason for this is that the __no_sanitize_address
 function attribute is broken in GCC releases before that.
 
 No known GCC version has a working __no_sanitize_undefined, however because the
 only noinstr violation that results from this happens when an UB is found, we
 treat it like WARN. That is, we allow it to violate the noinstr rules in order
 to get the warning out.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl74oWMACgkQEsHwGGHe
 VUpZCw/5AfanXrEixuh4hZLPBOJ7MtW0YI3eyBRJ8j14R8iaK+Hvn/yU4/+qC2jj
 eAlc42QS6Ckzcdknyy8VpHVDR7LR2angN0ePJmrbKsjYq0LTrnfa2H5uABcAQoiW
 0BuGFub0QBRjCkxgsOoG3llqWsTkhRrGX1928lCuuK+8L+kB0bREGMqpR36EBFaS
 wIyLodLO/Bd+YcoWDMvm4I6FvHcdyY3Oq++mzro+5ye7bE9s0PpMC5IXNzmIuGmR
 31UvST+ooRMsM6GlhxHpn6pZuCqfjygXAYuuutwdK10g1f75ESkQdYz9T9KDlHrF
 4GqzcCGtOlN4DAvk3L7KGfHw3XIhioGFxeRT+gGgKsnxoBjvJXJ8x9GrcLA9jdJi
 WeqlqiEOiAa949nclwQQ+fSrx4LgLhJ8bexyOkwiRPx7R75Y0e6OqpxZtE6GiL8O
 BA6Z6cR7U8H4uhKIzZZ0NJiLwO1cSGo5Uz/ERcyg4L23rHYKrDdaQwFSDUxXWq/s
 2lEqISD0WrSwMxJtfET3zB0B20n6IO7Uszo0FdnDFO62fck8HlStZsqV4meoT2Cc
 moqIZsYc3qnESxO9OhWHdSGGAyGS0qcE4Sq/oM8d2dIvIeL4KwHqTE6QFSmcUivi
 QYdXIIQnqJgqX4dmvLFrTuI2Whc86oS40U5/Dhv7BlHx0oewSlg=
 =fcu1
 -----END PGP SIGNATURE-----

Merge tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry fixes from Borislav Petkov:
 "This is the x86/entry urgent pile which has accumulated since the
  merge window.

  It is not the smallest but considering the almost complete entry core
  rewrite, the amount of fixes to follow is somewhat higher than usual,
  which is to be expected.

  Peter Zijlstra says:
   'These patches address a number of instrumentation issues that were
    found after the x86/entry overhaul. When combined with rcu/urgent
    and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy
    again.

    Part of making this all work is bumping the minimum GCC version for
    KASAN builds to gcc-8.3, the reason for this is that the
    __no_sanitize_address function attribute is broken in GCC releases
    before that.

    No known GCC version has a working __no_sanitize_undefined, however
    because the only noinstr violation that results from this happens
    when an UB is found, we treat it like WARN. That is, we allow it to
    violate the noinstr rules in order to get the warning out'"

* tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Fix #UD vs WARN more
  x86/entry: Increase entry_stack size to a full page
  x86/entry: Fixup bad_iret vs noinstr
  objtool: Don't consider vmlinux a C-file
  kasan: Fix required compiler version
  compiler_attributes.h: Support no_sanitize_undefined check with GCC 4
  x86/entry, bug: Comment the instrumentation_begin() usage for WARN()
  x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*()
  x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline()
  compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr
  kasan: Bump required compiler version
  x86, kcsan: Add __no_kcsan to noinstr
  kcsan: Remove __no_kcsan_or_inline
  x86, kcsan: Remove __no_kcsan_or_inline usage
2020-06-28 09:42:47 -07:00
Mauro Carvalho Chehab
985098a05e docs: fix references for DMA*.txt files
As we moved those files to core-api, fix references to point
to their newer locations.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/37b2fd159fbc7655dbf33b3eb1215396a25f6344.1592895969.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-26 10:01:32 -06:00
Ingo Molnar
2c92d787cc Merge branch 'linus' into x86/entry, to resolve conflicts
Conflicts:
	arch/x86/kernel/traps.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-06-26 12:24:42 +02:00
Christoph Hellwig
800e26b813 x86/hyperv: allocate the hypercall page with only read and execute bits
Patch series "fix a hyperv W^X violation and remove vmalloc_exec"

Dexuan reported a W^X violation due to the fact that the hyper hypercall
page due switching it to be allocated using vmalloc_exec.

The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually
sets writable permissions in the pte.  This series fixes the issue by
switching to the low-level __vmalloc_node_range interface that allows
specifing more detailed permissions instead.  It then also open codes
the other two callers and removes the somewhat confusing vmalloc_exec
interface.

Peter noted that the hyper hypercall page allocation also has another
long standing issue in that it shouldn't use the full vmalloc but just
the module space.  This issue is so far theoretical as the allocation is
done early in the boot process.  I plan to fix it with another bigger
series for 5.9.

This patch (of 3):

Avoid a W^X violation cause by the fact that PAGE_KERNEL_EXEC includes
the writable bit.

For this resurrect the removed PAGE_KERNEL_RX definition, but as
PAGE_KERNEL_ROX to match arm64 and powerpc.

Link: http://lkml.kernel.org/r/20200618064307.32739-2-hch@lst.de
Fixes: 78bb17f76e ("x86/hyperv: use vmalloc_exec for the hypercall page")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:38 -07:00
Al Viro
4dfa103e82 x86: kill dump_fpu()
dead since the removal of aout coredump support...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-26 01:01:33 -04:00
Peter Zijlstra
c7aadc0932 x86/entry: Increase entry_stack size to a full page
Marco crashed in bad_iret with a Clang11/KCSAN build due to
overflowing the stack. Now that we run C code on it, expand it to a
full page.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Reported-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200618144801.819246178@infradead.org
2020-06-25 13:45:40 +02:00
Linus Torvalds
26e122e97a All bugfixes except for a couple cleanup patches.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7x2lwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPiVAgAn/83Vx/YrF9sr0+TLzukzfOubJVK
 Majxb0I06De23VDExiDoZjh5CnCN3kDja0m2c543ZI1XOrHRbp09v1goJQkAgiT0
 AQ8Npi1KB71io18SbZtrAhPLmSiUgRirF+XWHB38qjdbZixvZyWz8nvSITFY8aJQ
 ICgbm5jftzBdSOKEhqbHwZ+LcXjEGZsehwTiHpUBKUR/kNlRFV5UFAd5m+CT5i4O
 3DydLIReATDCoZUKfkBjYtoR3c9DyWESyfWD4GZ/2xRKr/1QfiZ4dA0cd/P9hJYz
 7MAG+ULvJGlasSzmcEQJ/X3o9QuIJzpQFpwbKeMX6gOsEsSVUQeriUHIFA==
 =jTFw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "All bugfixes except for a couple cleanup patches"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru
  KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling
  KVM: X86: Fix MSR range of APIC registers in X2APIC mode
  KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL
  KVM: nVMX: Plumb L2 GPA through to PML emulation
  KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic()
  KVM: LAPIC: ensure APIC map is up to date on concurrent update requests
  kvm: lapic: fix broken vcpu hotplug
  Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU"
  KVM: VMX: Add helpers to identify interrupt type from intr_info
  kvm/svm: disable KCSAN for svm_vcpu_run()
  KVM: MIPS: Fix a build error for !CPU_LOONGSON64
2020-06-23 11:01:16 -07:00
Sean Christopherson
bf09fb6cba KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL
Remove support for context switching between the guest's and host's
desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
required for correct functionality, e.g. KVM intercepts reads and writes
to the MSR, and the latency effects of the settings controlled by the
MSR are not architecturally visible.

As a general rule, KVM should not allow the guest to control power
management settings unless explicitly enabled by userspace, e.g. see
KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
can improve the performance of SMT siblings.  A devious guest could
disable C0.2 so as to improve the performance of their workloads at the
detriment to workloads running in the host or on other VMs.

Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
condition where updates from the host may cause KVM to enter the guest
with the incorrect value.  Because updates are are propagated to all
CPUs via IPI (SMP function callback), the value in hardware may be
stale with respect to the cached value and KVM could enter the guest
with the wrong value in hardware.  As above, the guest can't observe the
bad value, but it's a weird and confusing wart in the implementation.

Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
lists.  Using the lists is only necessary for MSRs that are required for
correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
old hardware, or for MSRs that need to-the-uop precision, e.g. perf
related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
vcpu_vmx_run().  Using the atomic lists is undesirable as they are more
expensive than direct RDMSR/WRMSR.

Furthermore, even if giving the guest control of the MSR is legitimate,
e.g. in pass-through scenarios, it's not clear that the benefits would
outweigh the overhead.  E.g. saving and restoring an MSR across a VMX
roundtrip costs ~250 cycles, and if the guest diverged from the host
that cost would be paid on every run of the guest.  In other words, if
there is a legitimate use case then it should be enabled by a new
per-VM capability.

Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
UMWAIT and UMONITOR.

Fixes: 6e3ba4abce ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Cc: Jingqi Liu <jingqi.liu@intel.com>
Cc: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22 20:54:57 -04:00
Sean Christopherson
2dbebf7ae1 KVM: nVMX: Plumb L2 GPA through to PML emulation
Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all
intents and purposes is vmx_write_pml_buffer(), instead of having the
latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS.  If the dirty bit
update is the result of KVM emulation (rare for L2), then the GPA in the
VMCS may be stale and/or hold a completely unrelated GPA.

Fixes: c5f983f6e8 ("nVMX: Implement emulated Page Modification Logging")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22 18:23:03 -04:00
Paolo Bonzini
44d5271707 KVM: LAPIC: ensure APIC map is up to date on concurrent update requests
The following race can cause lost map update events:

         cpu1                            cpu2

                                apic_map_dirty = true
  ------------------------------------------------------------
                                kvm_recalculate_apic_map:
                                     pass check
                                         mutex_lock(&kvm->arch.apic_map_lock);
                                         if (!kvm->arch.apic_map_dirty)
                                     and in process of updating map
  -------------------------------------------------------------
    other calls to
       apic_map_dirty = true         might be too late for affected cpu
  -------------------------------------------------------------
                                     apic_map_dirty = false
  -------------------------------------------------------------
    kvm_recalculate_apic_map:
    bail out on
      if (!kvm->arch.apic_map_dirty)

To fix it, record the beginning of an update of the APIC map in
apic_map_dirty.  If another APIC map change switches apic_map_dirty
back to DIRTY during the update, kvm_recalculate_apic_map should not
make it CLEAN, and the other caller will go through the slow path.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22 13:37:30 -04:00
Borislav Petkov
99e40204e0 x86/msr: Move the F15h MSRs where they belong
1068ed4547 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")

moved the three F15h power MSRs to the architectural list but that was
wrong as they belong in the family 0x15 list. That also caused:

  In file included from trace/beauty/tracepoints/x86_msr.c:10:
  perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: error: initialized field overwritten [-Werror=override-init]
    292 |  [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
        |                                             ^~~~~~~~~~~
  perf/trace/beauty/generated/x86_arch_MSRs_array.c:292:45: note: (near initialization for 'x86_AMD_V_KVM_MSRs[640]')

due to MSR_F15H_PTSC ending up being defined twice. Move them where they
belong and drop the duplicate.

Also, drop the respective tools/ changes of the msr-index.h copy the
above commit added because perf tool developers prefer to go through
those changes themselves in order to figure out whether changes to the
kernel headers would need additional handling in perf.

Fixes: 1068ed4547 ("x86/msr: Lift AMD family 0x15 power-specific MSRs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lkml.kernel.org/r/20200621163323.14e8533f@canb.auug.org.au
2020-06-22 17:15:53 +02:00
Benjamin Thiel
56ce93700e x86/mm/32: Fix -Wmissing prototypes warnings for init.c
Fix:

  arch/x86/mm/init.c:503:21:
  warning: no previous prototype for ‘init_memory_mapping’ [-Wmissing-prototypes]
  unsigned long __ref init_memory_mapping(unsigned long start,

  arch/x86/mm/init.c:745:13:
  warning: no previous prototype for ‘poking_init’ [-Wmissing-prototypes]
  void __init poking_init(void)

Lift init_memory_mapping() and poking_init() out of the ifdef
CONFIG_X86_64 to make the functions visible on 32-bit too.

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200606123743.3277-1-b.thiel@posteo.de
2020-06-18 18:04:00 +02:00
Andi Kleen
742c45c3ec x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

The application can access it open coded or by using the getauxval()
function in newer versions of glibc.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-14-sashal@kernel.org
2020-06-18 15:47:05 +02:00
Chang S. Bae
eaad981291 x86/entry/64: Introduce the FIND_PERCPU_BASE macro
GSBASE is used to find per-CPU data in the kernel. But when GSBASE is
unknown, the per-CPU base can be found from the per_cpu_offset table with a
CPU NR.  The CPU NR is extracted from the limit field of the CPUNODE entry
in GDT, or by the RDPID instruction. This is a prerequisite for using
FSGSBASE in the low level entry code.

Also, add the GAS-compatible RDPID macro as binutils 2.23 do not support
it. Support is added in version 2.27.

[ tglx: Massaged changelog ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-12-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-11-sashal@kernel.org
2020-06-18 15:47:04 +02:00
Thomas Gleixner
6758034e4d x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
save_fsgs_for_kvm() is invoked via

  vcpu_enter_guest()
    kvm_x86_ops.prepare_guest_switch(vcpu)
      vmx_prepare_switch_to_guest()
        save_fsgs_for_kvm()

with preemption disabled, but interrupts enabled.

The upcoming FSGSBASE based GS safe needs interrupts to be disabled. This
could be done in the helper function, but that function is also called from
switch_to() which has interrupts disabled already.

Disable interrupts inside save_fsgs_for_kvm() and rename the function to
current_save_fsgs() so it can be invoked from other places.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200528201402.1708239-7-sashal@kernel.org
2020-06-18 15:47:01 +02:00
Chang S. Bae
58edfd2e0a x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
Add cpu feature conditional FSGSBASE access to the relevant helper
functions. That allows to accelerate certain FS/GS base operations in
subsequent changes.

Note, that while possible, the user space entry/exit GSBASE operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

To preserve the SWAPGS entry mechanism introduce __[rd|wr]gsbase_inactive()
helpers. Note, for Xen PV, paravirt hooks can be added later as they might
allow a very efficient but different implementation.

[ tglx: Massaged changelog, convert it to noinstr and force inline
  	native_swapgs() ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-5-sashal@kernel.org
2020-06-18 15:47:00 +02:00
Andi Kleen
b15378ca50 x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-6-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-4-sashal@kernel.org
2020-06-18 15:47:00 +02:00
Brian Gerst
c9a1ff316b x86/stackprotector: Pre-initialize canary for secondary CPUs
The idle tasks created for each secondary CPU already have a random stack
canary generated by fork().  Copy the canary to the percpu variable before
starting the secondary CPU which removes the need to call
boot_init_stack_canary().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200617225624.799335-1-brgerst@gmail.com
2020-06-18 13:09:17 +02:00
Christoph Hellwig
fe557319aa maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17 10:57:41 -07:00
Benjamin Thiel
d5249bc7a1 x86/mm: Fix -Wmissing-prototypes warnings for arch/x86/mm/init.c
Fix -Wmissing-prototypes warnings:

  arch/x86/mm/init.c:81:6:
  warning: no previous prototype for ‘x86_has_pat_wp’ [-Wmissing-prototypes]
  bool x86_has_pat_wp(void)

  arch/x86/mm/init.c:86:22:
  warning: no previous prototype for ‘pgprot2cachemode’ [-Wmissing-prototypes]
  enum page_cache_mode pgprot2cachemode(pgprot_t pgprot)

by including the respective header containing prototypes. Also fix:

  arch/x86/mm/init.c:893:13:
  warning: no previous prototype for ‘mem_encrypt_free_decrypted_mem’ [-Wmissing-prototypes]
  void __weak mem_encrypt_free_decrypted_mem(void) { }

by making it static inline for the !CONFIG_AMD_MEM_ENCRYPT case. This
warning happens when CONFIG_AMD_MEM_ENCRYPT is not enabled (defconfig
for example):

  ./arch/x86/include/asm/mem_encrypt.h:80:27:
  warning: inline function ‘mem_encrypt_free_decrypted_mem’ declared weak [-Wattributes]
  static inline void __weak mem_encrypt_free_decrypted_mem(void) { }
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's ok to convert to static inline because the function is used only in
x86. Is not shared with other architectures so drop the __weak too.

 [ bp: Massage and adjust __weak comments while at it. ]

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200606122629.2720-1-b.thiel@posteo.de
2020-06-17 10:45:46 +02:00
Borislav Petkov
28b60197b5 x86/asm: Unify __ASSEMBLY__ blocks
Merge the two ifndef __ASSEMBLY__ blocks.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200604133204.7636-1-bp@alien8.de
2020-06-15 19:29:36 +02:00
Borislav Petkov
fbd5969d1f x86/cpufeatures: Mark two free bits in word 3
... so that they get reused when needed.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200604104150.2056-1-bp@alien8.de
2020-06-15 19:26:23 +02:00
Borislav Petkov
1068ed4547 x86/msr: Lift AMD family 0x15 power-specific MSRs
... into the global msr-index.h header because they're used in multiple
compilation units. Sort the MSR list a bit. Update the msr-index.h copy
in tools.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20200608164847.14232-1-bp@alien8.de
2020-06-15 19:25:53 +02:00
Sean Christopherson
5d5103595e x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
Reinitialize IA32_FEAT_CTL on the BSP during wakeup to handle the case
where firmware doesn't initialize or save/restore across S3.  This fixes
a bug where IA32_FEAT_CTL is left uninitialized and results in VMXON
taking a #GP due to VMX not being fully enabled, i.e. breaks KVM.

Use init_ia32_feat_ctl() to "restore" IA32_FEAT_CTL as it already deals
with the case where the MSR is locked, and because APs already redo
init_ia32_feat_ctl() during suspend by virtue of the SMP boot flow being
used to reinitialize APs upon wakeup.  Do the call in the early wakeup
flow to avoid dependencies in the syscore_ops chain, e.g. simply adding
a resume hook is not guaranteed to work, as KVM does VMXON in its own
resume hook, kvm_resume(), when KVM has active guests.

Fixes: 21bd3467a5 ("KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR")
Reported-by: Brad Campbell <lists2009@fnarfbargle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Cc: stable@vger.kernel.org # v5.6
Link: https://lkml.kernel.org/r/20200608174134.11157-1-sean.j.christopherson@intel.com
2020-06-15 14:18:37 +02:00
Peter Zijlstra
8e8bb06d19 x86/entry, bug: Comment the instrumentation_begin() usage for WARN()
Explain the rationale for annotating WARN(), even though, strictly
speaking printk() and friends are very much not safe in many of the
places we put them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-06-15 14:10:10 +02:00
Peter Zijlstra
14d3b376b6 x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline()
vmlinux.o: warning: objtool: exc_nmi()+0x12: call to cpumask_test_cpu.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: mce_check_crashing_cpu()+0x12: call to cpumask_test_cpu.constprop.0()leaves .noinstr.text section

  cpumask_test_cpu()
    test_bit()
      instrument_atomic_read()
      arch_test_bit()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-06-15 14:10:09 +02:00
Peter Zijlstra
e825873366 x86, kcsan: Remove __no_kcsan_or_inline usage
Now that KCSAN relies on -tsan-distinguish-volatile we no longer need
the annotation for constant_test_bit(). Remove it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-06-15 14:10:08 +02:00
Giovanni Gherdovich
e2b0d619b4 x86, sched: check for counters overflow in frequency invariant accounting
The product mcnt * arch_max_freq_ratio can overflows u64.

For context, a large value for arch_max_freq_ratio would be 5000,
corresponding to a turbo_freq/base_freq ratio of 5 (normally it's more like
1500-2000). A large increment frequency for the MPERF counter would be 5GHz
(the base clock of all CPUs on the market today is less than that). With
these figures, a CPU would need to go without a scheduler tick for around 8
days for the u64 overflow to happen. It is unlikely, but the check is
warranted.

Under similar conditions, the difference acnt of two consecutive APERF
readings can overflow as well.

In these circumstances is appropriate to disable frequency invariant
accounting: the feature relies on measures of the clock frequency done at
every scheduler tick, which need to be "fresh" to be at all meaningful.

A note on i386: prior to version 5.1, the GCC compiler didn't have the
builtin function __builtin_mul_overflow. In these GCC versions the macro
check_mul_overflow needs __udivdi3() to do (u64)a/b, which the kernel
doesn't provide. For this reason this change fails to build on i386 if
GCC<5.1, and we protect the entire frequency invariant code behind
CONFIG_X86_64 (special thanks to "kbuild test robot" <lkp@intel.com>).

Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200531182453.15254-2-ggherdovich@suse.cz
2020-06-15 14:10:02 +02:00
Oleg Nesterov
3dc167ba57 sched/cputime: Improve cputime_adjust()
People report that utime and stime from /proc/<pid>/stat become very
wrong when the numbers are big enough, especially if you watch these
counters incrementally.

Specifically, the current implementation of: stime*rtime/total,
results in a saw-tooth function on top of the desired line, where the
teeth grow in size the larger the values become. IOW, it has a
relative error.

The result is that, when watching incrementally as time progresses
(for large values), we'll see periods of pure stime or utime increase,
irrespective of the actual ratio we're striving for.

Replace scale_stime() with a math64.h helper: mul_u64_u64_div_u64()
that is far more accurate. This also allows architectures to override
the implementation -- for instance they can opt for the old algorithm
if this new one turns out to be too expensive for them.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200519172506.GA317395@hirez.programming.kicks-ass.net
2020-06-15 14:10:00 +02:00
Adrian Hunter
3e46bb40af perf/x86: Add perf text poke events for kprobes
Add perf text poke events for kprobes. That includes:

 - the replaced instruction(s) which are executed out-of-line
   i.e. arch_copy_kprobe() and arch_remove_kprobe()

 - the INT3 that activates the kprobe
   i.e. arch_arm_kprobe() and arch_disarm_kprobe()

 - optimised kprobe function
   i.e. arch_prepare_optimized_kprobe() and
      __arch_remove_optimized_kprobe()

 - optimised kprobe
   i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()

Resulting in 8 possible text_poke events:

 0:  NULL -> probe.ainsn.insn (if ainsn.boostable && !kp.post_handler)
					arch_copy_kprobe()

 1:  old0 -> INT3			arch_arm_kprobe()

 // boosted kprobe active

 2:  NULL -> optprobe_trampoline	arch_prepare_optimized_kprobe()

 3:  INT3,old1,old2,old3,old4 -> JMP32	arch_optimize_kprobes()

 // optprobe active

 4:  JMP32 -> INT3,old1,old2,old3,old4

 // optprobe disabled and kprobe active (this sometimes goes back to 3)
					arch_unoptimize_kprobe()

 5:  optprobe_trampoline -> NULL	arch_remove_optimized_kprobe()

 // boosted kprobe active

 6:  INT3 -> old0			arch_disarm_kprobe()

 7:  probe.ainsn.insn -> NULL (if ainsn.boostable && !kp.post_handler)
					arch_remove_kprobe()

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200512121922.8997-6-adrian.hunter@intel.com
2020-06-15 14:09:49 +02:00
Vitaly Kuznetsov
b1d405751c KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
KVM now supports using interrupt for 'page ready' APF event delivery and
legacy mechanism was deprecated. Switch KVM guests to the new one.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-9-vkuznets@redhat.com>
[Use HYPERVISOR_CALLBACK_VECTOR instead of a separate vector. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-15 07:46:49 -04:00
Linus Torvalds
a9429089d3 RAS updates from Borislav Petkov:
* Unmap a whole guest page if an MCE is encountered in it to avoid
     follow-on MCEs leading to the guest crashing, by Tony Luck.
 
     This change collided with the entry changes and the merge resolution
     would have been rather unpleasant. To avoid that the entry branch was
     merged in before applying this. The resulting code did not change
     over the rebase.
 
   * AMD MCE error thresholding machinery cleanup and hotplug sanitization, by
     Thomas Gleixner.
 
   * Change the MCE notifiers to denote whether they have handled the error
     and not break the chain early by returning NOTIFY_STOP, thus giving the
     opportunity for the later handlers in the chain to see it. By Tony Luck.
 
   * Add AMD family 0x17, models 0x60-6f support, by Alexander Monakov.
 
   * Last but not least, the usual round of fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7j5m0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXyMD/9GneajFaI5D0F59/btEGAx1X0PTDz1
 LrGf79Y5NqSJrzggsnrdFzsGjJNcQ2KbfSgs9fhdsvvvIpK+YqZ+rVFAg7DcKc2n
 RwHd+X3TluKsc4oCuagZli7R4HHO5P9hbkHY6DD++F0eeMblLhNnq1hGUSdoENHN
 HFsZapQpvlpn3IYN1e07lFBVvujRL/pBez7tmhh6bPxmcLZFCBrIHuAXz7dbzz0Y
 BjhVRLNq6+9Yztvrt8uIgc1EAoMfprkY6nVtvkxC5gmVor3orkRC4rRNc/+jhgDK
 p0s1JxDgb3SNN79no9wvQaqRNs/rNlAx6xSA0gmW+SbxrFEsk6cUp1BVVRr031dk
 /QGedvpJzK7PjCX+d7Jvy+391q1YEsdnbQhXRdjSXQf+DihWm98O++wDodw9kgwt
 FgkZD4qICT3xtpGs1bqDgrm220g8d27nGjsXlvFfyVYAQAlE2vcx0NqySOTT7NeT
 Zu6GIvGcGCObJT2JTWbPkvbm2aNYXzYNZGRBLlEzy7qFXuVG4aKR6W1L6uSW3SmK
 UUo/F3KHgZWM/h1PyMbxzAvu60eojBcEXva8jDxBv0GCDJhzFV3yOVdgxrLPpGcZ
 7EqiUtTrxvxGOFjpFFaZRiT0R89ZfvOxVyXGwMX8zph9NyPLSj9MspyQSkhFFREz
 0FAfy/7wqDfMRg==
 =iWiy
 -----END PGP SIGNATURE-----

Merge tag 'ras-core-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Thomas Gleixner:
 "RAS updates from Borislav Petkov:

   - Unmap a whole guest page if an MCE is encountered in it to avoid
     follow-on MCEs leading to the guest crashing, by Tony Luck.

     This change collided with the entry changes and the merge
     resolution would have been rather unpleasant. To avoid that the
     entry branch was merged in before applying this. The resulting code
     did not change over the rebase.

   - AMD MCE error thresholding machinery cleanup and hotplug
     sanitization, by Thomas Gleixner.

   - Change the MCE notifiers to denote whether they have handled the
     error and not break the chain early by returning NOTIFY_STOP, thus
     giving the opportunity for the later handlers in the chain to see
     it. By Tony Luck.

   - Add AMD family 0x17, models 0x60-6f support, by Alexander Monakov.

   - Last but not least, the usual round of fixes and improvements"

* tag 'ras-core-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/mce/dev-mcelog: Fix -Wstringop-truncation warning about strncpy()
  x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned
  EDAC/amd64: Add AMD family 17h model 60h PCI IDs
  hwmon: (k10temp) Add AMD family 17h model 60h PCI match
  x86/amd_nb: Add AMD family 17h model 60h PCI IDs
  x86/mcelog: Add compat_ioctl for 32-bit mcelog support
  x86/mce: Drop bogus comment about mce.kflags
  x86/mce: Fixup exception only for the correct MCEs
  EDAC: Drop the EDAC report status checks
  x86/mce: Add mce=print_all option
  x86/mce: Change default MCE logger to check mce->kflags
  x86/mce: Fix all mce notifiers to update the mce->kflags bitmask
  x86/mce: Add a struct mce.kflags field
  x86/mce: Convert the CEC to use the MCE notifier
  x86/mce: Rename "first" function as "early"
  x86/mce/amd, edac: Remove report_gart_errors
  x86/mce/amd: Make threshold bank setting hotplug robust
  x86/mce/amd: Cleanup threshold device remove path
  x86/mce/amd: Straighten CPU hotplug path
  x86/mce/amd: Sanitize thresholding device creation hotplug path
  ...
2020-06-13 10:21:00 -07:00
Linus Torvalds
076f14be7f The X86 entry, exception and interrupt code rework
This all started about 6 month ago with the attempt to move the Posix CPU
 timer heavy lifting out of the timer interrupt code and just have lockless
 quick checks in that code path. Trivial 5 patches.
 
 This unearthed an inconsistency in the KVM handling of task work and the
 review requested to move all of this into generic code so other
 architectures can share.
 
 Valid request and solved with another 25 patches but those unearthed
 inconsistencies vs. RCU and instrumentation.
 
 Digging into this made it obvious that there are quite some inconsistencies
 vs. instrumentation in general. The int3 text poke handling in particular
 was completely unprotected and with the batched update of trace events even
 more likely to expose to endless int3 recursion.
 
 In parallel the RCU implications of instrumenting fragile entry code came
 up in several discussions.
 
 The conclusion of the X86 maintainer team was to go all the way and make
 the protection against any form of instrumentation of fragile and dangerous
 code pathes enforcable and verifiable by tooling.
 
 A first batch of preparatory work hit mainline with commit d5f744f9a2.
 
 The (almost) full solution introduced a new code section '.noinstr.text'
 into which all code which needs to be protected from instrumentation of all
 sorts goes into. Any call into instrumentable code out of this section has
 to be annotated. objtool has support to validate this. Kprobes now excludes
 this section fully which also prevents BPF from fiddling with it and all
 'noinstr' annotated functions also keep ftrace off. The section, kprobes
 and objtool changes are already merged.
 
 The major changes coming with this are:
 
     - Preparatory cleanups
 
     - Annotating of relevant functions to move them into the noinstr.text
       section or enforcing inlining by marking them __always_inline so the
       compiler cannot misplace or instrument them.
 
     - Splitting and simplifying the idtentry macro maze so that it is now
       clearly separated into simple exception entries and the more
       interesting ones which use interrupt stacks and have the paranoid
       handling vs. CR3 and GS.
 
     - Move quite some of the low level ASM functionality into C code:
 
        - enter_from and exit to user space handling. The ASM code now calls
          into C after doing the really necessary ASM handling and the return
 	 path goes back out without bells and whistels in ASM.
 
        - exception entry/exit got the equivivalent treatment
 
        - move all IRQ tracepoints from ASM to C so they can be placed as
          appropriate which is especially important for the int3 recursion
          issue.
 
     - Consolidate the declaration and definition of entry points between 32
       and 64 bit. They share a common header and macros now.
 
     - Remove the extra device interrupt entry maze and just use the regular
       exception entry code.
 
     - All ASM entry points except NMI are now generated from the shared header
       file and the corresponding macros in the 32 and 64 bit entry ASM.
 
     - The C code entry points are consolidated as well with the help of
       DEFINE_IDTENTRY*() macros. This allows to ensure at one central point
       that all corresponding entry points share the same semantics. The
       actual function body for most entry points is in an instrumentable
       and sane state.
 
       There are special macros for the more sensitive entry points,
       e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
       They allow to put the whole entry instrumentation and RCU handling
       into safe places instead of the previous pray that it is correct
       approach.
 
     - The INT3 text poke handling is now completely isolated and the
       recursion issue banned. Aside of the entry rework this required other
       isolation work, e.g. the ability to force inline bsearch.
 
     - Prevent #DB on fragile entry code, entry relevant memory and disable
       it on NMI, #MC entry, which allowed to get rid of the nested #DB IST
       stack shifting hackery.
 
     - A few other cleanups and enhancements which have been made possible
       through this and already merged changes, e.g. consolidating and
       further restricting the IDT code so the IDT table becomes RO after
       init which removes yet another popular attack vector
 
     - About 680 lines of ASM maze are gone.
 
 There are a few open issues:
 
    - An escape out of the noinstr section in the MCE handler which needs
      some more thought but under the aspect that MCE is a complete
      trainwreck by design and the propability to survive it is low, this was
      not high on the priority list.
 
    - Paravirtualization
 
      When PV is enabled then objtool complains about a bunch of indirect
      calls out of the noinstr section. There are a few straight forward
      ways to fix this, but the other issues vs. general correctness were
      more pressing than parawitz.
 
    - KVM
 
      KVM is inconsistent as well. Patches have been posted, but they have
      not yet been commented on or picked up by the KVM folks.
 
    - IDLE
 
      Pretty much the same problems can be found in the low level idle code
      especially the parts where RCU stopped watching. This was beyond the
      scope of the more obvious and exposable problems and is on the todo
      list.
 
 The lesson learned from this brain melting exercise to morph the evolved
 code base into something which can be validated and understood is that once
 again the violation of the most important engineering principle
 "correctness first" has caused quite a few people to spend valuable time on
 problems which could have been avoided in the first place. The "features
 first" tinkering mindset really has to stop.
 
 With that I want to say thanks to everyone involved in contributing to this
 effort. Special thanks go to the following people (alphabetical order):
 
    Alexandre Chartre
    Andy Lutomirski
    Borislav Petkov
    Brian Gerst
    Frederic Weisbecker
    Josh Poimboeuf
    Juergen Gross
    Lai Jiangshan
    Macro Elver
    Paolo Bonzini
    Paul McKenney
    Peter Zijlstra
    Vitaly Kuznetsov
    Will Deacon
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7j510THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoU2WD/4refvaNm08fG7aiVYem3JJzr0+Pq5O
 /opwnI/1D973ApApj5W/Nd53sN5tVqOiXncSKgywRBWZxRCAGjVYypl9rjpvXu4l
 HlMjhEKBmWkDryxxrM98Vr7hl3hnId5laR56oFfH+G4LUsItaV6Uak/HfXZ4Mq1k
 iYVbEtl2CN+KJjvSgZ6Y1l853Ab5mmGvmeGNHHWCj8ZyjF3cOLoelDTQNnsb0wXM
 crKXBcXJSsCWKYyJ5PTvB82crQCET7Su+LgwK06w/ZbW1//2hVIjSCiN5o/V+aRJ
 06BZNMj8v9tfglkN8LEQvRIjTlnEQ2sq3GxbrVtA53zxkzbBCBJQ96w8yYzQX0ux
 yhqQ/aIZJ1wTYEjJzSkftwLNMRHpaOUnKvJndXRKAYi+eGI7syF61qcZSYGKuAQ/
 bK3b/CzU6QWr1235oTADxh4isEwxA0Pg5wtJCfDDOG0MJ9ALMSOGUkhoiz5EqpkU
 mzFAwfG/Uj7hRjlkms7Yj2OjZfnU7iypj63GgpXghLjr5ksRFKEOMw8e1GXltVHs
 zzwghUjqp2EPq0VOOQn3lp9lol5Prc3xfFHczKpO+CJW6Rpa4YVdqJmejBqJy/on
 Hh/T/ST3wa2qBeAw89vZIeWiUJZZCsQ0f//+2hAbzJY45Y6DuR9vbTAPb9agRgOM
 xg+YaCfpQqFc1A==
 =llba
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry updates from Thomas Gleixner:
 "The x86 entry, exception and interrupt code rework

  This all started about 6 month ago with the attempt to move the Posix
  CPU timer heavy lifting out of the timer interrupt code and just have
  lockless quick checks in that code path. Trivial 5 patches.

  This unearthed an inconsistency in the KVM handling of task work and
  the review requested to move all of this into generic code so other
  architectures can share.

  Valid request and solved with another 25 patches but those unearthed
  inconsistencies vs. RCU and instrumentation.

  Digging into this made it obvious that there are quite some
  inconsistencies vs. instrumentation in general. The int3 text poke
  handling in particular was completely unprotected and with the batched
  update of trace events even more likely to expose to endless int3
  recursion.

  In parallel the RCU implications of instrumenting fragile entry code
  came up in several discussions.

  The conclusion of the x86 maintainer team was to go all the way and
  make the protection against any form of instrumentation of fragile and
  dangerous code pathes enforcable and verifiable by tooling.

  A first batch of preparatory work hit mainline with commit
  d5f744f9a2 ("Pull x86 entry code updates from Thomas Gleixner")

  That (almost) full solution introduced a new code section
  '.noinstr.text' into which all code which needs to be protected from
  instrumentation of all sorts goes into. Any call into instrumentable
  code out of this section has to be annotated. objtool has support to
  validate this.

  Kprobes now excludes this section fully which also prevents BPF from
  fiddling with it and all 'noinstr' annotated functions also keep
  ftrace off. The section, kprobes and objtool changes are already
  merged.

  The major changes coming with this are:

    - Preparatory cleanups

    - Annotating of relevant functions to move them into the
      noinstr.text section or enforcing inlining by marking them
      __always_inline so the compiler cannot misplace or instrument
      them.

    - Splitting and simplifying the idtentry macro maze so that it is
      now clearly separated into simple exception entries and the more
      interesting ones which use interrupt stacks and have the paranoid
      handling vs. CR3 and GS.

    - Move quite some of the low level ASM functionality into C code:

       - enter_from and exit to user space handling. The ASM code now
         calls into C after doing the really necessary ASM handling and
         the return path goes back out without bells and whistels in
         ASM.

       - exception entry/exit got the equivivalent treatment

       - move all IRQ tracepoints from ASM to C so they can be placed as
         appropriate which is especially important for the int3
         recursion issue.

    - Consolidate the declaration and definition of entry points between
      32 and 64 bit. They share a common header and macros now.

    - Remove the extra device interrupt entry maze and just use the
      regular exception entry code.

    - All ASM entry points except NMI are now generated from the shared
      header file and the corresponding macros in the 32 and 64 bit
      entry ASM.

    - The C code entry points are consolidated as well with the help of
      DEFINE_IDTENTRY*() macros. This allows to ensure at one central
      point that all corresponding entry points share the same
      semantics. The actual function body for most entry points is in an
      instrumentable and sane state.

      There are special macros for the more sensitive entry points, e.g.
      INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
      They allow to put the whole entry instrumentation and RCU handling
      into safe places instead of the previous pray that it is correct
      approach.

    - The INT3 text poke handling is now completely isolated and the
      recursion issue banned. Aside of the entry rework this required
      other isolation work, e.g. the ability to force inline bsearch.

    - Prevent #DB on fragile entry code, entry relevant memory and
      disable it on NMI, #MC entry, which allowed to get rid of the
      nested #DB IST stack shifting hackery.

    - A few other cleanups and enhancements which have been made
      possible through this and already merged changes, e.g.
      consolidating and further restricting the IDT code so the IDT
      table becomes RO after init which removes yet another popular
      attack vector

    - About 680 lines of ASM maze are gone.

  There are a few open issues:

   - An escape out of the noinstr section in the MCE handler which needs
     some more thought but under the aspect that MCE is a complete
     trainwreck by design and the propability to survive it is low, this
     was not high on the priority list.

   - Paravirtualization

     When PV is enabled then objtool complains about a bunch of indirect
     calls out of the noinstr section. There are a few straight forward
     ways to fix this, but the other issues vs. general correctness were
     more pressing than parawitz.

   - KVM

     KVM is inconsistent as well. Patches have been posted, but they
     have not yet been commented on or picked up by the KVM folks.

   - IDLE

     Pretty much the same problems can be found in the low level idle
     code especially the parts where RCU stopped watching. This was
     beyond the scope of the more obvious and exposable problems and is
     on the todo list.

  The lesson learned from this brain melting exercise to morph the
  evolved code base into something which can be validated and understood
  is that once again the violation of the most important engineering
  principle "correctness first" has caused quite a few people to spend
  valuable time on problems which could have been avoided in the first
  place. The "features first" tinkering mindset really has to stop.

  With that I want to say thanks to everyone involved in contributing to
  this effort. Special thanks go to the following people (alphabetical
  order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
  Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
  Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
  Vitaly Kuznetsov, and Will Deacon"

* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
  x86/entry: Force rcu_irq_enter() when in idle task
  x86/entry: Make NMI use IDTENTRY_RAW
  x86/entry: Treat BUG/WARN as NMI-like entries
  x86/entry: Unbreak __irqentry_text_start/end magic
  x86/entry: __always_inline CR2 for noinstr
  lockdep: __always_inline more for noinstr
  x86/entry: Re-order #DB handler to avoid *SAN instrumentation
  x86/entry: __always_inline arch_atomic_* for noinstr
  x86/entry: __always_inline irqflags for noinstr
  x86/entry: __always_inline debugreg for noinstr
  x86/idt: Consolidate idt functionality
  x86/idt: Cleanup trap_init()
  x86/idt: Use proper constants for table size
  x86/idt: Add comments about early #PF handling
  x86/idt: Mark init only functions __init
  x86/entry: Rename trace_hardirqs_off_prepare()
  x86/entry: Clarify irq_{enter,exit}_rcu()
  x86/entry: Remove DBn stacks
  x86/entry: Remove debug IDT frobbing
  x86/entry: Optimize local_db_save() for virt
  ...
2020-06-13 10:05:47 -07:00
Linus Torvalds
52cd0d972f MIPS:
- Loongson port
 
 PPC:
 - Fixes
 
 ARM:
 - Fixes
 
 x86:
 - KVM_SET_USER_MEMORY_REGION optimizations
 - Fixes
 - Selftest fixes
 
 The guest side of the asynchronous page fault work has been delayed to 5.9
 in order to sync with Thomas's interrupt entry rework.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7icj4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPHGQgAj9+5j+f5v06iMP/+ponWwsVfh+5/
 UR1gPbpMSFMKF0U+BCFxsBeGKWPDiz9QXaLfy6UGfOFYBI475Su5SoZ8/i/o6a2V
 QjcKIJxBRNs66IG/774pIpONY8/mm/3b6vxmQktyBTqjb6XMGlOwoGZixj/RTp85
 +uwSICxMlrijg+fhFMwC4Bo/8SFg+FeBVbwR07my88JaLj+3cV/NPolG900qLSa6
 uPqJ289EQ86LrHIHXCEWRKYvwy77GFsmBYjKZH8yXpdzUlSGNexV8eIMAz50figu
 wYRJGmHrRqwuzFwEGknv8SA3s2HVggXO4WVkWWCeJyO8nIVfYFUhME5l6Q==
 =+Hh0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "The guest side of the asynchronous page fault work has been delayed to
  5.9 in order to sync with Thomas's interrupt entry rework, but here's
  the rest of the KVM updates for this merge window.

  MIPS:
   - Loongson port

  PPC:
   - Fixes

  ARM:
   - Fixes

  x86:
   - KVM_SET_USER_MEMORY_REGION optimizations
   - Fixes
   - Selftest fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
  KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
  KVM: selftests: fix sync_with_host() in smm_test
  KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
  KVM: async_pf: Cleanup kvm_setup_async_pf()
  kvm: i8254: remove redundant assignment to pointer s
  KVM: x86: respect singlestep when emulating instruction
  KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
  KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
  KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
  KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
  KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
  KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
  KVM: arm64: Remove host_cpu_context member from vcpu structure
  KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
  KVM: arm64: Handle PtrAuth traps early
  KVM: x86: Unexport x86_fpu_cache and make it static
  KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
  KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
  KVM: arm64: Stop save/restoring ACTLR_EL1
  KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
  ...
2020-06-12 11:05:52 -07:00
Thomas Gleixner
71ed49d8fb x86/entry: Make NMI use IDTENTRY_RAW
For no reason other than beginning brainmelt, IDTENTRY_NMI was mapped to
IDTENTRY_IST.

This is not a problem on 64bit because the IST default entry point maps to
IDTENTRY_RAW which does not any entry handling. The surplus function
declaration for the noist C entry point is unused and as there is no ASM
code emitted for NMI this went unnoticed.

On 32bit IDTENTRY_IST maps to a regular IDTENTRY which does the normal
entry handling. That is clearly the wrong thing to do for NMI.

Map it to IDTENTRY_RAW to unbreak it. The IDTENTRY_NMI mapping needs to
stay to avoid emitting ASM code.

Fixes: 6271fef00b ("x86/entry: Convert NMI to IDTENTRY_NMI")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Debugged-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/CA+G9fYvF3cyrY+-iw_SZtpN-i2qA2BruHg4M=QYECU2-dNdsMw@mail.gmail.com
2020-06-12 14:15:48 +02:00
Andy Lutomirski
15a416e8aa x86/entry: Treat BUG/WARN as NMI-like entries
BUG/WARN are cleverly optimized using UD2 to handle the BUG/WARN out of
line in an exception fixup.

But if BUG or WARN is issued in a funny RCU context, then the
idtentry_enter...() path might helpfully WARN that the RCU context is
invalid, which results in infinite recursion.

Split the BUG/WARN handling into an nmi_enter()/nmi_exit() path in
exc_invalid_op() to increase the chance to survive the experience.

[ tglx: Make the declaration match the implementation ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/f8fe40e0088749734b4435b554f73eee53dcf7a8.1591932307.git.luto@kernel.org
2020-06-12 12:12:57 +02:00
Linus Torvalds
b791d1bdf9 The Kernel Concurrency Sanitizer (KCSAN)
KCSAN is a dynamic race detector, which relies on compile-time
 instrumentation, and uses a watchpoint-based sampling approach to detect
 races.
 
 The feature was under development for quite some time and has already found
 legitimate bugs.
 
 Unfortunately it comes with a limitation, which was only understood late in
 the development cycle:
 
   It requires an up to date CLANG-11 compiler
 
 CLANG-11 is not yet released (scheduled for June), but it's the only
 compiler today which handles the kernel requirements and especially the
 annotations of functions to exclude them from KCSAN instrumentation
 correctly.
 
 These annotations really need to work so that low level entry code and
 especially int3 text poke handling can be completely isolated.
 
 A detailed discussion of the requirements and compiler issues can be found
 here:
 
   https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/
 
 We came to the conclusion that trying to work around compiler limitations
 and bugs again would end up in a major trainwreck, so requiring a working
 compiler seemed to be the best choice.
 
 For Continous Integration purposes the compiler restriction is manageable
 and that's where most xxSAN reports come from.
 
 For a change this limitation might make GCC people actually look at their
 bugs. Some issues with CSAN in GCC are 7 years old and one has been 'fixed'
 3 years ago with a half baken solution which 'solved' the reported issue
 but not the underlying problem.
 
 The KCSAN developers also ponder to use a GCC plugin to become independent,
 but that's not something which will show up in a few days.
 
 Blocking KCSAN until wide spread compiler support is available is not a
 really good alternative because the continuous growth of lockless
 optimizations in the kernel demands proper tooling support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7im98THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQ3xD/9+q87OmwnyoRTs6O3GDDbWZYoJGolh
 rctDOAYW8RSS73Fiw23z8hKlLl9tJCya6/X8Q9qoonB1YeIEPPRVj5HJWAMUNEIs
 YgjlZJFmh+mnbP/KQFctm3AWpoX8kqt3ncqj6zG72oQ9qKui691BY/2NmGVSLxUV
 DqtUYSKmi51XEQtZuXRuHEf3zBxoyeD43DaSCdJAXd6f5O2X7tmrWDuazHVeKzHV
 lhijvkyBvGMWvPg0IBrXkkLmeOvS0++MTGm3o+L72XF6nWpzTkcV7N0E9GEDFg45
 zwcidRVKD5d/1DoU5Tos96rCJpBEGh/wimlu0z14mcZpNiJgRQH5rzVEO9Y14UcP
 KL9FgRrb5dFw7yfX2zRQ070OFJ4AEDBMK0o5Lbu/QO5KLkvFkqnuWlQfmmtZJWCW
 DTRw/FgUgU7lvyPjRrao6HBvwy+yTb0u9K5seCOTRkuepR9nPJs0710pFiBsNCfV
 RY3cyggNBipAzgBOgLxixnq9+rHt70ton6S8Gijxpvt0dGGfO8k0wuEhFtA4zKrQ
 6HGK+pidxnoVdEgyQZhS+qzMMkyiUL0FXdaGJ2IX+/DC+Ij1UrUPjZBn7v25M0hQ
 ESkvxWKCn7snH4/NJsNxqCV1zyEc3zAW/WvLJUc9I7H8zPwtVvKWPrKEMzrJJ5bA
 aneySilbRxBFUg==
 =iplm
 -----END PGP SIGNATURE-----

Merge tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
 "The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
  which relies on compile-time instrumentation, and uses a
  watchpoint-based sampling approach to detect races.

  The feature was under development for quite some time and has already
  found legitimate bugs.

  Unfortunately it comes with a limitation, which was only understood
  late in the development cycle:

     It requires an up to date CLANG-11 compiler

  CLANG-11 is not yet released (scheduled for June), but it's the only
  compiler today which handles the kernel requirements and especially
  the annotations of functions to exclude them from KCSAN
  instrumentation correctly.

  These annotations really need to work so that low level entry code and
  especially int3 text poke handling can be completely isolated.

  A detailed discussion of the requirements and compiler issues can be
  found here:

    https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/

  We came to the conclusion that trying to work around compiler
  limitations and bugs again would end up in a major trainwreck, so
  requiring a working compiler seemed to be the best choice.

  For Continous Integration purposes the compiler restriction is
  manageable and that's where most xxSAN reports come from.

  For a change this limitation might make GCC people actually look at
  their bugs. Some issues with CSAN in GCC are 7 years old and one has
  been 'fixed' 3 years ago with a half baken solution which 'solved' the
  reported issue but not the underlying problem.

  The KCSAN developers also ponder to use a GCC plugin to become
  independent, but that's not something which will show up in a few
  days.

  Blocking KCSAN until wide spread compiler support is available is not
  a really good alternative because the continuous growth of lockless
  optimizations in the kernel demands proper tooling support"

* tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
  compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
  compiler.h: Move function attributes to compiler_types.h
  compiler.h: Avoid nested statement expression in data_race()
  compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
  kcsan: Update Documentation to change supported compilers
  kcsan: Remove 'noinline' from __no_kcsan_or_inline
  kcsan: Pass option tsan-instrument-read-before-write to Clang
  kcsan: Support distinguishing volatile accesses
  kcsan: Restrict supported compilers
  kcsan: Avoid inserting __tsan_func_entry/exit if possible
  ubsan, kcsan: Don't combine sanitizer with kcov on clang
  objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
  kcsan: Add __kcsan_{enable,disable}_current() variants
  checkpatch: Warn about data_race() without comment
  kcsan: Use GFP_ATOMIC under spin lock
  Improve KCSAN documentation a bit
  kcsan: Make reporting aware of KCSAN tests
  kcsan: Fix function matching in report
  kcsan: Change data_race() to no longer require marking racing accesses
  kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
  ...
2020-06-11 18:55:43 -07:00
Linus Torvalds
9716e57a01 Peter Zijlstras rework of atomics and fallbacks. This solves two problems:
1) Compilers uninline small atomic_* static inline functions which can
      expose them to instrumentation.
 
   2) The instrumentation of atomic primitives was done at the architecture
      level while composites or fallbacks were provided at the generic level.
      As a result there are no uninstrumented variants of the fallbacks.
 
 Both issues were in the way of fully isolating fragile entry code pathes
 and especially the text poke int3 handler which is prone to an endless
 recursion problem when anything in that code path is about to be
 instrumented. This was always a problem, but got elevated due to the new
 batch mode updates of tracing.
 
 The solution is to mark the functions __always_inline and to flip the
 fallback and instrumentation so the non-instrumented variants are at the
 architecture level and the instrumentation is done in generic code.
 
 The latter introduces another fallback variant which will go away once all
 architectures have been moved over to arch_atomic_*.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7imyETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoT0wEACcI3mDiK/9hNlfnobIJTup1E8erUdY
 /EZX8yFc/FgpSSKAMROu3kswZ+rSWmBEyzTJLEtBAaYU6haAuGx77AugoDHfVkYi
 +CEJvVEpeK7fzsgu9aTb/5B6EDUo/P1fzTFjVTK1I9M9KrGLxbkGRZWYUeX3KRZd
 RskRJMbp9L4oiNJNAuIP6QKoJ7PK/sL16e8oVZSQR6WW9ZH4uDZbyfl5z0xLjI7u
 PIsFCoDu7/ig2wpOhtAYRVsL8C6EQ8mSeEUMKeM7A7UFAkVadYB8PTmEJ/QcixW+
 5R0+cnQE/3I/n0KRwfz/7p2gzILJk/cY6XJWVoAsQb990MD2ahjZJPYI4jdknjz6
 8bL/QjBq+pZwbHWOhy+IdUntIYGkyjfLKoPLdSoh+uK1kl8Jsg+AlB2lN469BV1D
 r0NltiCLggvtqXEDLV4YZqxie6H38gvOzPDbH8I6M34+WkOI2sM0D1P/Naqw/Wgl
 M1Ygx4wYG8X4zDESAYMy9tSXh5lGDIjiF6sjGTOPYWwUIeRlINfWeJkiXKnYNwv/
 qTiC8ciCxhlQcDifdyfQjT3mHNcP7YpVKp317TCtU4+WxMSrW1h2SL6m6j74dNI/
 P7/J6GKONeLRbt0ZQbQGjqHxSuu6kqUEu69aVs5W9+WjNEoJW1EW4vrJ3TeF5jLh
 0Srl4VsyDwzuXw==
 =Jkzv
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull atomics rework from Thomas Gleixner:
 "Peter Zijlstras rework of atomics and fallbacks. This solves two
  problems:

   1) Compilers uninline small atomic_* static inline functions which
      can expose them to instrumentation.

   2) The instrumentation of atomic primitives was done at the
      architecture level while composites or fallbacks were provided at
      the generic level. As a result there are no uninstrumented
      variants of the fallbacks.

  Both issues were in the way of fully isolating fragile entry code
  pathes and especially the text poke int3 handler which is prone to an
  endless recursion problem when anything in that code path is about to
  be instrumented. This was always a problem, but got elevated due to
  the new batch mode updates of tracing.

  The solution is to mark the functions __always_inline and to flip the
  fallback and instrumentation so the non-instrumented variants are at
  the architecture level and the instrumentation is done in generic
  code.

  The latter introduces another fallback variant which will go away once
  all architectures have been moved over to arch_atomic_*"

* tag 'locking-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomics: Flip fallbacks and instrumentation
  asm-generic/atomic: Use __always_inline for fallback wrappers
2020-06-11 18:27:19 -07:00
Linus Torvalds
6a45a65888 A set of fixes and updates for x86:
- Unbreak paravirt VDSO clocks. While the VDSO code was moved into lib
     for sharing a subtle check for the validity of paravirt clocks got
     replaced. While the replacement works perfectly fine for bare metal as
     the update of the VDSO clock mode is synchronous, it fails for paravirt
     clocks because the hypervisor can invalidate them asynchronous. Bring
     it back as an optional function so it does not inflict this on
     architectures which are free of PV damage.
 
   - Fix the jiffies to jiffies64 mapping on 64bit so it does not trigger
     an ODR violation on newer compilers
 
   - Three fixes for the SSBD and *IB* speculation mitigation maze to ensure
     consistency, not disabling of some *IB* variants wrongly and to prevent
     a rogue cross process shutdown of SSBD. All marked for stable.
 
   - Add yet more CPU models to the splitlock detection capable list !@#%$!
 
   - Bring the pr_info() back which tells that TSC deadline timer is enabled.
 
   - Reboot quirk for MacBook6,1
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7ie1oTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofXrEACDD0mNBU2c4vQiR+n4d41PqW1p15DM
 /wG7dYqYt2RdR6qOAspmNL5ilUP+L+eoT/86U9y0g4j3FtTREqyy6mpWE4MQzqaQ
 eKWVoeYt7l9QbR1kP4eks1CN94OyVBUPo3P78UPruWMB11iyKjyrkEdsDmRSLOdr
 6doqMFGHgowrQRwsLPFUt7b2lls6ssOSYgM/ChHi2Iga431ZuYYcRe2mNVsvqx3n
 0N7QZlJ/LivXdCmdpe3viMBsDaomiXAloKUo+HqgrCLYFXefLtfOq09U7FpddYqH
 ztxbGW/7gFn2HEbmdeaiufux263MdHtnjvdPhQZKHuyQmZzzxDNBFgOILSrBJb5y
 qLYJGhMa0sEwMBM9MMItomNgZnOITQ3WGYAdSCg3mG3jK4EXzr6aQm/Qz5SI+Cte
 bQKB2dgR53Gw/1uc7F5qMGQ2NzeUbKycT0ZbF3vkUPVh1kdU3juIntsovv2lFeBe
 Rog/rZliT1xdHrGAHRbubb2/3v66CSodMoYz0eQtr241Oz0LGwnyFqLN3qcZVLDt
 OtxHQ3bbaxevDEetJXfSh3CfHKNYMToAcszmGDse3MJxC7DL5AA51OegMa/GYOX6
 r5J99MUsEzZQoQYyXFf1MjwgxH4CQK1xBBUXYaVG65AcmhT21YbNWnCbxgf7hW+V
 hqaaUSig4V3NLw==
 =VlBk
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more x86 updates from Thomas Gleixner:
 "A set of fixes and updates for x86:

   - Unbreak paravirt VDSO clocks.

     While the VDSO code was moved into lib for sharing a subtle check
     for the validity of paravirt clocks got replaced. While the
     replacement works perfectly fine for bare metal as the update of
     the VDSO clock mode is synchronous, it fails for paravirt clocks
     because the hypervisor can invalidate them asynchronously.

     Bring it back as an optional function so it does not inflict this
     on architectures which are free of PV damage.

   - Fix the jiffies to jiffies64 mapping on 64bit so it does not
     trigger an ODR violation on newer compilers

   - Three fixes for the SSBD and *IB* speculation mitigation maze to
     ensure consistency, not disabling of some *IB* variants wrongly and
     to prevent a rogue cross process shutdown of SSBD. All marked for
     stable.

   - Add yet more CPU models to the splitlock detection capable list
     !@#%$!

   - Bring the pr_info() back which tells that TSC deadline timer is
     enabled.

   - Reboot quirk for MacBook6,1"

* tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Unbreak paravirt VDSO clocks
  lib/vdso: Provide sanity check for cycles (again)
  clocksource: Remove obsolete ifdef
  x86_64: Fix jiffies ODR violation
  x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches.
  x86/speculation: Prevent rogue cross-process SSBD shutdown
  x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.
  x86/cpu: Add Sapphire Rapids CPU model number
  x86/split_lock: Add Icelake microserver and Tigerlake CPU models
  x86/apic: Make TSC deadline timer detection message visible
  x86/reboot/quirks: Add MacBook6,1 reboot quirk
2020-06-11 15:54:31 -07:00
Thomas Gleixner
37d1a04b13 Rebase locking/kcsan to locking/urgent
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.

Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11 20:02:46 +02:00
Vitaly Kuznetsov
2a18b7e7cd KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.

Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.

s390 seems to always be able to inject 'page not present', the
change is effectively a nop.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11 12:35:19 -04:00
Tony Luck
17fae1294a x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned
An interesting thing happened when a guest Linux instance took a machine
check. The VMM unmapped the bad page from guest physical space and
passed the machine check to the guest.

Linux took all the normal actions to offline the page from the process
that was using it. But then guest Linux crashed because it said there
was a second machine check inside the kernel with this stack trace:

do_memory_failure
    set_mce_nospec
         set_memory_uc
              _set_memory_uc
                   change_page_attr_set_clr
                        cpa_flush
                             clflush_cache_range_opt

This was odd, because a CLFLUSH instruction shouldn't raise a machine
check (it isn't consuming the data). Further investigation showed that
the VMM had passed in another machine check because is appeared that the
guest was accessing the bad page.

Fix is to check the scope of the poison by checking the MCi_MISC register.
If the entire page is affected, then unmap the page. If only part of the
page is affected, then mark the page as uncacheable.

This assumes that VMMs will do the logical thing and pass in the "whole
page scope" via the MCi_MISC register (since they unmapped the entire
page).

  [ bp: Adjust to x86/entry changes. ]

Fixes: 284ce4011b ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reported-by: Jue Wang <juew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jue Wang <juew@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200520163546.GA7977@agluck-desk2.amr.corp.intel.com
2020-06-11 15:19:17 +02:00
Thomas Gleixner
f77d26a9fc Merge branch 'x86/entry' into ras/core
to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow
up patches can be applied without creating a horrible merge conflict
afterwards.
2020-06-11 15:17:57 +02:00
Thomas Gleixner
f0178fc01f x86/entry: Unbreak __irqentry_text_start/end magic
The entry rework moved interrupt entry code from the irqentry to the
noinstr section which made the irqentry section empty.

This breaks boundary checks which rely on the __irqentry_text_start/end
markers to find out whether a function in a stack trace is
interrupt/exception entry code. This affects the function graph tracer and
filter_irq_stacks().

As the IDT entry points are all sequentialy emitted this is rather simple
to unbreak by injecting __irqentry_text_start/end as global labels.

To make this work correctly:

  - Remove the IRQENTRY_TEXT section from the x86 linker script
  - Define __irqentry so it breaks the build if it's used
  - Adjust the entry mirroring in PTI
  - Remove the redundant kprobes and unwinder bound checks

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11 15:15:29 +02:00
Peter Zijlstra
2823e83a3d x86/entry: __always_inline CR2 for noinstr
vmlinux.o: warning: objtool: exc_page_fault()+0x9: call to read_cr2() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x24: call to prefetchw() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x21: call to kvm_handle_async_pf.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_nmi()+0x1cc: call to write_cr2() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.243227806@infradead.org
2020-06-11 15:15:28 +02:00
Peter Zijlstra
4b281e541b x86/entry: __always_inline arch_atomic_* for noinstr
vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0x33: call to arch_atomic_and.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.070166551@infradead.org
2020-06-11 15:15:27 +02:00
Peter Zijlstra
7a745be1cc x86/entry: __always_inline irqflags for noinstr
vmlinux.o: warning: objtool: lockdep_hardirqs_on()+0x65: call to arch_local_save_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: lockdep_hardirqs_off()+0x5d: call to arch_local_save_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0x35: call to arch_local_irq_save() leaves .noinstr.text section
vmlinux.o: warning: objtool: check_preemption_disabled()+0x31: call to arch_local_save_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: check_preemption_disabled()+0x33: call to arch_irqs_disabled_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0x2f: call to native_irq_disable() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.012171668@infradead.org
2020-06-11 15:15:27 +02:00
Peter Zijlstra
28eaf87121 x86/entry: __always_inline debugreg for noinstr
vmlinux.o: warning: objtool: exc_debug()+0x21: call to native_get_debugreg() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114051.954401211@infradead.org
2020-06-11 15:15:26 +02:00
Thomas Gleixner
3e77abda65 x86/idt: Consolidate idt functionality
- Move load_current_idt() out of line and replace the hideous comment with
   a lockdep assert. This allows to make idt_table and idt_descr static.

 - Mark idt_table read only after the IDT initialization is complete.

 - Shuffle code around to consolidate the #ifdef sections into one.

 - Adapt the F00F bug code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200528145523.084915381@linutronix.de
2020-06-11 15:15:26 +02:00
Peter Zijlstra
59bc300b71 x86/entry: Clarify irq_{enter,exit}_rcu()
Because:

  irq_enter_rcu() includes lockdep_hardirq_enter()
  irq_exit_rcu() does *NOT* include lockdep_hardirq_exit()

Which resulted in two 'stray' lockdep_hardirq_exit() calls in
idtentry.h, and me spending a long time trying to find the matching
enter calls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.359433429@infradead.org
2020-06-11 15:15:24 +02:00
Peter Zijlstra
fd501d4f03 x86/entry: Remove DBn stacks
Both #DB itself, as all other IST users (NMI, #MC) now clear DR7 on
entry. Combined with not allowing breakpoints on entry/noinstr/NOKPROBE
text and no single step (EFLAGS.TF) inside the #DB handler should guarantee
no nested #DB.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.303027161@infradead.org
2020-06-11 15:15:23 +02:00
Peter Zijlstra
f9912ada82 x86/entry: Remove debug IDT frobbing
This is all unused now.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.245019500@infradead.org
2020-06-11 15:15:23 +02:00
Peter Zijlstra
84b6a34915 x86/entry: Optimize local_db_save() for virt
Because DRn access is 'difficult' with virt; but the DR7 read is cheaper
than a cacheline miss on native, add a virt specific fast path to
local_db_save(), such that when breakpoints are not in use to avoid
touching DRn entirely.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.187833200@infradead.org
2020-06-11 15:15:22 +02:00
Peter Zijlstra
e1de11d4d1 x86/entry: Introduce local_db_{save,restore}()
In order to allow other exceptions than #DB to disable breakpoints,
provide common helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.012060983@infradead.org
2020-06-11 15:15:21 +02:00
Thomas Gleixner
320100a5ff x86/entry: Remove the TRACE_IRQS cruft
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.523289762@linutronix.de
2020-06-11 15:15:19 +02:00
Thomas Gleixner
75da04f7f3 x86/entry: Remove the apic/BUILD interrupt leftovers
Remove all the code which was there to emit the system vector stubs. All
users are gone.

Move the now unused GET_CR2_INTO macro muck to head_64.S where the last
user is. Fixup the eye hurting comment there while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.927433002@linutronix.de
2020-06-11 15:15:16 +02:00
Thomas Gleixner
13cad9851e x86/entry: Convert reschedule interrupt to IDTENTRY_SYSVEC_SIMPLE
The scheduler IPI does not need the full interrupt entry handling logic
when the entry is from kernel mode. Use IDTENTRY_SYSVEC_SIMPLE and spare
all the overhead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.835425642@linutronix.de
2020-06-11 15:15:16 +02:00
Thomas Gleixner
cb09ea2924 x86/entry: Convert XEN hypercall vector to IDTENTRY_SYSVEC
Convert the last oldstyle defined vector to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

Fixup the related XEN code by providing the primary C entry point in x86 to
avoid cluttering the generic code with X86'isms.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.741950104@linutronix.de
2020-06-11 15:15:15 +02:00
Thomas Gleixner
a16be368dd x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVEC
Convert various hypervisor vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.647997594@linutronix.de
2020-06-11 15:15:15 +02:00
Thomas Gleixner
9c3b1f4975 x86/entry: Convert KVM vectors to IDTENTRY_SYSVEC*
Convert KVM specific system vectors to IDTENTRY_SYSVEC*:

The two empty stub handlers which only increment the stats counter do no
need to run on the interrupt stack. Use IDTENTRY_SYSVEC_SIMPLE for them.

The wakeup handler does more work and runs on the interrupt stack.

None of these handlers need to save and restore the irq_regs pointer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.555715519@linutronix.de
2020-06-11 15:15:15 +02:00
Thomas Gleixner
720909a7ab x86/entry: Convert various system vectors
Convert various system vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.464812973@linutronix.de
2020-06-11 15:15:14 +02:00
Thomas Gleixner
582f919123 x86/entry: Convert SMP system vectors to IDTENTRY_SYSVEC
Convert SMP system vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.372234635@linutronix.de
2020-06-11 15:15:14 +02:00
Thomas Gleixner
db0338eec5 x86/entry: Convert APIC interrupts to IDTENTRY_SYSVEC
Convert APIC interrupts to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.280728850@linutronix.de
2020-06-11 15:15:13 +02:00
Thomas Gleixner
6368558c37 x86/entry: Provide IDTENTRY_SYSVEC
Provide IDTENTRY variants for system vectors to consolidate the different
mechanisms to emit the ASM stubs for 32- and 64-bit.

On 64-bit this also moves the stack switching from ASM to C code. 32-bit will
excute the system vectors w/o stack switching as before.

The simple variant is meant for "empty" system vectors like scheduler IPI
and KVM posted interrupt vectors. These do not need the full glory of irq
enter/exit handling with softirq processing and more.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.185317067@linutronix.de
2020-06-11 15:15:13 +02:00
Thomas Gleixner
fa5e5c4092 x86/entry: Use idtentry for interrupts
Replace the extra interrupt handling code and reuse the existing idtentry
machinery. This moves the irq stack switching on 64-bit from ASM to C code;
32-bit already does the stack switching in C.

This requires to remove HAVE_IRQ_EXIT_ON_IRQ_STACK as the stack switch is
not longer in the low level entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.078690991@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner
0bf7c314ff x86/entry: Add IRQENTRY_IRQ macro
Provide a seperate IDTENTRY macro for device interrupts. Similar to
IDTENTRY_ERRORCODE with the addition of invoking irq_enter/exit_rcu() and
providing the errorcode as a 'u8' argument to the C function, which
truncates the sign extended vector number.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.984573165@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner
7c2a57364c x86/irq: Rework handle_irq() for 64-bit
To consolidate the interrupt entry/exit code vs. the other exceptions
make handle_irq() an inline and handle both 64-bit and 32-bit mode.

Preparatory change to move irq stack switching for 64-bit to C which allows
to consolidate the entry exit handling by reusing the idtentry machinery
both in ASM and C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.889972748@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner
633260fa14 x86/irq: Convey vector as argument and not in ptregs
Device interrupts which go through do_IRQ() or the spurious interrupt
handler have their separate entry code on 64 bit for no good reason.

Both 32 and 64 bit transport the vector number through ORIG_[RE]AX in
pt_regs. Further the vector number is forced to fit into an u8 and is
complemented and offset by 0x80 so it's in the signed character
range. Otherwise GAS would expand the pushq to a 5 byte instruction for any
vector > 0x7F.

Treat the vector number like an error code and hand it to the C function as
argument. This allows to get rid of the extra entry code in a later step.

Simplify the error code push magic by implementing the pushq imm8 via a
'.byte 0x6a, vector' sequence so GAS is not able to screw it up. As the
pushq imm8 is sign extending the resulting error code needs to be truncated
to 8 bits in C code.

Originally-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.796915981@linutronix.de
2020-06-11 15:15:11 +02:00
Thomas Gleixner
79b9c18302 x86/irq: Use generic irq_regs implementation
The only difference is the name of the per-CPU variable: irq_regs
vs. __irq_regs, but the accessor functions are identical.

Remove the pointless copy and use the generic variant.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.704169051@linutronix.de
2020-06-11 15:15:11 +02:00
Thomas Gleixner
e2dcb5f139 x86/entry: Remove the transition leftovers
Now that all exceptions are converted over the sane flag is not longer
needed. Also the vector argument of idtentry_body on 64-bit is pointless
now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.331115895@linutronix.de
2020-06-11 15:15:09 +02:00
Thomas Gleixner
91eeafea1e x86/entry: Switch page fault exception to IDTENTRY_RAW
Convert page fault exceptions to IDTENTRY_RAW:

  - Implement the C entry point with DEFINE_IDTENTRY_RAW
  - Add the CR2 read into the exception handler
  - Add the idtentry_enter/exit_cond_rcu() invocations in
    in the regular page fault handler and in the async PF
    part.
  - Emit the ASM stub with DECLARE_IDTENTRY_RAW
  - Remove the ASM idtentry in 64-bit
  - Remove the CR2 read from 64-bit
  - Remove the open coded ASM entry code in 32-bit
  - Fix up the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.238455120@linutronix.de
2020-06-11 15:15:09 +02:00
Thomas Gleixner
2f6474e463 x86/entry: Switch XEN/PV hypercall entry to IDTENTRY
Convert the XEN/PV hypercall to IDTENTRY:

  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64-bit
  - Remove the open coded ASM entry code in 32-bit
  - Remove the old prototypes

The handler stubs need to stay in ASM code as they need corner case handling
and adjustment of the stack pointer.

Provide a new C function which invokes the entry/exit handling and calls
into the XEN handler on the interrupt stack if required.

The exit code is slightly different from the regular idtentry_exit() on
non-preemptible kernels. If the hypercall is preemptible and need_resched()
is set then XEN provides a preempt hypercall scheduling function.

Move this functionality into the entry code so it can use the existing
idtentry functionality.

[ mingo: Build fixes. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20200521202118.055270078@linutronix.de
2020-06-11 15:15:08 +02:00
Thomas Gleixner
931b941459 x86/entry: Provide helpers for executing on the irqstack
Device interrupt handlers and system vector handlers are executed on the
interrupt stack. The stack switch happens in the low level assembly entry
code. This conflicts with the efforts to consolidate the exit code in C to
ensure correctness vs. RCU and tracing.

As there is no way to move #DB away from IST due to the MOV SS issue, the
requirements vs. #DB and NMI for switching to the interrupt stack do not
exist anymore. The only requirement is that interrupts are disabled.

That allows the moving of the stack switching to C code, which simplifies the
entry/exit handling further, because it allows the switching of stacks after
handling the entry and on exit before handling RCU, returning to usermode and
kernel preemption in the same way as for regular exceptions.

The initial attempt of having the stack switching in inline ASM caused too
much headache vs. objtool and the unwinder. After analysing the use cases
it was agreed on that having the stack switch in ASM for the price of an
indirect call is acceptable, as the main users are indirect call heavy
anyway and the few system vectors which are empty shells (scheduler IPI and
KVM posted interrupt vectors) can run from the regular stack.

Provide helper functions to check whether the interrupt stack is already
active and whether stack switching is required.

64-bit only for now, as 32-bit has a variant of that already. Once this is
cleaned up, the two implementations might be consolidated as an additional
cleanup on top.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.763775313@linutronix.de
2020-06-11 15:15:07 +02:00
Thomas Gleixner
9ee01e0f69 x86/entry: Clean up idtentry_enter/exit() leftovers
Now that everything is converted to conditional RCU handling remove
idtentry_enter/exit() and tidy up the conditional functions.

This does not remove rcu_irq_exit_preempt(), to avoid conflicts with the RCU
tree. Will be removed once all of this hits Linus's tree.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.473597954@linutronix.de
2020-06-11 15:15:06 +02:00
Thomas Gleixner
fa95d7dc1a x86/idtentry: Switch to conditional RCU handling
Switch all idtentry_enter/exit() users over to the new conditional RCU
handling scheme and make the user mode entries in #DB, #INT3 and #MCE use
the user mode idtentry functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.382387286@linutronix.de
2020-06-11 15:15:05 +02:00
Thomas Gleixner
9f9781b60d x86/entry: Provide idtentry_enter/exit_user()
As there are exceptions which already handle entry from user mode and from
kernel mode separately, providing explicit user entry/exit handling callbacks
makes sense and makes the code easier to understand.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.289548561@linutronix.de
2020-06-11 15:15:05 +02:00
Thomas Gleixner
3eeec38584 x86/entry: Provide idtentry_entry/exit_cond_rcu()
After a lengthy discussion [1] it turned out that RCU does not need a full
rcu_irq_enter/exit() when RCU is already watching. All it needs if
NOHZ_FULL is active is to check whether the tick needs to be restarted.

This allows to avoid a separate variant for the pagefault handler which
cannot invoke rcu_irq_enter() on a kernel pagefault which might sleep.

The cond_rcu argument is only temporary and will be removed once the
existing users of idtentry_enter/exit() have been cleaned up. After that
the code can be significantly simplified.

[ mingo: Simplified the control flow ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: [1] https://lkml.kernel.org/r/20200515235125.628629605@linutronix.de
Link: https://lore.kernel.org/r/20200521202117.181397835@linutronix.de
2020-06-11 15:15:04 +02:00
Thomas Gleixner
c29c775a55 x86/entry: Convert double fault exception to IDTENTRY_DF
Convert #DF to IDTENTRY_DF
  - Implement the C entry point with DEFINE_IDTENTRY_DF
  - Emit the ASM stub with DECLARE_IDTENTRY_DF on 64bit
  - Remove the ASM idtentry in 64bit
  - Adjust the 32bit shim code
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.583415264@linutronix.de
2020-06-11 15:15:03 +02:00
Thomas Gleixner
6a8dfa8e40 x86/idtentry: Provide IDTENTRY_DF
Provide a separate macro for #DF as this needs to emit paranoid only code
and has also a special ASM stub in 32bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.583415264@linutronix.de
2020-06-11 15:15:02 +02:00
Thomas Gleixner
f08e32ec3c x86/idtentry: Provide IDTRENTRY_NOIST variants for #DB and #MC
Provide NOIST entry point macros which allows to implement NOIST variants
of the C entry points. These are invoked when #DB or #MC enter from user
space. This allows explicit handling of the difference between user mode
and kernel mode entry later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.084882104@linutronix.de
2020-06-11 15:15:00 +02:00
Thomas Gleixner
2bbc68f837 x86/entry: Convert Debug exception to IDTENTRY_DB
Convert #DB to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY_DB
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.900297476@linutronix.de
2020-06-11 15:14:59 +02:00
Thomas Gleixner
f051f69795 x86/nmi: Protect NMI entry against instrumentation
Mark all functions in the fragile code parts noinstr or force inlining so
they can't be instrumented.

Also make the hardware latency tracer invocation explicit outside of
non-instrumentable section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.716186134@linutronix.de
2020-06-11 15:14:58 +02:00
Thomas Gleixner
6271fef00b x86/entry: Convert NMI to IDTENTRY_NMI
Convert #NMI to IDTENTRY_NMI:
  - Implement the C entry point with DEFINE_IDTENTRY_NMI
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.609932306@linutronix.de
2020-06-11 15:14:58 +02:00
Thomas Gleixner
9cce81cff7 x86/idtentry: Provide IDTENTRY_XEN for XEN/PV
XEN/PV has special wrappers for NMI and DB exceptions. They redirect these
exceptions through regular IDTENTRY points. Provide the necessary IDTENTRY
macros to make this work

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.518622698@linutronix.de
2020-06-11 15:14:57 +02:00
Thomas Gleixner
8cd501c1fa x86/entry: Convert Machine Check to IDTENTRY_IST
Convert #MC to IDTENTRY_MCE:
  - Implement the C entry points with DEFINE_IDTENTRY_MCE
  - Emit the ASM stub with DECLARE_IDTENTRY_MCE
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the error code from *machine_check_vector() as
    it is always 0 and not used by any of the functions
    it can point to. Fixup all the functions as well.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.334980426@linutronix.de
2020-06-11 15:14:57 +02:00
Thomas Gleixner
2c058b03cc x86/idtentry: Provide IDTENTRY_IST
Same as IDTENTRY but for exceptions which run on Interrupt Stacks (IST) on
64bit. For 32bit this maps to IDTENTRY.

There are 3 variants which will be used:
      IDTENTRY_MCE
      IDTENTRY_DB
      IDTENTRY_NMI

These map to IDTENTRY_IST, but only the MCE and DB variants are emitting
ASM code as the NMI entry needs hand crafted ASM still.

The function defines do not contain any idtenter/exit calls as these
exceptions need special treatment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.137125609@linutronix.de
2020-06-11 15:14:55 +02:00
Thomas Gleixner
8edd7e37ae x86/entry: Convert INT3 exception to IDTENTRY_RAW
Convert #BP to IDTENTRY_RAW:
  - Implement the C entry point with DEFINE_IDTENTRY_RAW
  - Invoke idtentry_enter/exit() from the function body
  - Emit the ASM stub with DECLARE_IDTENTRY_RAW
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

This could be a plain IDTENTRY, but as Peter pointed out INT3 is broken
vs. the static key in the context tracking code as this static key might be
in the state of being patched and has an int3 which would recurse forever.
IDTENTRY_RAW is therefore chosen to allow addressing this issue without
lots of code churn.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.938474960@linutronix.de
2020-06-11 15:14:55 +02:00
Thomas Gleixner
0dc6cdc21b x86/idtentry: Provide IDTENTRY_RAW
Some exception handlers need to do extra work before any of the entry
helpers are invoked. Provide IDTENTRY_RAW for this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.830540017@linutronix.de
2020-06-11 15:14:54 +02:00
Thomas Gleixner
4979fb53ab x86/int3: Ensure that poke_int3_handler() is not traced
In order to ensure poke_int3_handler() is completely self contained -- this
is called while modifying other text, imagine the fun of hitting another
INT3 -- ensure that everything it uses is not traced.

The primary means here is to force inlining; bsearch() is notrace because
all of lib/ is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.410702173@linutronix.de
2020-06-11 15:14:52 +02:00
Thomas Gleixner
d77290507a x86/entry/32: Convert IRET exception to IDTENTRY_SW
Convert the IRET exception handler to IDTENTRY_SW. This is slightly
different than the conversions of hardware exceptions as the IRET exception
is invoked via an exception table when IRET faults. So it just uses the
IDTENTRY_SW mechanism for consistency. It does not emit ASM code as it does
not fit the other idtentry exceptions.

  - Implement the C entry point with DEFINE_IDTENTRY_SW() which maps to
    DEFINE_IDTENTRY()
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134906.128769226@linutronix.de
2020-06-11 15:14:52 +02:00
Thomas Gleixner
48227e21f7 x86/entry: Convert SIMD coprocessor error exception to IDTENTRY
Convert #XF to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Handle INVD_BUG in C
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134906.021552202@linutronix.de
2020-06-11 15:14:52 +02:00
Thomas Gleixner
436608bb00 x86/entry: Convert Alignment check exception to IDTENTRY
Convert #AC to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.928967113@linutronix.de
2020-06-11 15:14:51 +02:00
Thomas Gleixner
14a8bd2aa7 x86/entry: Convert Coprocessor error exception to IDTENTRY
Convert #MF to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.838823510@linutronix.de
2020-06-11 15:14:51 +02:00
Thomas Gleixner
dad7106f81 x86/entry: Convert Spurious interrupt bug exception to IDTENTRY
Convert #SPURIOUS to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.728077036@linutronix.de
2020-06-11 15:14:50 +02:00
Thomas Gleixner
be4c11afbb x86/entry: Convert General protection exception to IDTENTRY
Convert #GP to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.637269946@linutronix.de
2020-06-11 15:14:50 +02:00
Thomas Gleixner
fd9689bf91 x86/entry: Convert Stack segment exception to IDTENTRY
Convert #SS to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.539867572@linutronix.de
2020-06-11 15:14:49 +02:00
Thomas Gleixner
99a3fb8d01 x86/entry: Convert Segment not present exception to IDTENTRY
Convert #NP to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.443591450@linutronix.de
2020-06-11 15:14:49 +02:00
Thomas Gleixner
97b3d290b8 x86/entry: Convert Invalid TSS exception to IDTENTRY
Convert #TS to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.350676449@linutronix.de
2020-06-11 15:14:49 +02:00
Thomas Gleixner
aabfe5383e x86/idtentry: Provide IDTENTRY_ERRORCODE
Same as IDTENTRY but the C entry point has an error code argument.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.258989060@linutronix.de
2020-06-11 15:14:48 +02:00
Thomas Gleixner
f95658fdb5 x86/entry: Convert Coprocessor segment overrun exception to IDTENTRY
Convert #OLD_MF to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.838823510@linutronix.de
2020-06-11 15:14:48 +02:00
Thomas Gleixner
866ae2ccee x86/entry: Convert Device not available exception to IDTENTRY
Convert #NM to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.056243863@linutronix.de
2020-06-11 15:14:47 +02:00
Thomas Gleixner
49893c5cb2 x86/entry: Convert Invalid Opcode exception to IDTENTRY
Convert #UD to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Fixup the FOOF bug call in fault.c
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.955511913@linutronix.de
2020-06-11 15:14:47 +02:00
Thomas Gleixner
58d9c81fac x86/entry: Convert Bounds exception to IDTENTRY
Convert #BR to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.863001309@linutronix.de
2020-06-11 15:14:46 +02:00
Thomas Gleixner
4b6b9111c0 x86/entry: Convert Overflow exception to IDTENTRY
Convert #OF to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.771457898@linutronix.de
2020-06-11 15:14:46 +02:00
Thomas Gleixner
9d06c4027f x86/entry: Convert Divide Error to IDTENTRY
Convert #DE to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134904.663914713@linutronix.de
2020-06-11 15:14:46 +02:00
Thomas Gleixner
0ba50e861a x86/entry/common: Provide idtentry_enter/exit()
Provide functions which handle the low level entry and exit similar to
enter/exit from user mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134904.457578656@linutronix.de
2020-06-11 15:14:45 +02:00
Thomas Gleixner
53aaf262c6 x86/idtentry: Provide macros to define/declare IDT entry points
Provide DECLARE/DEFINE_IDTENTRY() macros.

DEFINE_IDTENTRY() provides a wrapper which acts as the function
definition. The exception handler body is just appended to it with curly
brackets. The entry point is marked noinstr so that irq tracing and the
enter_from_user_mode() can be moved into the C-entry point. As all
C-entries use the same macro (or a later variant) the necessary entry
handling can be implemented at one central place.

DECLARE_IDTENTRY() provides the function prototypes:
  - The C entry point 	    	cfunc
  - The ASM entry point		asm_cfunc
  - The XEN/PV entry point	xen_asm_cfunc

They all follow the same naming convention.

When included from ASM code DECLARE_IDTENTRY() is a macro which emits the
low level entry point in assembly by instantiating idtentry.

IDTENTRY is the simplest variant which just has a pt_regs argument. It's
going to be used for all exceptions which have no error code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.273363275@linutronix.de
2020-06-11 15:14:44 +02:00
Thomas Gleixner
877f183f83 x86/traps: Split trap numbers out in a separate header
So they can be used in ASM code. For this it is also necessary to convert
them to defines. Will be used for the rework of the entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134903.731004084@linutronix.de
2020-06-11 15:14:42 +02:00
Thomas Gleixner
410367e321 x86/entry: Disable interrupts for native_load_gs_index() in C code
There is absolutely no point in doing this in ASM code. Move it to C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.531534675@linutronix.de
2020-06-11 15:14:41 +02:00
Thomas Gleixner
a7ef9ba986 x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline
Prevent the compiler from uninlining and creating traceable/probable
functions as this is invoked _after_ context tracking switched to
CONTEXT_USER and rcu idle.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.902709267@linutronix.de
2020-06-11 15:14:40 +02:00
Thomas Gleixner
fba8dbeaf3 x86/idt: Remove update_intr_gate()
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11 15:14:37 +02:00
Thomas Gleixner
5916d5f9b3 bug: Annotate WARN/BUG/stackfail as noinstr safe
Warnings, bugs and stack protection fails from noinstr sections, e.g. low
level and early entry code, are likely to be fatal.

Mark them as "safe" to be invoked from noinstr protected code to avoid
annotating all usage sites. Getting the information out is important.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.376598577@linutronix.de
2020-06-11 15:14:36 +02:00
Thomas Gleixner
44d7e4fbc0 x86/entry: Remove the unused LOCKDEP_SYSEXIT cruft
No users left since two years due to commit 21d375b6b3 ("x86/entry/64:
Remove the SYSCALL64 fast path")

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.061301403@linutronix.de
2020-06-11 15:14:35 +02:00
Peter Zijlstra
37f8173dd8 locking/atomics: Flip fallbacks and instrumentation
Currently instrumentation of atomic primitives is done at the architecture
level, while composites or fallbacks are provided at the generic level.

The result is that there are no uninstrumented variants of the
fallbacks. Since there is now need of such variants to isolate text poke
from any form of instrumentation invert this ordering.

Doing this means moving the instrumentation into the generic code as
well as having (for now) two variants of the fallbacks.

Notes:

 - the various *cond_read* primitives are not proper fallbacks
   and got moved into linux/atomic.c. No arch_ variants are
   generated because the base primitives smp_cond_load*()
   are instrumented.

 - once all architectures are moved over to arch_atomic_ one of the
   fallback variants can be removed and some 2300 lines reclaimed.

 - atomic_{read,set}*() are no longer double-instrumented

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20200505134058.769149955@linutronix.de
2020-06-11 08:03:24 +02:00
Linus Torvalds
4382a79b27 Merge branch 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc uaccess updates from Al Viro:
 "Assorted uaccess patches for this cycle - the stuff that didn't fit
  into thematic series"

* 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  bpf: make bpf_check_uarg_tail_zero() use check_zeroed_user()
  x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user()
  user_regset_copyout_zero(): use clear_user()
  TEST_ACCESS_OK _never_ had been checked anywhere
  x86: switch cp_stat64() to unsafe_put_user()
  binfmt_flat: don't use __put_user()
  binfmt_elf_fdpic: don't use __... uaccess primitives
  binfmt_elf: don't bother with __{put,copy_to}_user()
  pselect6() and friends: take handling the combined 6th/7th args into helper
2020-06-10 16:02:54 -07:00
Linus Torvalds
3beff76b54 x86: use proper parentheses around new uaccess macro argument uses
__get_kernel_nofault() didn't have the parentheses around the use of
'src' and 'dst' macro arguments, making the casts potentially do the
wrong thing.

The parentheses aren't necessary with the current very limited use in
mm/access.c, but it's bad form, and future use-cases might have very
unexpected errors as a result.

Do the same for unsafe_copy_loop() while at it, although in that case it
is an entirely internal x86 uaccess helper macro that isn't used
anywhere else and any other use would be invalid anyway.

Fixes: fa94111d94 ("x86: use non-set_fs based maccess routines")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 10:39:33 -07:00
Linus Torvalds
a5ad5742f6 Merge branch 'akpm' (patches from Andrew)
Merge even more updates from Andrew Morton:

 - a kernel-wide sweep of show_stack()

 - pagetable cleanups

 - abstract out accesses to mmap_sem - prep for mmap_sem scalability work

 - hch's user acess work

Subsystems affected by this patch series: debug, mm/pagemap, mm/maccess,
mm/documentation.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (93 commits)
  include/linux/cache.h: expand documentation over __read_mostly
  maccess: return -ERANGE when probe_kernel_read() fails
  x86: use non-set_fs based maccess routines
  maccess: allow architectures to provide kernel probing directly
  maccess: move user access routines together
  maccess: always use strict semantics for probe_kernel_read
  maccess: remove strncpy_from_unsafe
  tracing/kprobes: handle mixed kernel/userspace probes better
  bpf: rework the compat kernel probe handling
  bpf:bpf_seq_printf(): handle potentially unsafe format string better
  bpf: handle the compat string in bpf_trace_copy_string better
  bpf: factor out a bpf_trace_copy_string helper
  maccess: unify the probe kernel arch hooks
  maccess: remove probe_read_common and probe_write_common
  maccess: rename strnlen_unsafe_user to strnlen_user_nofault
  maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
  maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
  maccess: update the top of file comment
  maccess: clarify kerneldoc comments
  maccess: remove duplicate kerneldoc comments
  ...
2020-06-09 09:54:46 -07:00
Christoph Hellwig
fa94111d94 x86: use non-set_fs based maccess routines
Provide arch_kernel_read and arch_kernel_write routines to implement the
maccess routines without messing with set_fs and without stac/clac that
opens up access to user space.

[akpm@linux-foundation.org: coding style fixes]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-20-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:16 -07:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Mike Rapoport
974b9b2c68 mm: consolidate pte_index() and pte_offset_*() definitions
All architectures define pte_index() as

	(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)

and all architectures define pte_offset_kernel() as an entry in the array
of PTEs indexed by the pte_index().

For the most architectures the pte_offset_kernel() implementation relies
on the availability of pmd_page_vaddr() that converts a PMD entry value to
the virtual address of the page containing PTEs array.

Let's move x86 definitions of the PTE accessors to the generic place in
<linux/pgtable.h> and then simply drop the respective definitions from the
other architectures.

The architectures that didn't provide pmd_page_vaddr() are updated to have
that defined.

The generic implementation of pte_offset_kernel() can be overridden by an
architecture and alpha makes use of this because it has special ordering
requirements for its version of pte_offset_kernel().

[rppt@linux.ibm.com: v2]
  Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
[rppt@linux.ibm.com: update]
  Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
[rppt@linux.ibm.com: update]
  Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
[akpm@linux-foundation.org: fix x86 warning]
[sfr@canb.auug.org.au: fix powerpc build]
  Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Mike Rapoport
88107d330d x86/mm: simplify init_trampoline() and surrounding logic
There are three cases for the trampoline initialization:
* 32-bit does nothing
* 64-bit with kaslr disabled simply copies a PGD entry from the direct map
  to the trampoline PGD
* 64-bit with kaslr enabled maps the real mode trampoline at PUD level

These cases are currently differentiated by a bunch of ifdefs inside
asm/include/pgtable.h and the case of 64-bits with kaslr on uses
pgd_index() helper.

Replacing the ifdefs with a static function in arch/x86/mm/init.c gives
clearer code and allows moving pgd_index() to the generic implementation
in include/linux/pgtable.h

[rppt@linux.ibm.com: take CONFIG_RANDOMIZE_MEMORY into account in kaslr_enabled()]
  Link: http://lkml.kernel.org/r/20200525104045.GB13212@linux.ibm.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-8-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport
65fddcfca8 mm: reorder includes after introduction of linux/pgtable.h
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes.  Fix this up with the aid of
the below script and manual adjustments here and there.

	import sys
	import re

	if len(sys.argv) is not 3:
	    print "USAGE: %s <file> <header>" % (sys.argv[0])
	    sys.exit(1)

	hdr_to_move="#include <linux/%s>" % sys.argv[2]
	moved = False
	in_hdrs = False

	with open(sys.argv[1], "r") as f:
	    lines = f.readlines()
	    for _line in lines:
		line = _line.rstrip('
')
		if line == hdr_to_move:
		    continue
		if line.startswith("#include <linux/"):
		    in_hdrs = True
		elif not moved and in_hdrs:
		    moved = True
		    print hdr_to_move
		print line

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport
ca5999fde0 mm: introduce include/linux/pgtable.h
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport
e31cf2f4ca mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Dmitry Safonov
d46b3df78a x86: add missing const qualifiers for log_lvl
Currently, the log-level of show_stack() depends on a platform
realization.  It creates situations where the headers are printed with
lower log level or higher than the stacktrace (depending on a platform or
user).

Furthermore, it forces the logic decision from user to an architecture
side.  In result, some users as sysrq/kdb/etc are doing tricks with
temporary rising console_loglevel while printing their messages.  And in
result it not only may print unwanted messages from other CPUs, but also
omit printing at all in the unlucky case where the printk() was deferred.

Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems an easier
approach than introducing more printk buffers.  Also, it will consolidate
printings with headers.

Keep log_lvl const show_trace_log_lvl() and printk_stack_address() as the
new generic show_stack_loglvl() wants to have a proper const qualifier.

And gcc rightfully produces warnings in case it's not keept:
arch/x86/kernel/dumpstack.c: In function `show_stack':
arch/x86/kernel/dumpstack.c:294:37: warning: passing argument 4 of `show_trace_log_lv ' discards `const' qualifier from pointer target type [-Wdiscarded-qualifiers]
  294 |  show_trace_log_lvl(task, NULL, sp, loglvl);
      |                                     ^~~~~~
arch/x86/kernel/dumpstack.c:163:32: note: expected `char *' but argument is of type `const char *'
  163 |    unsigned long *stack, char *log_lvl)
      |                          ~~~~~~^~~~~~~

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/T/#u

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200418201944.482088-41-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:12 -07:00
Linus Torvalds
8b4d37db9a Merge branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 srbds fixes from Thomas Gleixner:
 "The 9th episode of the dime novel "The performance killer" with the
  subtitle "Slow Randomizing Boosts Denial of Service".

  SRBDS is an MDS-like speculative side channel that can leak bits from
  the random number generator (RNG) across cores and threads. New
  microcode serializes the processor access during the execution of
  RDRAND and RDSEED. This ensures that the shared buffer is overwritten
  before it is released for reuse. This is equivalent to a full bus
  lock, which means that many threads running the RNG instructions in
  parallel have the same effect as the same amount of threads issuing a
  locked instruction targeting an address which requires locking of two
  cachelines at once.

  The mitigation support comes with the usual pile of unpleasant
  ingredients:

   - command line options

   - sysfs file

   - microcode checks

   - a list of vulnerable CPUs identified by model and stepping this
     time which requires stepping match support for the cpu match logic.

   - the inevitable slowdown of affected CPUs"

* branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Add Ivy Bridge to affected list
  x86/speculation: Add SRBDS vulnerability and mitigation documentation
  x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
  x86/cpu: Add 'table' argument to cpu_matches()
2020-06-09 09:30:21 -07:00
Thomas Gleixner
7778d8417b x86/vdso: Unbreak paravirt VDSO clocks
The conversion of x86 VDSO to the generic clock mode storage broke the
paravirt and hyperv clocksource logic. These clock sources have their own
internal sequence counter to validate the clocksource at the point of
reading it. This is necessary because the hypervisor can invalidate the
clocksource asynchronously so a check during the VDSO data update is not
sufficient. If the internal check during read invalidates the clocksource
the read return U64_MAX. The original code checked this efficiently by
testing whether the result (casted to signed) is negative, i.e. bit 63 is
set. This was done that way because an extra indicator for the validity had
more overhead.

The conversion broke this check because the check was replaced by a check
for a valid VDSO clock mode.

The wreckage manifests itself when the paravirt clock is installed as a
valid VDSO clock and during runtime invalidated by the hypervisor,
e.g. after a host suspend/resume cycle. After the invalidation the read
function returns U64_MAX which is used as cycles and makes the clock jump
by ~2200 seconds, and become stale until the 2200 seconds have elapsed
where it starts to jump again. The period of this effect depends on the
shift/mult pair of the clocksource and the jumps and staleness are an
artifact of undefined but reproducible behaviour of math overflow.

Implement an x86 version of the new vdso_cycles_ok() inline which adds this
check back and a variant of vdso_clocksource_ok() which lets the compiler
optimize it out to avoid the extra conditional. That's suboptimal when the
system does not have a VDSO capable clocksource, but that's not the case
which is optimized for.

Fixes: 5d51bee725 ("clocksource: Add common vdso clock mode storage")
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200606221532.080560273@linutronix.de
2020-06-09 16:36:49 +02:00
Sean Christopherson
80fbd280be KVM: x86: Unexport x86_fpu_cache and make it static
Make x86_fpu_cache static now that FPU allocation and destruction is
handled entirely by common x86 code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200608180218.20946-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-09 05:57:27 -04:00
Christoph Hellwig
e0cf615d72 asm-generic: don't include <linux/mm.h> in cacheflush.h
This seems to lead to some crazy include loops when using
asm-generic/cacheflush.h on more architectures, so leave it to the arch
header for now.

[hch@lst.de: fix warning]
  Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:57 -07:00
Linus Torvalds
ac7b34218a Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK
now that toolchains long support PT_GNU_STACK marking and there's no
 need anymore to force modern programs into having all its user mappings
 executable instead of only the stack and the PROT_EXEC ones. Disable
 that automatic READ_IMPLIES_EXEC forcing on x86-64 and arm64. Add tables
 documenting how READ_IMPLIES_EXEC is handled on x86-64, arm and arm64.
 By Kees Cook.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7YFDIACgkQEsHwGGHe
 VUpnzxAAmXdODNOb1gGQvt+KJthkfkWh2A2R+tWxCRmFtjFTcS/eRxFfvGu2KmFY
 2b2AcJzuJeGjs7WIvQU0pkR2p6STyzuSBBLj5J/OJR9FonQ4pPah38df4A0fOgI6
 GJyJV9Ie7O2Ph1w2iLOeWBdmR90CnYuabxsfipgOL+sjHlEI0RqLSDgARRQsxTEj
 KM+JVAFD472KcUJnQKBVBOD1I1DOVBGu12r3y6chgsOtwshLNW/cO15cDgYrgnJZ
 OlR3EIUukCEEc1KQzUCihsypLuGfrmdq1MyPN8CME8gLfmOBsJyGRDhvmdbS+Wxh
 kAMYQ9BuNP/jMVtN950qV0qUtnZCeIPlj1sDb9STWz5fInLsXDSCS0eYi32yBFi+
 7yviVU95ml6Mda1Qd5axItTHFAjKIn0qfMZszkLOtUszIzNinCgH7t3ThoXeV223
 BqrpntRwiGZVpXDdcp0QFYBsWSMchR47yuhL8pB4SWxQzgNzXqAEg2KFQU0XMDKp
 pdia9IzUozg/BrjG5cnRfZhq2lBra7fy3Dn6fw5+NR5vqhka0Wr8L6dyM1Rj74EU
 HPk5bRXgt0OIiIFPi4139ApY7k+8j2nbf12qUchue1ZVVKzbvK996FDXbrGgW3zD
 Wis1wglxB9urSUTmC1bMOeyOd+gebo3i/ACAjgSo+EbDN7qW0Qw=
 =2L7y
 -----END PGP SIGNATURE-----

Merge tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull READ_IMPLIES_EXEC changes from Borislav Petkov:
 "Split the old READ_IMPLIES_EXEC workaround from executable
  PT_GNU_STACK now that toolchains long support PT_GNU_STACK marking and
  there's no need anymore to force modern programs into having all its
  user mappings executable instead of only the stack and the PROT_EXEC
  ones.

  Disable that automatic READ_IMPLIES_EXEC forcing on x86-64 and
  arm64.

  Add tables documenting how READ_IMPLIES_EXEC is handled on x86-64, arm
  and arm64.

  By Kees Cook"

* tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
  arm32/64/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
  arm32/64/elf: Add tables to document READ_IMPLIES_EXEC
  x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit
  x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
  x86/elf: Add table to document READ_IMPLIES_EXEC
2020-06-05 13:45:21 -07:00
Linus Torvalds
f4dd60a3d4 Misc changes:
- Unexport various PAT primitives
 
  - Unexport per-CPU tlbstate
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7Z+3cRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jgyxAAjPoXEzi9rqGHY6Eus37DNbzHtdQj4fqN
 68h8T2tSnOMzETe3L/c4puxI50YFpMA0sFbzm8BfjCtucs0K7Tj4Sv8Aoap2b99A
 /bP+ySgHh2BMoI/tu9TiD8et+vttAGGwkXQhIOgeakZcYzpAY7oUNwc+CogkytbQ
 DaC8s9FL7RjCXCL91fvZ33C0ksg5J9ynFbRozEHOacHPrE3CbrqUwu+75PmS7nJC
 13vatOxjdqNPQhVMg7waN1nHv7K06kph1wxWxYHoD0QwAPy1ecE84wLvg9gv5AqK
 BfUBmB34qRW21qbB5tQrMlGDS9tuV0vUB1fxUV7/iOKXQUH6viEG/7J7jm+YwXji
 U9S54UPj/TOp8fvYdS18sp6vI1gS3HKjd3LO3pPHWsyZVMJBoGuMConZRs3C31Cp
 WuwBU1gY+mFB5l4prt8WU8ocPvEnZkP00cCYNyzPk21tblfUwFbrmu3wcZxOkx3s
 ZhRO4KrhxtL7l/wDLuNtWShBL2c6Rz2tts58tr/fj/M+UscJK2MPKxPLCAb20QYZ
 qSkMa36+r8LkuMCyjpegEEmo4sw9yC6aLXFKfYu2ABki5o9AR4tavk+lwO+dad6T
 k0DJjGXLsG9sReR6hrfaNTk5h7ImiRFDVntnWAhgKhARRoloJJS4/RkzW+ylPbac
 mTuNNJDChUQ=
 =RXKK
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2020-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Ingo Molnar:
 "Misc changes:

   - Unexport various PAT primitives

   - Unexport per-CPU tlbstate and uninline TLB helpers"

* tag 'x86-mm-2020-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/tlb/uv: Add a forward declaration for struct flush_tlb_info
  x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=m
  x86/tlb: Restrict access to tlbstate
  xen/privcmd: Remove unneeded asm/tlb.h include
  x86/tlb: Move PCID helpers where they are used
  x86/tlb: Uninline nmi_uaccess_okay()
  x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site
  x86/tlb: Move paravirt_tlb_remove_table() to the usage site
  x86/tlb: Move __flush_tlb_all() out of line
  x86/tlb: Move flush_tlb_others() out of line
  x86/tlb: Move __flush_tlb_one_kernel() out of line
  x86/tlb: Move __flush_tlb_one_user() out of line
  x86/tlb: Move __flush_tlb_global() out of line
  x86/tlb: Move __flush_tlb() out of line
  x86/alternatives: Move temporary_mm helpers into C
  x86/cr4: Sanitize CR4.PCE update
  x86/cpu: Uninline CR4 accessors
  x86/tlb: Uninline __get_current_cr3_fast()
  x86/mm: Use pgprotval_t in protval_4k_2_large() and protval_large_2_4k()
  x86/mm: Unexport __cachemode2pte_tbl
  ...
2020-06-05 11:18:53 -07:00
Linus Torvalds
886d7de631 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - More MM work. 100ish more to go. Mike Rapoport's "mm: remove
   __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

 - Various other little subsystems

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
  lib/ubsan.c: fix gcc-10 warnings
  tools/testing/selftests/vm: remove duplicate headers
  selftests: vm: pkeys: fix multilib builds for x86
  selftests: vm: pkeys: use the correct page size on powerpc
  selftests/vm/pkeys: override access right definitions on powerpc
  selftests/vm/pkeys: test correct behaviour of pkey-0
  selftests/vm/pkeys: introduce a sub-page allocator
  selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
  selftests/vm/pkeys: associate key on a mapped page and detect write violation
  selftests/vm/pkeys: associate key on a mapped page and detect access violation
  selftests/vm/pkeys: improve checks to determine pkey support
  selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
  selftests/vm/pkeys: fix number of reserved powerpc pkeys
  selftests/vm/pkeys: introduce powerpc support
  selftests/vm/pkeys: introduce generic pkey abstractions
  selftests: vm: pkeys: use the correct huge page size
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
  selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
  selftests/vm/pkeys: fix pkey_disable_clear()
  selftests: vm: pkeys: add helpers for pkey bits
  ...
2020-06-04 19:18:29 -07:00
Ira Weiny
090e77e166 kmap: consolidate kmap_prot definitions
Most architectures define kmap_prot to be PAGE_KERNEL.

Let sparc and xtensa define there own and define PAGE_KERNEL as the
default if not overridden.

[akpm@linux-foundation.org: coding style fixes]
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-16-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny
20b271dfe9 arch/kmap: define kmap_atomic_prot() for all arch's
To support kmap_atomic_prot(), all architectures need to support
protections passed to their kmap_atomic_high() function.  Pass protections
into kmap_atomic_high() and change the name to kmap_atomic_high_prot() to
match.

Then define kmap_atomic_prot() as a core function which calls
kmap_atomic_high_prot() when needed.

Finally, redefine kmap_atomic() as a wrapper of kmap_atomic_prot() with
the default kmap_prot exported by the architectures.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-11-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny
abca2500c0 arch/kunmap_atomic: consolidate duplicate code
Every single architecture (including !CONFIG_HIGHMEM) calls...

	pagefault_enable();
	preempt_enable();

... before returning from __kunmap_atomic().  Lift this code into the
kunmap_atomic() macro.

While we are at it rename __kunmap_atomic() to kunmap_atomic_high() to
be consistent.

[ira.weiny@intel.com: don't enable pagefault/preempt twice]
  Link: http://lkml.kernel.org/r/20200518184843.3029640-1-ira.weiny@intel.com
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20200507150004.1423069-8-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny
78b6d91ec7 arch/kmap_atomic: consolidate duplicate code
Every arch has the same code to ensure atomic operations and a check for
!HIGHMEM page.

Remove the duplicate code by defining a core kmap_atomic() which only
calls the arch specific kmap_atomic_high() when the page is high memory.

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-7-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny
ee9bc5fdf5 {x86,powerpc,microblaze}/kmap: move preempt disable
During this kmap() conversion series we must maintain bisect-ability.  To
do this, kmap_atomic_prot() in x86, powerpc, and microblaze need to remain
functional.

Create a temporary inline version of kmap_atomic_prot within these
architectures so we can rework their kmap_atomic() calls and then lift
kmap_atomic_prot() to the core.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-6-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny
e23c45976f arch/kunmap: remove duplicate kunmap implementations
All architectures do exactly the same thing for kunmap(); remove all the
duplicate definitions and lift the call to the core.

This also has the benefit of changing kmap_unmap() on a number of
architectures to be an inline call rather than an actual function.

[akpm@linux-foundation.org: fix CONFIG_HIGHMEM=n build on various architectures]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-5-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny
525aaf9bad arch/kmap: remove redundant arch specific kmaps
The kmap code for all the architectures is almost 100% identical.

Lift the common code to the core.  Use ARCH_HAS_KMAP_FLUSH_TLB to indicate
if an arch defines kmap_flush_tlb() and call if if needed.

This also has the benefit of changing kmap() on a number of architectures
to be an inline call rather than an actual function.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-4-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Anshuman Khandual
8898ad58a0 x86/mm: define mm_p4d_folded()
Patch series "mm/debug: Add tests validating architecture page table
helpers", v18.

This adds a test validation for architecture exported page table helpers.
Patch adds basic transformation tests at various levels of the page table.

This test was originally suggested by Catalin during arm64 THP migration
RFC discussion earlier.  Going forward it can include more specific tests
with respect to various generic MM functions like THP, HugeTLB etc and
platform specific tests.

https://lore.kernel.org/linux-mm/20190628102003.GA56463@arrakis.emea.arm.com/

This patch (of 2):

This just defines mm_p4d_folded() to check whether P4D page table level is
folded at runtime.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1587436495-22033-2-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1588564865-31160-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:21 -07:00
Fan Yang
5bfea2d9b1 mm: Fix mremap not considering huge pmd devmap
The original code in mm/mremap.c checks huge pmd by:

		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd)) {

However, a DAX mapped nvdimm is mapped as huge page (by default) but it
is not transparent huge page (_PAGE_PSE | PAGE_DEVMAP).  This commit
changes the condition to include the case.

This addresses CVE-2020-10757.

Fixes: 5c7fb56e5e ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd")
Cc: <stable@vger.kernel.org>
Reported-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Signed-off-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Tested-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:05:24 -07:00
Linus Torvalds
ee01c4d72a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
2020-06-03 20:24:15 -07:00
Anshuman Khandual
86ec2da037 mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
pmd_present() is expected to test positive after pmdp_mknotpresent() as
the PMD entry still points to a valid huge page in memory.
pmdp_mknotpresent() implies that given PMD entry is just invalidated from
MMU perspective while still holding on to pmd_page() referred valid huge
page thus also clearing pmd_present() test.  This creates the following
situation which is counter intuitive.

[pmd_present(pmd_mknotpresent(pmd)) = true]

This renames pmd_mknotpresent() as pmd_mkinvalid() reflecting the helper's
functionality more accurately while changing the above mentioned situation
as follows.  This does not create any functional change.

[pmd_present(pmd_mkinvalid(pmd)) = true]

This is not applicable for platforms that define own pmdp_invalidate() via
__HAVE_ARCH_PMDP_INVALIDATE.  Suggestion for renaming came during a
previous discussion here.

https://patchwork.kernel.org/patch/11019637/

[anshuman.khandual@arm.com: change pmd_mknotvalid() to pmd_mkinvalid() per Will]
  Link: http://lkml.kernel.org/r/1587520326-10099-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/1584680057-13753-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:49 -07:00
Anshuman Khandual
5be9934328 mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags()
There are multiple similar definitions for arch_clear_hugepage_flags() on
various platforms.  Lets just add it's generic fallback definition for
platforms that do not override.  This help reduce code duplication.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1588907271-11920-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Anshuman Khandual
b0eae98c66 mm/hugetlb: define a generic fallback for is_hugepage_only_range()
There are multiple similar definitions for is_hugepage_only_range() on
various platforms.  Lets just add it's generic fallback definition for
platforms that do not override.  This help reduce code duplication.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1588907271-11920-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Linus Torvalds
039aeb9deb ARM:
- Move the arch-specific code into arch/arm64/kvm
 - Start the post-32bit cleanup
 - Cherry-pick a few non-invasive pre-NV patches
 
 x86:
 - Rework of TLB flushing
 - Rework of event injection, especially with respect to nested virtualization
 - Nested AMD event injection facelift, building on the rework of generic code
 and fixing a lot of corner cases
 - Nested AMD live migration support
 - Optimization for TSC deadline MSR writes and IPIs
 - Various cleanups
 - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
 - Interrupt-based delivery of asynchronous "page ready" events (host side)
 - Hyper-V MSRs and hypercalls for guest debugging
 - VMX preemption timer fixes
 
 s390:
 - Cleanups
 
 Generic:
 - switch vCPU thread wakeup from swait to rcuwait
 
 The other architectures, and the guest side of the asynchronous page fault
 work, will come next week.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
 ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
 5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
 7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
 asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
 CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
 =v7Wn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Move the arch-specific code into arch/arm64/kvm

   - Start the post-32bit cleanup

   - Cherry-pick a few non-invasive pre-NV patches

  x86:
   - Rework of TLB flushing

   - Rework of event injection, especially with respect to nested
     virtualization

   - Nested AMD event injection facelift, building on the rework of
     generic code and fixing a lot of corner cases

   - Nested AMD live migration support

   - Optimization for TSC deadline MSR writes and IPIs

   - Various cleanups

   - Asynchronous page fault cleanups (from tglx, common topic branch
     with tip tree)

   - Interrupt-based delivery of asynchronous "page ready" events (host
     side)

   - Hyper-V MSRs and hypercalls for guest debugging

   - VMX preemption timer fixes

  s390:
   - Cleanups

  Generic:
   - switch vCPU thread wakeup from swait to rcuwait

  The other architectures, and the guest side of the asynchronous page
  fault work, will come next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
  KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
  KVM: check userspace_addr for all memslots
  KVM: selftests: update hyperv_cpuid with SynDBG tests
  x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
  x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
  x86/kvm/hyper-v: Add support for synthetic debugger interface
  x86/hyper-v: Add synthetic debugger definitions
  KVM: selftests: VMX preemption timer migration test
  KVM: nVMX: Fix VMX preemption timer migration
  x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
  KVM: x86/pmu: Support full width counting
  KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
  KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  KVM: x86: acknowledgment mechanism for async pf page ready notifications
  KVM: x86: interrupt based APF 'page ready' event delivery
  KVM: introduce kvm_read_guest_offset_cached()
  KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
  KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  KVM: VMX: Replace zero-length array with flexible-array
  ...
2020-06-03 15:13:47 -07:00
Linus Torvalds
6b2591c212 hyperv-next for 5.8
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl7WhbkTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXlUnB/0R8dBVSeRfNmyJaadBWKFc/LffwKLD
 CQ8PVv22ffkCaEYV2tpnhS6NmkERLNdson4Uo02tVUsjOJ4CrWHTn7aKqYWZyA+O
 qv/PiD9TBXJVYMVP2kkyaJlK5KoqeAWBr2kM16tT0cxQmlhE7g0Xo2wU9vhRbU+4
 i4F0jffe4lWps65TK392CsPr6UEv1HSel191Py5zLzYqChT+L8WfahmBt3chhsV5
 TIUJYQvBwxecFRla7yo+4sUn37ZfcTqD1hCWSr0zs4psW0ge7d80kuaNZS+EqxND
 fGm3Bp1BlUuDKsJ/D+AaHLCR47PUZ9t9iMDjZS/ovYglLFwi+h3tAV+W
 =LwVR
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyper-v updates from Wei Liu:

 - a series from Andrea to support channel reassignment

 - a series from Vitaly to clean up Vmbus message handling

 - a series from Michael to clean up and augment hyperv-tlfs.h

 - patches from Andy to clean up GUID usage in Hyper-V code

 - a few other misc patches

* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (29 commits)
  Drivers: hv: vmbus: Resolve more races involving init_vp_index()
  Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug
  vmbus: Replace zero-length array with flexible-array
  Driver: hv: vmbus: drop a no long applicable comment
  hyper-v: Switch to use UUID types directly
  hyper-v: Replace open-coded variant of %*phN specifier
  hyper-v: Supply GUID pointer to printf() like functions
  hyper-v: Use UUID API for exporting the GUID (part 2)
  asm-generic/hyperv: Add definitions for Get/SetVpRegister hypercalls
  x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files
  x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines
  KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page
  drivers: hv: remove redundant assignment to pointer primary_channel
  scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
  Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
  Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
  PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
  Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
  hv_utils: Always execute the fcopy and vss callbacks in a tasklet
  ...
2020-06-03 15:00:05 -07:00
Al Viro
86977da9cb TEST_ACCESS_OK _never_ had been checked anywhere
Once upon a time the predecessor of that thing (TEST_VERIFY_AREA)
used to be.  However, that had been gone for years now (and
the patch that introduced TEST_ACCESS_OK has not touched any
ifdefs - they got gradually removed later).  Just bury it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-03 16:59:27 -04:00
Tony Luck
be25d1b5ea x86/cpu: Add Sapphire Rapids CPU model number
Latest edition (039) of "Intel Architecture Instruction Set Extensions
and Future Features Programming Reference" includes three new CPU model
numbers. Linux already has the two Ice Lake server ones. Add the new
model number for Sapphire Rapids.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200603173352.15506-1-tony.luck@intel.com
2020-06-03 19:53:41 +02:00
Linus Torvalds
f6aee505c7 X86 timer specific updates:
- Add TPAUSE based delay which allows the CPU to enter an optimized power
    state while waiting for the delay to pass. The delay is based on TSC
    cycles.
 
  - Add tsc_early_khz command line parameter to workaround the problem that
    overclocked CPUs can report the wrong frequency via CPUID.16h which
    causes the refined calibration to fail because the delta to the initial
    frequency value is too big. With the parameter users can provide an
    halfways accurate initial value.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7XvMITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQ59EACWOU2E+S/b+AqKoZRAJWbTASmu2jEU
 4AukhjO3A0y+G3EqnCtvQbUbKkthScSmrDJs2Dt8CTO6q3Fqv/f5JgoubgSx9Hbj
 pF1hvueOvRBpinzGEJbDbv+HbkoCYr10DZ5dZ8uz120pSnlfSNNpgZ6hJkOFaUHu
 nwVEJpkg2x3ZsiJrgyOfdorwbxO5dCNY9YVL3jyVXUi5QfP3lYrr3/Nz6daIRtRn
 Q9tj48N4Bk4ASgmg4rSdXd6OKeZ3Oz1nerol5vFvBeaOc8PVcKSu5sSqMIHHUV2M
 RJq8T4nW5Y4pkYjpdYP7Pr/3HYbSNW6eU+MycfnJOzYYTIQfFWkG2wHDNuOg/v+A
 GC/grS6wNBj/+tZlvWTwLPf44h7V+sowzYPHBWounT/5drFZ+xsm8+Je4s2NtNih
 rbG/4oOQ2jn05PNBCCOyLuP33efQ3ub2UHPCoUxckMiX2eqI+iWpdllZLSiSADZY
 jlbXgTQ/Fa3nGKVYVDi1GYbx1rBr/HbsbgvGV4D802s7inmev0azrbgc/CECrnvO
 rEa501Y1xzxZ7Zet0QvLK/7aKP532pCmgZiBSmcnS73FBbnssvNJiHlAeq4NHtN2
 TsaGYLy0iPSj7siXEaeysUKRjUTNNrgRvtWfo35GDjWahgXhixIvVVpwxnWws5cj
 aNR5FwxnI03V2A==
 =199V
 -----END PGP SIGNATURE-----

Merge tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 timer updates from Thomas Gleixner:
 "X86 timer specific updates:

   - Add TPAUSE based delay which allows the CPU to enter an optimized
     power state while waiting for the delay to pass. The delay is based
     on TSC cycles.

   - Add tsc_early_khz command line parameter to workaround the problem
     that overclocked CPUs can report the wrong frequency via CPUID.16h
     which causes the refined calibration to fail because the delta to
     the initial frequency value is too big. With the parameter users
     can provide an halfways accurate initial value"

* tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Add tsc_early_khz command line parameter
  x86/delay: Introduce TPAUSE delay
  x86/delay: Refactor delay_mwaitx() for TPAUSE support
  x86/delay: Preparatory code cleanup
2020-06-03 10:18:09 -07:00
Linus Torvalds
bce159d734 for-5.8/drivers-2020-06-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7VPc4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgQkEACnQlzWOfNQMz1AzgUAv/S8IYDJCLrkbjLZ
 JK4pJv8Hjhss/7sS+fd8kyKe9VtaZz2IjmrXcC66RMMwtpx4iHnkRffoNAgEdGOl
 /M5TCZGhs+F/mp3Lc0WdR5DFHkM6yy2Tkk9wCFLreB4bW67janAWnd7nbU4INqJj
 +WqIgpzNMc/kfUhpBYTeQLORhL4e2TG9ADTi/zeUITlpnEsA65LOgXKEpeIFYnSX
 KTl4GIZ9tjazG3Y1Eva7DYHDIErNNAtX67KBqf+WBgMV98eB0O6xIPN1WlmhDTqj
 FGMLkb8msH1HHntvxDAuc4/ortnUy8vPI4o6zKP89HJJNjIM5p5eHEuVF5JnBw42
 Rtu9Om6JqWx51nhAhJNBj9bUStYbhEl0vVQCwbkfPbDJhzTy3RR8z709q9+ZwOrL
 xbp4aJBzqrzscjBEiSQbNCf2PyuOAdU0r1x81UN81ZN41d5qUcumcinjw4Y7vru8
 z5zMlo1Iy/AWQYyu7jgHmnpI7ZyA/1Qclo5dV7aa72bLFaJa35e7QxgfQOFBA5dY
 UZl6QPJRlnB80uGRzD5jCh2O2sQ3XZqYnpaKsUAka1GgbceCp9IC4A5mfZvpACsh
 Xk8VXjlhvY/iPJsKLqrh4Oedg4Dj5M3PLL9C3MDfYeIP2qgXpbnk87UV1TPNSpY0
 QcTxsXXXIw==
 =H+/Z
 -----END PGP SIGNATURE-----

Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "On top of the core changes, here are the block driver changes for this
  merge window:

   - NVMe changes:
        - NVMe over Fibre Channel protocol updates, which also reach
          over to drivers/scsi/lpfc (James Smart)
        - namespace revalidation support on the target (Anthony
          Iliopoulos)
        - gcc zero length array fix (Arnd Bergmann)
        - nvmet cleanups (Chaitanya Kulkarni)
        - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
        - use a SRQ per completion vector (Max Gurtovoy)
        - fix handling of runtime changes to the queue count (Weiping
          Zhang)
        - t10 protection information support for nvme-rdma and
          nvmet-rdma (Israel Rukshin and Max Gurtovoy)
        - target side AEN improvements (Chaitanya Kulkarni)
        - various fixes and minor improvements all over, icluding the
          nvme part of the lpfc driver"

   - Floppy code cleanup series (Willy, Denis)

   - Floppy contention fix (Jiri)

   - Loop CONFIGURE support (Martijn)

   - bcache fixes/improvements (Coly, Joe, Colin)

   - q->queuedata cleanups (Christoph)

   - Get rid of ioctl_by_bdev (Christoph, Stefan)

   - md/raid5 allocation fixes (Coly)

   - zero length array fixes (Gustavo)

   - swim3 task state fix (Xu)"

* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
  bcache: configure the asynchronous registertion to be experimental
  bcache: asynchronous devices registration
  bcache: fix refcount underflow in bcache_device_free()
  bcache: Convert pr_<level> uses to a more typical style
  bcache: remove redundant variables i and n
  lpfc: Fix return value in __lpfc_nvme_ls_abort
  lpfc: fix axchg pointer reference after free and double frees
  lpfc: Fix pointer checks and comments in LS receive refactoring
  nvme: set dma alignment to qword
  nvmet: cleanups the loop in nvmet_async_events_process
  nvmet: fix memory leak when removing namespaces and controllers concurrently
  nvmet-rdma: add metadata/T10-PI support
  nvmet: add metadata support for block devices
  nvmet: add metadata/T10-PI support
  nvme: add Metadata Capabilities enumerations
  nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
  nvmet: rename nvmet_rw_len to nvmet_rw_data_len
  nvmet: add metadata characteristics for a namespace
  nvme-rdma: add metadata/T10-PI support
  nvme-rdma: introduce nvme_rdma_sgl structure
  ...
2020-06-02 15:37:03 -07:00
Linus Torvalds
a5a82e0a59 platform-drivers-x86 for v5.8-1
* Add a support of  the media keys on the ASUS laptop UX325JA/UX425JA
 * ASUS WMI driver can now handle 2-in-1 models T100TA, T100CHI, T100HA, T200TA
 * Big refactoring of Intel SCU driver with Elkhart Lake support has been added
 * Slim Bootloarder firmware update signaling WMI driver has been added
 * Thinkpad ACPI driver can handle dual fan configuration on new P and X models
 * Touchscreen DMI driver has been extended to support
   - MP-man MPWIN895CL tablet
   - ONDA V891 v5 tablet
   - techBite Arc 11.6
   - Trekstor Twin 10.1
   - Trekstor Yourbook C11B
   - Vinga J116
 * Virtual Button driver got a few fixes to detect mode of 2-in-1 tablet models
 * Intel Speed Select tools update
 * Plenty of small cleanups here and there
 
 The following is an automated git shortlog grouped by driver:
 
 acerhdf:
  -  replace space by * in modalias
 
 New drivers:
  - Add Elkhart Lake SCU/PMC support
  - Add Slim Bootloader firmware update signaling driver
 
 asus-laptop:
  -  Drop duplicate check for led_classdev_unregister()
 
 asus-nb-wmi:
  -  Revert "Do not load on Asus T100TA and T200TA"
  -  Do not load on Asus T100TA and T200TA
 
 asus-wmi:
  -  Ignore WMI events with code 0x79
  -  Add support for SW_TABLET_MODE
  -  Move asus_wmi_input_init and _exit lower in the file
  -  Drop duplicate check for led_classdev_unregister()
  -  Reserve more space for struct bias_args
  -  remove redundant initialization of variable status
 
 dcdbas:
  -  Check SMBIOS for protected buffer address
 
 dell-laptop:
  -  don't register micmute LED if there is no token
 
 dell-wmi:
  -  Ignore keyboard attached / detached events
 
 device property:
  -  export set_secondary_fwnode() to modules
 
 eeepc-laptop:
  -  Drop duplicate check for led_classdev_unregister()
 
 hp-wmi:
  -  Introduce HPWMI_POWER_FW_OR_HW as convenient shortcut
  -  Convert simple_strtoul() to kstrtou32()
  -  Refactor postcode_store() to follow standard patterns
 
 intel_cht_int33fe:
  -  Fix spelling issues
  -  Switch to use acpi_dev_hid_uid_match()
  -  Convert to use set_secondary_fwnode()
  -  Convert software node array to group
 
 intel-hid:
  -  Add a quirk to support HP Spectre X2 (2015)
 
 intel_mid_powerbtn:
  -  Convert to use new SCU IPC API
 
 intel_pmc_core:
  -  avoid unused-function warnings
  -  Change Jasper Lake S0ix debug reg map back to ICL
 
 intel_pmc_ipc:
  -  Convert to MFD
  -  Move PCI IDs to intel_scu_pcidrv.c
  -  Drop intel_pmc_ipc_command()
  -  Start using SCU IPC
 
 intel_scu_ipc:
  -  Add managed function to register SCU IPC
  -  Introduce new SCU IPC API
  -  Move legacy SCU IPC API to a separate header
  -  Log more information if SCU IPC command fails
  -  Split out SCU IPC functionality from the SCU driver
 
 intel_scu_ipcutil:
  -  Convert to use new SCU IPC API
 
 intel-speed-select:
  -  Fix speed-select-base-freq-properties output on CLX-N
 
 intel_telemetry:
  -  Add telemetry_get_pltdata()
  -  Convert to use new SCU IPC API
 
 intel-vbtn:
  -  Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type
  -  Detect switch position before registering the input-device
  -  Move detect_tablet_mode() to higher in the file
  -  Fix probe failure on devices with only switches
  -  Also handle tablet-mode switch on "Detachable" and "Portable" chassis-types
  -  Do not advertise switches to userspace if they are not there
  -  Split keymap into buttons and switches parts
  -  Use acpi_evaluate_integer()
 
 ISST:
  -  Increase timeout
 
 lg-laptop:
  -  Drop duplicate check for led_classdev_unregister()
 
 MAINTAINERS:
  -  Add me as maintainer of Intel SCU drivers
  -  Update entry for Intel Broxton PMC driver
 
 Merges of immutable branches:
  - Merge branch 'for-next'
  - Merge branch 'ib-mfd-x86-usb-watchdog-v5.7'
  - Merge branch 'ib-pdx86-properties'
 
 mfd:
  -  intel_soc_pmic_mrfld: Convert to use new SCU IPC API
  -  intel_soc_pmic_bxtwc: Convert to use new SCU IPC API
  -  intel_soc_pmic: Add SCU IPC member to struct intel_soc_pmic
 
 samsung-laptop:
  -  Drop duplicate check for led_classdev_unregister()
 
 software node:
  -  Allow register and unregister software node groups
 
 sony-laptop:
  -  Make resuming thermal profile safer
  -  SNC calls should handle BUFFER types
 
 thinkpad_acpi:
  -  Replace custom approach by kstrtoint()
  -  Use strndup_user() in dispatch_proc_write()
  -  Replace next_cmd(&buf) with strsep(&buf, ",")
  -  Drop duplicate check for led_classdev_unregister()
  -  Remove always false 'value < 0' statement
  -  Add support for dual fan control
 
 tools/power/x86/intel-speed-select:
  -  Fix invalid core mask
  -  Increase CPU count
  -  Fix json perf-profile output output
  -  Update version
  -  Enable clos for turbo-freq enable
  -  Fix CLX-N package information output
  -  Check support status before enable
  -  Change debug to error
 
 toshiba_acpi:
  -  Drop duplicate check for led_classdev_unregister()
 
 touchscreen_dmi:
  -  Update Trekstor Twin 10.1 entry
  -  Add info for the Trekstor Yourbook C11B
  -  Drop comma in terminator line
  -  add Vinga J116 touchscreen
  -  Add info for the ONDA V891 v5 tablet
  -  Add touchscreen info for techBite Arc 11.6.
  -  Add info for the MP-man MPWIN895CL tablet
 
 usb:
  -  typec: mux: Convert the Intel PMC Mux driver to use new SCU IPC API
 
 watchdog:
  -  iTCO: fix link error
  -  intel-mid_wdt: Convert to use new SCU IPC API
 
 wmi:
  -  Describe function parameters
  -  Fix indentation in some cases
  -  Replace UUID redefinitions by their originals
 
 x86/platform/intel-mid:
  -  Add empty stubs for intel_scu_devices_[create|destroy]()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqaflIX74DDDzMJJtb7wzTHR8rCgFAl7WCcoACgkQb7wzTHR8
 rCi+Pg//dDpMXTxCcXivHZPJHwuAxbwPeJRV9uDKKBSnKqfxyYu37oQf8AQiLTsL
 PZOAIiwlrXw0Jd+EH79zN2DyCujBg16B6mf4dx3fMK95OWhPoslofyKRwl8kOBP5
 QRZVpuwo6ayKwXV3cyFwWjXyWYJFL7+J3x+jjBmufBsoDJTn9edOCUa3oeHG0BYB
 4A91pVKwtfNqqdL/pwd+A9mEZrFJnVilyPRoxTipbpPJqvWQi9dYgb3wHKt/1NM3
 xPNd1GQHCI0Of4NGChszY0XdN4SyxFuyLmn1mogYq82r084QA4pLROb0+VFD2npd
 DQ4jxJqOwQDtC3gm789OeN6bZ0qnkO9HBwEmzVH7rwiajZxGW7U5rCgNYBahlTgr
 gY4kXIBXyOCO2/bItmrSvWDNBvVxD/THCfL4Q/cn6bNTy4TLTHAl2psQcsXIBT6/
 Z5SdmHMhxc80eDAOTtSJj0ODeDGvAgbV20n+X260FFAsefDBuXkYMHEaRBf9n2LJ
 8k9tauXZ6JdIc4K8/K+BaVl761Okl6PJPMTL7JsFqueHpyzZS7WclCYH5QQ1iN56
 10QzddSGp+4HfFFCG2cVkjXG2AnUgT3kQgEOHyLIxp6yKY1PghFXHTEmrLuheYum
 jK93qSva5tvvZzy9UejXXsIkDyg76zaIla3rmEEYAmgzPDawR9I=
 =pprB
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.8-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver updates from Andy Shevchenko:

 - Add a support of the media keys on the ASUS laptop UX325JA/UX425JA

 - ASUS WMI driver can now handle 2-in-1 models T100TA, T100CHI, T100HA,
   T200TA

 - Big refactoring of Intel SCU driver with Elkhart Lake support has
   been added

 - Slim Bootloarder firmware update signaling WMI driver has been added

 - Thinkpad ACPI driver can handle dual fan configuration on new P and X
   models

 - Touchscreen DMI driver has been extended to support
    - MP-man MPWIN895CL tablet
    - ONDA V891 v5 tablet
    - techBite Arc 11.6
    - Trekstor Twin 10.1
    - Trekstor Yourbook C11B
    - Vinga J116

 - Virtual Button driver got a few fixes to detect mode of 2-in-1 tablet
   models

 - Intel Speed Select tools update

 - Plenty of small cleanups here and there

* tag 'platform-drivers-x86-v5.8-1' of git://git.infradead.org/linux-platform-drivers-x86: (89 commits)
  platform/x86: dcdbas: Check SMBIOS for protected buffer address
  platform/x86: asus_wmi: Reserve more space for struct bias_args
  platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type
  platform/x86: intel-hid: Add a quirk to support HP Spectre X2 (2015)
  platform/x86: touchscreen_dmi: Update Trekstor Twin 10.1 entry
  platform/x86: touchscreen_dmi: Add info for the Trekstor Yourbook C11B
  platform/x86: hp-wmi: Introduce HPWMI_POWER_FW_OR_HW as convenient shortcut
  platform/x86: hp-wmi: Convert simple_strtoul() to kstrtou32()
  platform/x86: hp-wmi: Refactor postcode_store() to follow standard patterns
  platform/x86: acerhdf: replace space by * in modalias
  platform/x86: ISST: Increase timeout
  tools/power/x86/intel-speed-select: Fix invalid core mask
  tools/power/x86/intel-speed-select: Increase CPU count
  tools/power/x86/intel-speed-select: Fix json perf-profile output output
  platform/x86: dell-wmi: Ignore keyboard attached / detached events
  platform/x86: dell-laptop: don't register micmute LED if there is no token
  platform/x86: thinkpad_acpi: Replace custom approach by kstrtoint()
  platform/x86: thinkpad_acpi: Use strndup_user() in dispatch_proc_write()
  platform/x86: thinkpad_acpi: Replace next_cmd(&buf) with strsep(&buf, ",")
  platform/x86: intel-vbtn: Detect switch position before registering the input-device
  ...
2020-06-02 12:56:58 -07:00
Linus Torvalds
94709049fb Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "A few little subsystems and a start of a lot of MM patches.

  Subsystems affected by this patch series: squashfs, ocfs2, parisc,
  vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
  swap, memcg, pagemap, memory-failure, vmalloc, kasan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
  kasan: move kasan_report() into report.c
  mm/mm_init.c: report kasan-tag information stored in page->flags
  ubsan: entirely disable alignment checks under UBSAN_TRAP
  kasan: fix clang compilation warning due to stack protector
  x86/mm: remove vmalloc faulting
  mm: remove vmalloc_sync_(un)mappings()
  x86/mm/32: implement arch_sync_kernel_mappings()
  x86/mm/64: implement arch_sync_kernel_mappings()
  mm/ioremap: track which page-table levels were modified
  mm/vmalloc: track which page-table levels were modified
  mm: add functions to track page directory modifications
  s390: use __vmalloc_node in stack_alloc
  powerpc: use __vmalloc_node in alloc_vm_stack
  arm64: use __vmalloc_node in arch_alloc_vmap_stack
  mm: remove vmalloc_user_node_flags
  mm: switch the test_vmalloc module to use __vmalloc_node
  mm: remove __vmalloc_node_flags_caller
  mm: remove both instances of __vmalloc_node_flags
  mm: remove the prot argument to __vmalloc_node
  mm: remove the pgprot argument to __vmalloc
  ...
2020-06-02 12:21:36 -07:00
Joerg Roedel
7f0a002b5a x86/mm: remove vmalloc faulting
Remove fault handling on vmalloc areas, as the vmalloc code now takes
care of synchronizing changes to all page-tables in the system.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-8-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:12 -07:00
Joerg Roedel
86cf69f1d8 x86/mm/32: implement arch_sync_kernel_mappings()
Implement the function to sync changes in vmalloc and ioremap ranges to
all page-tables.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-6-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Joerg Roedel
8e19843c36 x86/mm/64: implement arch_sync_kernel_mappings()
Implement the function to sync changes in vmalloc and ioremap ranges to
all page-tables.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-5-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
88dca4ca5a mm: remove the pgprot argument to __vmalloc
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv]
Acked-by: Gao Xiang <xiang@kernel.org> [erofs]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
cca98e9f8b mm: enforce that vmap can't map pages executable
To help enforcing the W^X protection don't allow remapping existing pages
as executable.

x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-20-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
78bb17f76e x86/hyperv: use vmalloc_exec for the hypercall page
Patch series "decruft the vmalloc API", v2.

Peter noticed that with some dumb luck you can toast the kernel address
space with exported vmalloc symbols.

I used this as an opportunity to decruft the vmalloc.c API and make it
much more systematic.  This also removes any chance to create vmalloc
mappings outside the designated areas or using executable permissions
from modules.  Besides that it removes more than 300 lines of code.

This patch (of 29):

Use the designated helper for allocating executable kernel memory, and
remove the now unused PAGE_KERNEL_RX define.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200414131348.444715-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200414131348.444715-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:10 -07:00
Linus Torvalds
8b39a57e96 Merge branch 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/coredump updates from Al Viro:
 "set_fs() removal in coredump-related area - mostly Christoph's
  stuff..."

* 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump
  binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump
  binfmt_elf: remove the set_fs in fill_siginfo_note
  signal: refactor copy_siginfo_to_user32
  powerpc/spufs: simplify spufs core dumping
  powerpc/spufs: stop using access_ok
  powerpc/spufs: fix copy_to_user while atomic
2020-06-01 16:21:46 -07:00
Linus Torvalds
4b01285e16 Merge branch 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/csum updates from Al Viro:
 "Regularize the sitation with uaccess checksum primitives:

   - fold csum_partial_... into csum_and_copy_..._user()

   - on x86 collapse several access_ok()/stac()/clac() into
     user_access_begin()/user_access_end()"

* 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  default csum_and_copy_to_user(): don't bother with access_ok()
  take the dummy csum_and_copy_from_user() into net/checksum.h
  arm: switch to csum_and_copy_from_user()
  sh32: convert to csum_and_copy_from_user()
  m68k: convert to csum_and_copy_from_user()
  xtensa: switch to providing csum_and_copy_from_user()
  sparc: switch to providing csum_and_copy_from_user()
  parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
  alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
  ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
  ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
  x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
  x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
  x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
  get rid of csum_partial_copy_to_user()
2020-06-01 16:03:37 -07:00
Linus Torvalds
533b220f7b arm64 updates for 5.8
- Branch Target Identification (BTI)
 	* Support for ARMv8.5-BTI in both user- and kernel-space. This
 	  allows branch targets to limit the types of branch from which
 	  they can be called and additionally prevents branching to
 	  arbitrary code, although kernel support requires a very recent
 	  toolchain.
 
 	* Function annotation via SYM_FUNC_START() so that assembly
 	  functions are wrapped with the relevant "landing pad"
 	  instructions.
 
 	* BPF and vDSO updates to use the new instructions.
 
 	* Addition of a new HWCAP and exposure of BTI capability to
 	  userspace via ID register emulation, along with ELF loader
 	  support for the BTI feature in .note.gnu.property.
 
 	* Non-critical fixes to CFI unwind annotations in the sigreturn
 	  trampoline.
 
 - Shadow Call Stack (SCS)
 	* Support for Clang's Shadow Call Stack feature, which reserves
 	  platform register x18 to point at a separate stack for each
 	  task that holds only return addresses. This protects function
 	  return control flow from buffer overruns on the main stack.
 
 	* Save/restore of x18 across problematic boundaries (user-mode,
 	  hypervisor, EFI, suspend, etc).
 
 	* Core support for SCS, should other architectures want to use it
 	  too.
 
 	* SCS overflow checking on context-switch as part of the existing
 	  stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
 
 - CPU feature detection
 	* Removed numerous "SANITY CHECK" errors when running on a system
 	  with mismatched AArch32 support at EL1. This is primarily a
 	  concern for KVM, which disabled support for 32-bit guests on
 	  such a system.
 
 	* Addition of new ID registers and fields as the architecture has
 	  been extended.
 
 - Perf and PMU drivers
 	* Minor fixes and cleanups to system PMU drivers.
 
 - Hardware errata
 	* Unify KVM workarounds for VHE and nVHE configurations.
 
 	* Sort vendor errata entries in Kconfig.
 
 - Secure Monitor Call Calling Convention (SMCCC)
 	* Update to the latest specification from Arm (v1.2).
 
 	* Allow PSCI code to query the SMCCC version.
 
 - Software Delegated Exception Interface (SDEI)
 	* Unexport a bunch of unused symbols.
 
 	* Minor fixes to handling of firmware data.
 
 - Pointer authentication
 	* Add support for dumping the kernel PAC mask in vmcoreinfo so
 	  that the stack can be unwound by tools such as kdump.
 
 	* Simplification of key initialisation during CPU bringup.
 
 - BPF backend
 	* Improve immediate generation for logical and add/sub
 	  instructions.
 
 - vDSO
 	- Minor fixes to the linker flags for consistency with other
 	  architectures and support for LLVM's unwinder.
 
 	- Clean up logic to initialise and map the vDSO into userspace.
 
 - ACPI
 	- Work around for an ambiguity in the IORT specification relating
 	  to the "num_ids" field.
 
 	- Support _DMA method for all named components rather than only
 	  PCIe root complexes.
 
 	- Minor other IORT-related fixes.
 
 - Miscellaneous
 	* Initialise debug traps early for KGDB and fix KDB cacheflushing
 	  deadlock.
 
 	* Minor tweaks to early boot state (documentation update, set
 	  TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
 
 	* Refactoring and cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl7U9csQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNLBHCACs/YU4SM7Om5f+7QnxIKao5DBr2CnGGvdC
 yTfDghFDTLQVv3MufLlfno3yBe5G8sQpcZfcc+hewfcGoMzVZXu8s7LzH6VSn9T9
 jmT3KjDMrg0RjSHzyumJp2McyelTk0a4FiKArSIIKsJSXUyb1uPSgm7SvKVDwEwU
 JGDzL9IGilmq59GiXfDzGhTZgmC37QdwRoRxDuqtqWQe5CHoRXYexg87HwBKOQxx
 HgU9L7ehri4MRZfpyjaDrr6quJo3TVnAAKXNBh3mZAskVS9ZrfKpEH0kYWYuqybv
 znKyHRecl/rrGePV8RTMtrwnSdU26zMXE/omsVVauDfG9hqzqm+Q
 =w3qi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "A sizeable pile of arm64 updates for 5.8.

  Summary below, but the big two features are support for Branch Target
  Identification and Clang's Shadow Call stack. The latter is currently
  arm64-only, but the high-level parts are all in core code so it could
  easily be adopted by other architectures pending toolchain support

  Branch Target Identification (BTI):

   - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
     branch targets to limit the types of branch from which they can be
     called and additionally prevents branching to arbitrary code,
     although kernel support requires a very recent toolchain.

   - Function annotation via SYM_FUNC_START() so that assembly functions
     are wrapped with the relevant "landing pad" instructions.

   - BPF and vDSO updates to use the new instructions.

   - Addition of a new HWCAP and exposure of BTI capability to userspace
     via ID register emulation, along with ELF loader support for the
     BTI feature in .note.gnu.property.

   - Non-critical fixes to CFI unwind annotations in the sigreturn
     trampoline.

  Shadow Call Stack (SCS):

   - Support for Clang's Shadow Call Stack feature, which reserves
     platform register x18 to point at a separate stack for each task
     that holds only return addresses. This protects function return
     control flow from buffer overruns on the main stack.

   - Save/restore of x18 across problematic boundaries (user-mode,
     hypervisor, EFI, suspend, etc).

   - Core support for SCS, should other architectures want to use it
     too.

   - SCS overflow checking on context-switch as part of the existing
     stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.

  CPU feature detection:

   - Removed numerous "SANITY CHECK" errors when running on a system
     with mismatched AArch32 support at EL1. This is primarily a concern
     for KVM, which disabled support for 32-bit guests on such a system.

   - Addition of new ID registers and fields as the architecture has
     been extended.

  Perf and PMU drivers:

   - Minor fixes and cleanups to system PMU drivers.

  Hardware errata:

   - Unify KVM workarounds for VHE and nVHE configurations.

   - Sort vendor errata entries in Kconfig.

  Secure Monitor Call Calling Convention (SMCCC):

   - Update to the latest specification from Arm (v1.2).

   - Allow PSCI code to query the SMCCC version.

  Software Delegated Exception Interface (SDEI):

   - Unexport a bunch of unused symbols.

   - Minor fixes to handling of firmware data.

  Pointer authentication:

   - Add support for dumping the kernel PAC mask in vmcoreinfo so that
     the stack can be unwound by tools such as kdump.

   - Simplification of key initialisation during CPU bringup.

  BPF backend:

   - Improve immediate generation for logical and add/sub instructions.

  vDSO:

   - Minor fixes to the linker flags for consistency with other
     architectures and support for LLVM's unwinder.

   - Clean up logic to initialise and map the vDSO into userspace.

  ACPI:

   - Work around for an ambiguity in the IORT specification relating to
     the "num_ids" field.

   - Support _DMA method for all named components rather than only PCIe
     root complexes.

   - Minor other IORT-related fixes.

  Miscellaneous:

   - Initialise debug traps early for KGDB and fix KDB cacheflushing
     deadlock.

   - Minor tweaks to early boot state (documentation update, set
     TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

   - Refactoring and cleanup"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  KVM: arm64: Check advertised Stage-2 page size capability
  arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
  ACPI/IORT: Remove the unused __get_pci_rid()
  arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
  arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
  arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
  arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
  arm64/cpufeature: Introduce ID_MMFR5 CPU register
  arm64/cpufeature: Introduce ID_DFR1 CPU register
  arm64/cpufeature: Introduce ID_PFR2 CPU register
  arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
  arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
  arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
  arm64: mm: Add asid_gen_match() helper
  firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
  arm64: vdso: Fix CFI directives in sigreturn trampoline
  arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
  ...
2020-06-01 15:18:27 -07:00
Linus Torvalds
88bc1de11c This tree cleans up various aspects of the UV platform support code,
it removes unnecessary functions and cleans up the rest.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VNO4RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i61RAAogLRVi4ga4vmTk5SqUqtR4pupbHJv5IM
 IjkQN0HZ3+Oi6kRxwuOQ9xOOzQWm8GntkZeyN5FA73H7x+bYdU12MIKKTEDcW3xp
 Mg9FtzfeL0V4YmNkmlnIXycyYA3nBdSxnI/OL/58J9CLT15qXYkWjyvkbI2aJ3qL
 U8xM5cTTvhoARjd43o0eAfekTg0XdUAsgvO0vOM5+I1HrQP8SR3ZIFaMSR+MfAQx
 Nbz/UVUSDJ8BNzmS/CfFLFm0F2dkphlLC0r6eAOFZAYSIax0bRVklxV9qdScEQMK
 bkVKXGanCzVTBVM1HXDycLJaILlqcS18tK+VqNIAR5x2BXmaSG8jqwCW4NM0tcaN
 c5zemNsqnAH/VzxeFjE2BcDQnA1nkgj75Vm9O81HMQfyqR16M5pBzRXY/qBqslya
 vX5wLoD962BiVtbELqW6v+Ot29xMYlCLLlTbLHaWQraJS3TjuAvL0/sOFdWgs63F
 N7a+BLvikfYoKCS8IxW87BFBysy9nhv/4UwdaX5RpIQ1wgx/EJLDaowrM+L6Dzmw
 bhQ3AgRZGNZCBDm3uGU/LigTTxN93h5KqKnuUKv3H+tNvEKxPKEAqPyvL8fO8G0U
 BTJiM/XRIzQrkrmwCKqON1iKRjKB2fKklxiq4REIoSqKfeiQ3SVBIHENFnH96Ekf
 G9qptwFEZYI=
 =L8Vy
 -----END PGP SIGNATURE-----

Merge tag 'x86-platform-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform updates from Ingo Molnar:
 "This tree cleans up various aspects of the UV platform support code,
  it removes unnecessary functions and cleans up the rest"

* tag 'x86-platform-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/uv: Remove code for unused distributed GRU mode
  x86/platform/uv: Remove the unused _uv_cpu_blade_processor_id() macro
  x86/platform/uv: Unexport uv_apicid_hibits
  x86/platform/uv: Remove _uv_hub_info_check()
  x86/platform/uv: Simplify uv_send_IPI_one()
  x86/platform/uv: Mark uv_min_hub_revision_id static
  x86/platform/uv: Mark is_uv_hubless() static
  x86/platform/uv: Remove the UV*_HUB_IS_SUPPORTED macros
  x86/platform/uv: Unexport symbols only used by x2apic_uv_x.c
  x86/platform/uv: Unexport sn_coherency_id
  x86/platform/uv: Remove the uv_partition_coherence_id() macro
  x86/platform/uv: Mark uv_bios_call() and uv_bios_call_irqsave() static
2020-06-01 14:48:20 -07:00
Linus Torvalds
0a319ef75d Most of the changes here related to 'XSAVES supervisor state' support,
which is a feature that allows kernel-only data to be automatically
 saved/restored by the FPU context switching code.
 
 CPU features that can be supported this way are Intel PT, 'PASID' and
 CET features.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VMZgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jmAQ/7BJpyAHUjFJdChtkvUmLcBgI2qnxP7rc8
 Eh/tSo4PKh484Uqb4WY6XAHIAPBzEt3rHJG3fdaavzlUl98YJCdD9tstfwMPcCQ4
 L4c2Ru+h+mPQCMOZUctOphPjDzGWPzR4IhceH6gqhoS4vg9EqgN4o158x4jW6KFN
 Jlocp9CMfIaGSmaMlRrIUZ4Dj3mgboqqHsuCaibtaKAMK6LqZQDViTEal4mNbESX
 KQPOFpKrhoq6Jtzzer7fLPY2qb6kkLrL03X5IUGFP5UxigSejnfrI9SZpAuPP9S0
 kdN04Jo0T2aBIAikBTVhDWdLMJk19qeu7YXBrFEVbyhZHl1HdDqOhMdWPOp1GH9W
 CtGUalbIvz/5FbXuUImiiNh/bw2FxYjHsrDguW96IvMVFteucrFg9QyL+taYb1cV
 WqWdpIC0VoMuQxQI5FBWu4Bb/cLNV9VCxWAZjZQ806kwmyDxldsw5mucMGmH3+bO
 LD6bwRShSMRzI9bzcJSG+Z3y7Fe8b5IGNjCjzgPb88ezffBEFHzIEKdCL6QTNlRF
 6UgSGbRs41SqXwNw5tdQQNwPpDO73p+KVRGoEzyMJvojLKRGTcOHHUDriGZ30MNX
 3oHvLf5+dNrLC/frbOqUmQ7doBQOplR5VxlZVwwqkdpPw13Jf5zn4ewzriTOmKCq
 mEHMQmbkyi4=
 =M+BC
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FPU updates from Ingo Molnar:
 "Most of the changes here related to 'XSAVES supervisor state' support,
  which is a feature that allows kernel-only data to be automatically
  saved/restored by the FPU context switching code.

  CPU features that can be supported this way are Intel PT, 'PASID' and
  CET features"

* tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Restore supervisor states for signal return
  x86/fpu/xstate: Preserve supervisor states for the slow path in __fpu__restore_sig()
  x86/fpu: Introduce copy_supervisor_to_kernel()
  x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor states
  x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates
  x86/fpu/xstate: Define new functions for clearing fpregs and xstates
  x86/fpu/xstate: Introduce XSAVES supervisor states
  x86/fpu/xstate: Separate user and supervisor xfeatures mask
  x86/fpu/xstate: Define new macros for supervisor and user xstates
  x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
2020-06-01 14:09:26 -07:00
Linus Torvalds
eff5ddadab Misc updates:
- Extend the x86 family/model macros with a steppings dimension,
    because x86 life isn't complex enough and Intel uses steppings to
    differentiate between different CPUs. :-/
 
  - Convert the TSC deadline timer quirks to the steppings macros.
 
  - Clean up asm mnemonics.
 
  - Fix the handling of an AMD erratum, or in other words, fix a kernel erratum.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VL2wRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hxYQ//dic//qQ+4GOL1wP1Qj8EiGOzaWdynoia
 oDi7Si1I9vy58iCCRgkQKmxEwfnHcM5+NC6/091S4BD2IE6o+iD1YhPsGZK8DT4Y
 FmeD8pgtx5LMJFMBe6KRyek1s0JblP6v0Q0BwUk7YtV6k0oSP+f/2n5BGj2+P7YH
 3Iw438M5JhIrzVp3PnCgJoZkSm9iRnZqbBtR8nd2SO+vx8M75cX27LL6fdaCypRj
 wH9w6+J2NhAZStmEv54LKOdO5RAPJjvatbTZFMEFdceAGFEbHPJIees7paoC+DTP
 3BuhzF/9ghDNKly6Zz3PtyNNDP1vglZ1W9dJkCfTXUWlZKbQV94Yk+JbP5mndxqn
 +f3eD/dInofHiCeAh1Sfj3BCGdOSjgFMBB57CKkCy4LehXwJ9C2eBcbxd4XMfEkd
 h0EywZrp1L10AxDHtq5x82xf1fwfTDyvlYmJrBshXfiitaySn+mPVJMuj3wvqJSP
 WKbJS4HfkekIaf9WoUA+Ay6FJdY7nNirViRrQEZVmDPTV0EDfcaNM5p6Ttkja3Ph
 VoVa8Ms8FRqTfh6xCfckYR+vI44U+AFNLM6YFyetGYc0yVXNzg3vLy2DbqLRolWy
 t1upDdNf1TMJg4BaMrBzZgDg/uI2BM3jeOj69U0cboO2JhJjxjl3qPeiYDKD50MK
 Z1Nho933894=
 =QKjn
 -----END PGP SIGNATURE-----

Merge tag 'x86-cpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Ingo Molnar:
 "Misc updates:

   - Extend the x86 family/model macros with a steppings dimension,
     because x86 life isn't complex enough and Intel uses steppings to
     differentiate between different CPUs. :-/

   - Convert the TSC deadline timer quirks to the steppings macros.

   - Clean up asm mnemonics.

   - Fix the handling of an AMD erratum, or in other words, fix a kernel
     erratum"

* tag 'x86-cpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.h
  x86/cpu: Use INVPCID mnemonic in invpcid.h
  x86/cpu/amd: Make erratum #1054 a legacy erratum
  x86/apic: Convert the TSC deadline timer matching to steppings macro
  x86/cpu: Add a X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro
  x86/cpu: Add a steppings field to struct x86_cpu_id
2020-06-01 13:57:51 -07:00
Linus Torvalds
17e0a7cb6a Misc cleanups, with an emphasis on removing obsolete/dead code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VLcQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iFnhAArGBqco3C2RPQugv7UDDbKEaMvxOGrc5B
 kwnyOS/k/yeIkfhT9u11oBuLcaj/Zgw8YCjFyRfaNsorRqnytLyZzZ6PvdCCE3YU
 X3DVYgulcdAQnM4bS2e3Kt9ciJvFxB27XNm0AfuyLMUxMqCD+iIO4gJ6TuQNBYy3
 dfUMfB1R9OUDW13GCrASe+p1Dw76uaqVngdFWJhnC8Rm49E6gFXq7CLQp5Cka81I
 KZeJ8I6ug9p3gqhOIXdi+S6g5CM5jf86Wkk7dOHwHFH7CceFb3FIz7z0n1je4Wgd
 L5rYX7+PwfNeZ73GIuvEBN+agJH2K0H/KmnlWNWeZHzc+J12MeruSdSMBIkBOEpn
 iSbYAOmDpQLzBjTdZjC8bDqTZf472WrTh4VwN9NxHLucjdC+IqGoTAvnyyEOmZ5o
 R7sv7Q++316CVwRhYVXbzwZcqtiinCDE1EkP5nKTo9z3z0kMF5+ce/k7wn5sgZIk
 zJq3LXtaToiDoDRAPGxcvFPts9MdC0EI1aKTIjaK/n6i2h/SpJfrTKgANWaldYTe
 XJIqlSB43saqf5YAQ3/sY+wnpCRBmmCU+sfKja4C8bH7RuggI3mZS19uhFs0Qctq
 Yx5bIXVSBAIqjJtgzQ0WAAZ5LrCpNNyAzb35ZYefQlGyJlx1URKXVBmxa6S99biU
 KiYX7Dk5uhQ=
 =0ZQd
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups, with an emphasis on removing obsolete/dead code"

* tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/spinlock: Remove obsolete ticket spinlock macros and types
  x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit
  x86/apb_timer: Drop unused declaration and macro
  x86/apb_timer: Drop unused TSC calibration
  x86/io_apic: Remove unused function mp_init_irq_at_boot()
  x86/mm: Stop printing BRK addresses
  x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()
  x86/nmi: Remove edac.h include leftover
  mm: Remove MPX leftovers
  x86/mm/mmap: Fix -Wmissing-prototypes warnings
  x86/early_printk: Remove unused includes
  crash_dump: Remove no longer used saved_max_pfn
  x86/smpboot: Remove the last ICPU() macro
2020-06-01 13:47:10 -07:00
Linus Torvalds
58ff3b7604 The EFI changes for this cycle are:
- preliminary changes for RISC-V
  - Add support for setting the resolution on the EFI framebuffer
  - Simplify kernel image loading for arm64
  - Move .bss into .data via the linker script instead of relying on symbol
    annotations.
  - Get rid of __pure getters to access global variables
  - Clean up the config table matching arrays
  - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently
  - Simplify and unify initrd loading
  - Parse the builtin command line on x86 (if provided)
  - Implement printk() support, including support for wide character strings
  - Simplify GDT handling in early mixed mode thunking code
  - Some other minor fixes and cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VANERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iLnhAApADFVx2r/PmBTaLkxTILnyC0zg03kWne
 Fs6K/npgK68M/Qz4OlXqVhirCHTVMDO4T/4hfckSe0HtzLiGtPeKwea7+ATpdeff
 iH0k1xCOO9YoUAGKLpOwNPIzR3F2EEJy0vENF/v6KFODuBNJE8Xuq6GAFs9IKoxz
 zUxFJw/QlKssr4GcxpgW5ODb2rwiP4znyDj/x6/oy81H+RPk+TuDrF/kmHmTQJVN
 MLZsx8oXQUmgZabDd8xOzQh41KWy8CpF3XbrOA+zGiT8oiRjwTJ49Mo1p53Wm0Ba
 xSxxXrPvJAT8OrZaeDCdppn0p0OLAl4fuTkAgURZKwAHJLgiERY0N0EFGoDYriVZ
 qQuBTVuVa3Njg7IKdMyXjyKqX7xmEP+6Mck9j4bg8Q/ss4T4UzpkOxe7txCc/Bqw
 lf3rzx8IvXxh5ep5H+rqcWfjfygdZrv6hJ0xVkwt1C43pVHWMXlrE+IJqvvQC2q4
 KyEkNdGdFFeWiGVo829kuOIVXNFapcSe6G+Q9aCZSLa7LCi+bSNZmi8HzPhiQEr0
 t/Z04DUDYgJ3mUfZuKJ7i1FHbuYo7t1iDwx9txeIM2u1S8E68UmgwLHRg2xLjtyu
 Rkib57mTTi6pO8cSJQVNaSh5F3h870nhrK62kfCgn8c8B5v/C9q56Q0B3hlUIvQ4
 R0T8188LdYk=
 =XG4E
 -----END PGP SIGNATURE-----

Merge tag 'efi-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI updates from Ingo Molnar:
 "The EFI changes for this cycle are:

   - preliminary changes for RISC-V

   - Add support for setting the resolution on the EFI framebuffer

   - Simplify kernel image loading for arm64

   - Move .bss into .data via the linker script instead of relying on
     symbol annotations.

   - Get rid of __pure getters to access global variables

   - Clean up the config table matching arrays

   - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them
     consistently

   - Simplify and unify initrd loading

   - Parse the builtin command line on x86 (if provided)

   - Implement printk() support, including support for wide character
     strings

   - Simplify GDT handling in early mixed mode thunking code

   - Some other minor fixes and cleanups"

* tag 'efi-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
  efi/x86: Don't blow away existing initrd
  efi/x86: Drop the special GDT for the EFI thunk
  efi/libstub: Add missing prototype for PE/COFF entry point
  efi/efivars: Add missing kobject_put() in sysfs entry creation error path
  efi/libstub: Use pool allocation for the command line
  efi/libstub: Don't parse overlong command lines
  efi/libstub: Use snprintf with %ls to convert the command line
  efi/libstub: Get the exact UTF-8 length
  efi/libstub: Use %ls for filename
  efi/libstub: Add UTF-8 decoding to efi_puts
  efi/printf: Add support for wchar_t (UTF-16)
  efi/gop: Add an option to list out the available GOP modes
  efi/libstub: Add definitions for console input and events
  efi/libstub: Implement printk-style logging
  efi/printf: Turn vsprintf into vsnprintf
  efi/printf: Abort on invalid format
  efi/printf: Refactor code to consolidate padding and output
  efi/printf: Handle null string input
  efi/printf: Factor out integer argument retrieval
  efi/printf: Factor out width/precision parsing
  ...
2020-06-01 13:35:27 -07:00
Linus Torvalds
a7092c8204 Kernel side changes:
- Add AMD Fam17h RAPL support
   - Introduce CAP_PERFMON to kernel and user space
   - Add Zhaoxin CPU support
   - Misc fixes and cleanups
 
 Tooling changes:
 
   perf record:
 
     - Introduce --switch-output-event to use arbitrary events to be setup
       and read from a side band thread and, when they take place a signal
       be sent to the main 'perf record' thread, reusing the --switch-output
       code to take perf.data snapshots from the --overwrite ring buffer, e.g.:
 
 	# perf record --overwrite -e sched:* \
 		      --switch-output-event syscalls:*connect* \
 		      workload
 
       will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
       connect syscalls.
 
     - Add --num-synthesize-threads option to control degree of parallelism of the
       synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be
       time consuming. This mimics pre-existing behaviour in 'perf top'.
 
   perf bench:
 
     - Add a multi-threaded synthesize benchmark.
     - Add kallsyms parsing benchmark.
 
   Intel PT support:
 
     - Stitch LBR records from multiple samples to get deeper backtraces,
       there are caveats, see the csets for details.
     - Allow using Intel PT to synthesize callchains for regular events.
     - Add support for synthesizing branch stacks for regular events (cycles,
       instructions, etc) from Intel PT data.
 
   Misc changes:
 
     - Updated perf vendor events for power9 and Coresight.
     - Add flamegraph.py script via 'perf flamegraph'
     - Misc other changes, fixes and cleanups - see the Git log for details.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VJAcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hAYw/8DFtzGkMaaWkrDSj62LXtWQiqr1l01ZFt
 9GzV4aN4/go+K4BQtsQN8cUjOkRHFnOryLuD9LfSBfqsdjuiyTynV/cJkeUGQBck
 TT/GgWf3XKJzTUBRQRk367Gbqs9UKwBP8CdFhOXcNzGEQpjhbwwIDPmem94U4L1N
 XLsysgC45ejWL1kMTZKmk6hDIidlFeDg9j70WDPX1nNfCeisk25rxwTpdgvjsjcj
 3RzPRt2EGS+IkuF4QSCT5leYSGaCpVDHCQrVpHj57UoADfWAyC71uopTLG4OgYSx
 PVd9gvloMeeqWmroirIxM67rMd/TBTfVekNolhnQDjqp60Huxm+gGUYmhsyjNqdx
 Pb8HRZCBAudei9Ue4jNMfhCRK2Ug1oL5wNvN1xcSteAqrwMlwBMGHWns6l12x0ks
 BxYhyLvfREvnKijXc1o8D5paRgqohJgfnHlrUZeacyaw5hQCbiVRpwg0T1mWAF53
 u9hfWLY0Oy+Qs2C7EInNsWSYXRw8oPQNTFVx2I968GZqsEn4DC6Pt3ovWrDKIDnz
 ugoZJQkJ3/O8stYSMiyENehdWlo575NkapCTDwhLWnYztrw4skqqHE8ighU/e8ug
 o/Kx7ANWN9OjjjQpq2GVUeT0jCaFO+OMiGMNEkKoniYgYjogt3Gw5PeedBMtY07p
 OcWTiQZamjU=
 =i27M
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add AMD Fam17h RAPL support

   - Introduce CAP_PERFMON to kernel and user space

   - Add Zhaoxin CPU support

   - Misc fixes and cleanups

  Tooling changes:

   - perf record:

     Introduce '--switch-output-event' to use arbitrary events to be
     setup and read from a side band thread and, when they take place a
     signal be sent to the main 'perf record' thread, reusing the core
     for '--switch-output' to take perf.data snapshots from the ring
     buffer used for '--overwrite', e.g.:

	# perf record --overwrite -e sched:* \
		      --switch-output-event syscalls:*connect* \
		      workload

     will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
     connect syscalls.

     Add '--num-synthesize-threads' option to control degree of
     parallelism of the synthesize_mmap() code which is scanning
     /proc/PID/task/PID/maps and can be time consuming. This mimics
     pre-existing behaviour in 'perf top'.

   - perf bench:

     Add a multi-threaded synthesize benchmark and kallsyms parsing
     benchmark.

   - Intel PT support:

     Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.

     Allow using Intel PT to synthesize callchains for regular events.

     Add support for synthesizing branch stacks for regular events
     (cycles, instructions, etc) from Intel PT data.

  Misc changes:

   - Updated perf vendor events for power9 and Coresight.

   - Add flamegraph.py script via 'perf flamegraph'

   - Misc other changes, fixes and cleanups - see the Git log for details

  Also, since over the last couple of years perf tooling has matured and
  decoupled from the kernel perf changes to a large degree, going
  forward Arnaldo is going to send perf tooling changes via direct pull
  requests"

* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
  perf/x86/rapl: Add AMD Fam17h RAPL support
  perf/x86/rapl: Make perf_probe_msr() more robust and flexible
  perf/x86/rapl: Flip logic on default events visibility
  perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
  perf/x86/rapl: Move RAPL support to common x86 code
  perf/core: Replace zero-length array with flexible-array
  perf/x86: Replace zero-length array with flexible-array
  perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
  perf/x86/rapl: Add Ice Lake RAPL support
  perf flamegraph: Use /bin/bash for report and record scripts
  perf cs-etm: Move definition of 'traceid_list' global variable from header file
  libsymbols kallsyms: Move hex2u64 out of header
  libsymbols kallsyms: Parse using io api
  perf bench: Add kallsyms parsing
  perf: cs-etm: Update to build with latest opencsd version.
  perf symbol: Fix kernel symbol address display
  perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  ...
2020-06-01 13:23:59 -07:00
Linus Torvalds
69fc06f70f There are a lot of objtool changes in this cycle, all across the map:
- Speed up objtool significantly, especially when there are large number of sections
  - Improve objtool's understanding of special instructions such as IRET,
    to reduce the number of annotations required
  - Implement 'noinstr' validation
  - Do baby steps for non-x86 objtool use
  - Simplify/fix retpoline decoding
  - Add vmlinux validation
  - Improve documentation
  - Fix various bugs and apply smaller cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VHvcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gEfBAAhvPWljUmfQsetYq4q9BdbuC4xPSQN9ra
 e+2zu1MQaohkjAdFM1boNVhCCGKFUvlTEEw3GJR141Us6Y/ZRS8VIo70tmVSku6I
 OwuR5i8SgEKwurr1SwLxrI05rovYWRLSaDIRTHn2CViPEjgriyFGRV8QKam3AYmI
 dx47la3ELwuQR68nIdIMzDRt49oZVy+ZKW8Pjgjklzrd5KMYsPy7HPtraHUMeDg+
 GdoC7RresIt5AFiDiIJzKTT/jROI7KuHFnM6blluKHoKenWhYBFCz3sd6IvCdQWX
 JGy+KKY6H+YDMSpgc4FRP56M3GI0hX14oCd7L72epSLfOuzPr9Tmf6wfyQ8f50Je
 LGLD47tyltIcQR9H85YdR8UQspkjSW6xcql4ByCPTEqp0UzSGTsVntvsHzwsgz6A
 Csh3s+DVdv0rk5ZjMCu8STA2oErpehJm7fmugt2oLx+nsCNCBUI25lilw5JGeq5c
 +cO0IRxRxHPeRvMWvItTjbixVAHOHYlB00ilDbvsm+GnTJgu/5cMqpXdLvfXI2Rr
 nl360bSS3t3J4w5rX0mXw4x24vjQmVrA69jU+oo8RSHje2X8Y4Q7sFHNjmN0YAI3
 Re8aP6HSLQjioJxGz9aISlrxmPOXe0CMp8JE586SREVgmS/olXtidMgi7l12uZ2B
 cRdtNYcn31U=
 =dbCU
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:
 "There are a lot of objtool changes in this cycle, all across the map:

   - Speed up objtool significantly, especially when there are large
     number of sections

   - Improve objtool's understanding of special instructions such as
     IRET, to reduce the number of annotations required

   - Implement 'noinstr' validation

   - Do baby steps for non-x86 objtool use

   - Simplify/fix retpoline decoding

   - Add vmlinux validation

   - Improve documentation

   - Fix various bugs and apply smaller cleanups"

* tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  objtool: Enable compilation of objtool for all architectures
  objtool: Move struct objtool_file into arch-independent header
  objtool: Exit successfully when requesting help
  objtool: Add check_kcov_mode() to the uaccess safelist
  samples/ftrace: Fix asm function ELF annotations
  objtool: optimize add_dead_ends for split sections
  objtool: use gelf_getsymshndx to handle >64k sections
  objtool: Allow no-op CFI ops in alternatives
  x86/retpoline: Fix retpoline unwind
  x86: Change {JMP,CALL}_NOSPEC argument
  x86: Simplify retpoline declaration
  x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
  objtool: Add support for intra-function calls
  objtool: Move the IRET hack into the arch decoder
  objtool: Remove INSN_STACK
  objtool: Make handle_insn_ops() unconditional
  objtool: Rework allocating stack_ops on decode
  objtool: UNWIND_HINT_RET_OFFSET should not check registers
  objtool: is_fentry_call() crashes if call has no destination
  x86,smap: Fix smap_{save,restore}() alternatives
  ...
2020-06-01 13:13:00 -07:00
Linus Torvalds
2227e5b21a The RCU updates for this cycle were:
- RCU-tasks update, including addition of RCU Tasks Trace for
    BPF use and TASKS_RUDE_RCU
  - kfree_rcu() updates.
  - Remove scheduler locking restriction
  - RCU CPU stall warning updates.
  - Torture-test updates.
  - Miscellaneous fixes and other updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7U/r0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hSNxAAirKhPGBoLI9DW1qde4OFhZg+BlIpS+LD
 IE/0eGB8hGwhb1793RGbzIJfSnRQpSOPxWbWc6DJZ4Zpi5/ZbVkiPKsuXpM1xGxs
 kuBCTOhWy1/p3iCZ1JH/JCrCAdWGZkIzEoaV7ipnHtV/+UrRbCWH5PB7R0fYvcbI
 q5bUcWJyEp/bYMxQn8DhAih6SLPHx+F9qaGAqqloLSHstTYG2HkBhBGKnqcd/Jex
 twkLK53poCkeP/c08V1dyagU2IRWj2jGB1NjYh/Ocm+Sn/vru15CVGspjVjqO5FF
 oq07lad357ddMsZmKoM2F5DhXbOh95A+EqF9VDvIzCvfGMUgqYI1oxWF4eycsGhg
 /aYJgYuN23YeEe2DkDzJB67GvBOwl4WgdoFaxKRzOiCSfrhkM8KqM4G9Fz1JIepG
 abRJCF85iGcLslU9DkrShQiDsd/CRPzu/jz6ybK0I2II2pICo6QRf76T7TdOvKnK
 yXwC6OdL7/dwOht20uT6XfnDXMCWI4MutiUrb8/C1DbaihwEaI2denr3YYL+IwrB
 B38CdP6sfKZ5UFxKh0xb+sOzWrw0KA+ThSAXeJhz3tKdxdyB6nkaw3J9lFg8oi20
 XGeAujjtjMZG5cxt2H+wO9kZY0RRau/nTqNtmmRrCobd5yJjHHPHH8trEd0twZ9A
 X5Wjh11lv3E=
 =Yisx
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The RCU updates for this cycle were:

   - RCU-tasks update, including addition of RCU Tasks Trace for BPF use
     and TASKS_RUDE_RCU

   - kfree_rcu() updates.

   - Remove scheduler locking restriction

   - RCU CPU stall warning updates.

   - Torture-test updates.

   - Miscellaneous fixes and other updates"

* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  rcu: Allow for smp_call_function() running callbacks from idle
  rcu: Provide rcu_irq_exit_check_preempt()
  rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
  rcu: Provide __rcu_is_watching()
  rcu: Provide rcu_irq_exit_preempt()
  rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  rcu/tree: Mark the idle relevant functions noinstr
  x86: Replace ist_enter() with nmi_enter()
  x86/mce: Send #MC singal from task work
  x86/entry: Get rid of ist_begin/end_non_atomic()
  sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
  lockdep: Always inline lockdep_{off,on}()
  hardirq/nmi: Allow nested nmi_enter()
  arm64: Prepare arch_nmi_enter() for recursion
  printk: Disallow instrumenting print_nmi_enter()
  printk: Prepare for nested printk_nmi_enter()
  rcutorture: Convert ULONG_CMP_LT() to time_before()
  torture: Add a --kasan argument
  torture: Save a few lines by using config_override_param initially
  ...
2020-06-01 12:56:29 -07:00
Linus Torvalds
9bf9511e3d Add support for wider Memory Bandwidth Monitoring counters by querying
their width from CPUID. As a prerequsite, streamline and unify the CPUID
 detection of the respective resource control attributes. By Reinette
 Chatre.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7VNQYACgkQEsHwGGHe
 VUqFgw//Qr6Ot21wtZAgVSCOPJ7rWH4aR6Bbq9YN0/kUGk2PbAjfFK3aMekHd4u6
 +cTc1eroOiWc5mXurbJ6gfmDgTYHj09DII/qXIi9wxWHPyAZ3cBltGXG9A86lIHZ
 hyA6l3Z+RzhnIKoATbXzaEy1nEoGMgCsezuGmN2d1KM2Fl8dmYHT+88iMDSazNb3
 D51+HHvUF03+h41A11mw7Omknycpk3wOmNX0A+t55EExW8xrYjjenY+suWvhDoGj
 +uqDVhYjxvVziI0lsAfcDfN3R3MuPMBXJXtwf9qaZODGs4ttTg/lyHRcTqGBVwrf
 xGuDw0qRLvDKy9UC09H5F7ArScx9X8bOu9NFjNA2AZyLLdhVpTa5AuX1T7VroRkD
 rcDx++EEjVEX36wBoGbwrfZTcaL3er8sPVZiXG1dDc5/GqWfP+abx4G1JPG9xRYH
 V4oghEsBAJXRfGt07PTNSTqDWu/dQWmMUIVfGWX97l+5ED+HlL24MVoThkkH6f2M
 7uAj8IRsbgzn207nZD8DNorARcQVsK2VvzgZzbqa0VsArt57Xk8DSZpfsqw8B7OJ
 OwuA/0S6nFQX9g6C0eLR68WPwv6YrLjKMEO7KJukNbaN/pcVGKXaJ/dSTwpsNQbC
 JZeAuc7DSYJ3Iv+xh64DvpL3hvb46J1UMRCWoVFw3ZQcnGV6t0E=
 =JeAm
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache resource control updates from Borislav Petkov:
 "Add support for wider Memory Bandwidth Monitoring counters by querying
  their width from CPUID.

  As a prerequsite for that, streamline and unify the CPUID detection of
  the respective resource control attributes.

  By Reinette Chatre"

* tag 'x86_cache_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Support wider MBM counters
  x86/resctrl: Support CPUID enumeration of MBM counter width
  x86/resctrl: Maintain MBM counter width per resource
  x86/resctrl: Query LLC monitoring properties once during boot
  x86/resctrl: Remove unnecessary RMID checks
  x86/cpu: Move resctrl CPUID code to resctrl/
  x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h
2020-06-01 12:24:14 -07:00
Jon Doron
f97f5a56f5 x86/kvm/hyper-v: Add support for synthetic debugger interface
Add support for Hyper-V synthetic debugger (syndbg) interface.
The syndbg interface is using MSRs to emulate a way to send/recv packets
data.

The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled
and if it supports the synthetic debugger interface it will attempt to
use it, instead of trying to initialize a network adapter.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:11 -04:00
Jon Doron
22ad0026d0 x86/hyper-v: Add synthetic debugger definitions
Hyper-V synthetic debugger has two modes, one that uses MSRs and
the other that use Hypercalls.

Add all the required definitions to both types of synthetic debugger
interface.

Some of the required new CPUIDs and MSRs are not documented in the TLFS
so they are in hyperv.h instead.

The reason they are not documented is because they are subjected to be
removed in future versions of Windows.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200529134543.1127440-3-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:10 -04:00
Peter Shier
850448f35a KVM: nVMX: Fix VMX preemption timer migration
Add new field to hold preemption timer expiration deadline
appended to struct kvm_vmx_nested_state_hdr. This is to prevent
the first VM-Enter after migration from incorrectly restarting the timer
with the full timer value instead of partially decayed timer value.
KVM_SET_NESTED_STATE restarts timer using migrated state regardless
of whether L1 sets VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.

Fixes: cf8b84f48a ("kvm: nVMX: Prepare for checkpointing L2 state")

Signed-off-by: Peter Shier <pshier@google.com>
Signed-off-by: Makarand Sonare <makarandsonare@google.com>
Message-Id: <20200526215107.205814-2-makarandsonare@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:10 -04:00
Like Xu
27461da310 KVM: x86/pmu: Support full width counting
Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0)
for GP counters that allows writing the full counter width. Enable this
range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]).

The guest would query CPUID to get the counter width, and sign extends
the counter values as needed. The traditional MSRs always limit to 32bit,
even though the counter internally is larger (48 or 57 bits).

When the new capability is set, use the alternative range which do not
have these restrictions. This lowers the overhead of perf stat slightly
because it has to do less interrupts to accumulate the counter value.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:09 -04:00
Vitaly Kuznetsov
72de5fa4c1 KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
Introduce new capability to indicate that KVM supports interrupt based
delivery of 'page ready' APF events. This includes support for both
MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:08 -04:00
Vitaly Kuznetsov
557a961abb KVM: x86: acknowledgment mechanism for async pf page ready notifications
If two page ready notifications happen back to back the second one is not
delivered and the only mechanism we currently have is
kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
only be performed with the next vmexit when it happens and in some cases
it may take a while. With interrupt based page ready notification delivery
the situation is even worse: unlike exceptions, interrupts are not handled
immediately so we must check if the slot is empty. This is slow and
unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
the fact that the slot is free and host should check its notification
queue. Mandate using it for interrupt based 'page ready' APF event
delivery.

As kvm_check_async_pf_completion() is going away from vcpu_run() we need
a way to communicate the fact that vcpu->async_pf.done queue has
transitioned from empty to non-empty state. Introduce
kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:08 -04:00
Vitaly Kuznetsov
2635b5c4a0 KVM: x86: interrupt based APF 'page ready' event delivery
Concerns were expressed around APF delivery via synthetic #PF exception as
in some cases such delivery may collide with real page fault. For 'page
ready' notifications we can easily switch to using an interrupt instead.
Introduce new MSR_KVM_ASYNC_PF_INT mechanism and deprecate the legacy one.

One notable difference between the two mechanisms is that interrupt may not
get handled immediately so whenever we would like to deliver next event
(regardless of its type) we must be sure the guest had read and cleared
previous event in the slot.

While on it, get rid on 'type 1/type 2' names for APF events in the
documentation as they are causing confusion. Use 'page not present'
and 'page ready' everywhere instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:07 -04:00
Vitaly Kuznetsov
7c0ade6c90 KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
An innocent reader of the following x86 KVM code:

bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
{
        if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
                return true;
...

may get very confused: if APF mechanism is not enabled, why do we report
that we 'can inject async page present'? In reality, upon injection
kvm_arch_async_page_present() will check the same condition again and,
in case APF is disabled, will just drop the item. This is fine as the
guest which deliberately disabled APF doesn't expect to get any APF
notifications.

Rename kvm_arch_can_inject_async_page_present() to
kvm_arch_can_dequeue_async_page_present() to make it clear what we are
checking: if the item can be dequeued (meaning either injected or just
dropped).

On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so
the rename doesn't matter much.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:07 -04:00
Vitaly Kuznetsov
68fd66f100 KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Currently, APF mechanism relies on the #PF abuse where the token is being
passed through CR2. If we switch to using interrupts to deliver page-ready
notifications we need a different way to pass the data. Extent the existing
'struct kvm_vcpu_pv_apf_data' with token information for page-ready
notifications.

While on it, rename 'reason' to 'flags'. This doesn't change the semantics
as we only have reasons '1' and '2' and these can be treated as bit flags
but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery
making 'reason' name misleading.

The newly introduced apf_put_user_ready() temporary puts both flags and
token information, this will be changed to put token only when we switch
to interrupt based notifications.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:06 -04:00
Paolo Bonzini
cc440cdad5 KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE
Similar to VMX, the state that is captured through the currently available
IOCTLs is a mix of L1 and L2 state, dependent on whether the L2 guest was
running at the moment when the process was interrupted to save its state.

In particular, the SVM-specific state for nested virtualization includes
the L1 saved state (including the interrupt flag), the cached L2 controls,
and the GIF.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:05 -04:00
Paolo Bonzini
08245e6d2e KVM: nSVM: remove HF_HIF_MASK
The L1 flags can be found in the save area of svm->nested.hsave, fish
it from there so that there is one fewer thing to migrate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:02 -04:00
Paolo Bonzini
e9fd761a46 KVM: nSVM: remove HF_VINTR_MASK
Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can
use it instead of vcpu->arch.hflags to check whether L2 is running
in V_INTR_MASKING mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:26:02 -04:00
Paolo Bonzini
7923ef4f6e KVM: nSVM: remove trailing padding for struct vmcb_control_area
Allow placing the VMCB structs on the stack or in other structs without
wasting too much space.  Add BUILD_BUG_ON as a quick safeguard against typos.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01 04:25:59 -04:00
Linus Torvalds
8fc984aedc A pile of x86 fixes:
- Prevent a memory leak in ioperm which was caused by the stupid
     assumption that the exit cleanup is always called for current, which is
     not the case when fork fails after taking a reference on the ioperm
     bitmap.
 
   - Fix an arithmething overflow in the DMA code on 32bit systems
 
   - Fill gaps in the xstate copy with defaults instead of leaving them
     uninitialized
 
   - Revert: o"Make __X32_SYSCALL_BIT be unsigned long" as it turned out
     that existing user space fails to build.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7Tt8YTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobtTEACukhGsuivgiTwltWuHcATqrcNbgHSu
 nnhuQrjJ8KJiF5O60nDztPAVzxD+Ww2tzuDnD1BLFDI9cEA5oPhzXf7kUuJvrYUK
 INY+OALPPpw2iWjmygIsEyw3Pzmnm6peRA4h5UZSZdFxdROGGwBeGYNxowuVWFiH
 X7Fa1J4QxTI7e2X3psDVz94bOnVTPRPAR2bNpX8K8Qs+Wn1FFO92LFU04EvJTCHe
 JdN73VAS+0o0qPlPMewiuyfxaHexc8eJySMdOiysPnGRy+vagyyMPOV2Kg0DD6bp
 caDxCXNjIxXlRExV6F75s8hnl42DwXzLSzY/G7L/HVJ5r3voqcREYtXHgfenl7Jg
 8o6tEi+qFduPJ6SuRjfjPBDBF4wJvcjgmCwJaPJbMkrg8p5jH9Xg35egmEMo9cF8
 JQa2RzWJTR9XUjuPAuHJZR6f9jnle01PCznmw7Mavoed82udW1Lo32+QnvWsx6Qq
 4uuV38FqK3lsVCfFjyZir9OB9DGeuT/NETs3WJuGW5QUnC1mqfvIYipL3BkxNMKP
 IBB7n5X2iCJ545JkydepXF2I+b/i8XhNcIwYMVoSbZzBKccwCZ7zxHFNj6YAWG+M
 TN77x/+lw5zbnxhL3YzK+fgPNLio/By4Zcpmq6uppaf9Ip67SJGVq22Ef3S0w8vG
 X1inh1zqLX9hsQ==
 =DmSb
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A pile of x86 fixes:

   - Prevent a memory leak in ioperm which was caused by the stupid
     assumption that the exit cleanup is always called for current,
     which is not the case when fork fails after taking a reference on
     the ioperm bitmap.

   - Fix an arithmething overflow in the DMA code on 32bit systems

   - Fill gaps in the xstate copy with defaults instead of leaving them
     uninitialized

   - Revert: "Make __X32_SYSCALL_BIT be unsigned long" as it turned out
     that existing user space fails to build"

* tag 'x86-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioperm: Prevent a memory leak when fork fails
  x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
  copy_xstate_to_kernel(): don't leave parts of destination uninitialized
  x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
2020-05-31 10:45:11 -07:00
Al Viro
c281a6c1ac x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
consolidate HAVE_CSUM_COPY_USER for 32bit and 64bit, while are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-29 16:11:48 -04:00
Al Viro
0a5ea224b2 x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
... rather than messing with the wrapper.  As a side effect,
32bit variant gets access_ok() into it and can be switched to
user_access_begin()/user_access_end()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-29 16:11:48 -04:00
Jay Lang
4bfe6cce13 x86/ioperm: Prevent a memory leak when fork fails
In the copy_process() routine called by _do_fork(), failure to allocate
a PID (or further along in the function) will trigger an invocation to
exit_thread(). This is done to clean up from an earlier call to
copy_thread_tls(). Naturally, the child task is passed into exit_thread(),
however during the process, io_bitmap_exit() nullifies the parent's
io_bitmap rather than the child's.

As copy_thread_tls() has been called ahead of the failure, the reference
count on the calling thread's io_bitmap is incremented as we would expect.
However, io_bitmap_exit() doesn't accept any arguments, and thus assumes
it should trash the current thread's io_bitmap reference rather than the
child's. This is pretty sneaky in practice, because in all instances but
this one, exit_thread() is called with respect to the current task and
everything works out.

A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to
get a bitmap allocated, and force a clone3() syscall to fail by passing
in a zeroed clone_args structure. The kernel handles the erroneous struct
and the buggy code path is followed, and even though the parent's reference
to the io_bitmap is trashed, the child still holds a reference and thus
the structure will never be freed.

Fix this by tweaking io_bitmap_exit() and its subroutines to accept a
task_struct argument which to operate on.

Fixes: ea5f1cd7ab ("x86/ioperm: Remove bitmap if all permissions dropped")
Signed-off-by: Jay Lang <jaytlang@mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable#@vger.kernel.org
Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
2020-05-28 21:36:20 +02:00
Waiman Long
2ca41f555e x86/spinlock: Remove obsolete ticket spinlock macros and types
Even though the x86 ticket spinlock code has been removed with

  cfd8983f03 ("x86, locking/spinlocks: Remove ticket (spin)lock implementation")

a while ago, there are still some ticket spinlock specific macros and
types left in the asm/spinlock_types.h header file that are no longer
used. Remove those as well to avoid confusion.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200526122014.25241-1-longman@redhat.com
2020-05-28 21:18:40 +02:00
Alexander Dahl
8874347066 x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.
The patch does not change the later overall result of 0x100000 for
MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new
calculation yields the same result, but does not require 64 bit
arithmetic.

On 32 bit systems the old calculation suffers from an arithmetic
overflow in that intermediate term in braces: 4UL aka unsigned long int
is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does
not fit in 4 bytes), the in braces result is truncated to zero, the
following right shift does not alter that, so MAX_DMA32_PFN evaluates to
0 on 32 bit systems.

That wrong value is a problem in a comparision against MAX_DMA32_PFN in
the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if
swiotlb should be active.  That comparison yields the opposite result,
when compiling on 32 bit systems.

This was not possible before

  1b7e03ef75 ("x86, NUMA: Enable emulation on 32bit too")

when that MAX_DMA32_PFN was first made visible to x86_32 (and which
landed in v3.0).

In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on
x86-32.

However if one has set CONFIG_IOMMU_INTEL, since

  c5a5dc4cbb ("iommu/vt-d: Don't switch off swiotlb if bounce page is used")

there's a dependency on CONFIG_SWIOTLB, which was not necessarily
active before. That landed in v5.4, where we noticed it in the fli4l
Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64
bit kernel configs there (I could not find out why, so let's just say
historical reasons).

The effect is at boot time 64 MiB (default size) were allocated for
bounce buffers now, which is a noticeable amount of memory on small
systems like pcengines ALIX 2D3 with 256 MiB memory, which are still
frequently used as home routers.

We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4
(LTS) in fli4l and got that kernel messages for example:

  Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
  …
  Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem)
  …
  PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)

The initial analysis and the suggested fix was done by user 'sourcejedi'
at stackoverflow and explicitly marked as GPLv2 for inclusion in the
Linux kernel:

  https://unix.stackexchange.com/a/520525/50007

The new calculation, which does not suffer from that overflow, is the
same as for arch/mips now as suggested by Robin Murphy.

The fix was tested by fli4l users on round about two dozen different
systems, including both 32 and 64 bit archs, bare metal and virtualized
machines.

 [ bp: Massage commit message. ]

Fixes: 1b7e03ef75 ("x86, NUMA: Enable emulation on 32bit too")
Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alexander Dahl <post@lespocky.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://unix.stackexchange.com/q/520065/50007
Link: https://web.nettworks.org/bugs/browse/FFL-2560
Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
2020-05-28 20:21:32 +02:00
Mike Rapoport
431732651c x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit
The DISCONTIGMEM support was marked as deprecated in v5.2 and since there
were no complaints about it for almost 5 releases it can be completely
removed.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200223094322.15206-1-rppt@kernel.org
2020-05-28 18:34:30 +02:00
Paolo Bonzini
7c86663b68 KVM: nSVM: inject exceptions via svm_check_nested_events
This allows exceptions injected by the emulator to be properly delivered
as vmexits.  The code also becomes simpler, because we can just let all
L0-intercepted exceptions go through the usual path.  In particular, our
emulation of the VMX #DB exit qualification is very much simplified,
because the vmexit injection path can use kvm_deliver_exception_payload
to update DR6.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28 11:46:17 -04:00
Paolo Bonzini
c9d40913ac KVM: x86: enable event window in inject_pending_event
In case an interrupt arrives after nested.check_events but before the
call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt
window even if the interrupt is actually going to be a vmexit.  This is
useless rather than harmful, but it really complicates reasoning about
SVM's handling of the VINTR intercept.  We'd like to never bother with
the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in
that case there is no interrupt window and we can just exit the nested
guest whenever we want.

This patch moves the opening of the interrupt window inside
inject_pending_event.  This consolidates the check for pending
interrupt/NMI/SMI in one place, and makes KVM's usage of immediate
exits more consistent, extending it beyond just nested virtualization.

There are two functional changes here.  They only affect corner cases,
but overall they simplify the inject_pending_event.

- re-injection of still-pending events will also use req_immediate_exit
instead of using interrupt-window intercepts.  This should have no impact
on performance on Intel since it simply replaces an interrupt-window
or NMI-window exit for a preemption-timer exit.  On AMD, which has no
equivalent of the preemption time, it may incur some overhead but an
actual effect on performance should only be visible in pathological cases.

- kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true
if an interrupt, NMI or SMI is blocked by nested_run_pending.  This
makes sense because entering the VM will allow it to make progress
and deliver the event.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28 11:41:46 -04:00
Stephane Eranian
5cde265384 perf/x86/rapl: Add AMD Fam17h RAPL support
This patch enables AMD Fam17h RAPL support for the Package level metric.
The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh (Zen1) PPR.

The same output is available via the energy-pkg pseudo event:

  $ perf stat -a -I 1000 --per-socket -e power/energy-pkg/

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200527224659.206129-6-eranian@google.com
2020-05-28 07:58:56 +02:00
Sean Christopherson
cb97c2d680 KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s index
Take a u32 for the index in has_emulated_msr() to match hardware, which
treats MSR indices as unsigned 32-bit values.  Functionally, taking a
signed int doesn't cause problems with the current code base, but could
theoretically cause problems with 32-bit KVM, e.g. if the index were
checked via a less-than statement, which would evaluate incorrectly for
MSR indices with bit 31 set.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27 13:11:08 -04:00
Krzysztof Kozlowski
ed3119e455 x86: Hide the archdata.iommu field behind generic IOMMU_API
There is a generic, kernel wide configuration symbol for enabling the
IOMMU specific bits: CONFIG_IOMMU_API.  Implementations (including
INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well.

This makes the conditional archdata.iommu field consistent with other
platforms and also fixes any compile test builds of other IOMMU drivers,
when INTEL_IOMMU or AMD_IOMMU are not selected).

For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not
selected, this should create functionally equivalent code/choice.  With
COMPILE_TEST this field could appear if other IOMMU drivers are chosen
but neither INTEL_IOMMU nor AMD_IOMMU are not.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: e93a1695d7 ("iommu: Enable compile testing for some of drivers")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27 16:44:05 +02:00
Johan Hovold
e027a2bc93 x86/apb_timer: Drop unused declaration and macro
Drop an extern declaration that has never been used and a no longer
needed macro.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200513100944.9171-2-johan@kernel.org
2020-05-27 13:12:49 +02:00
Johan Hovold
003d805351 x86/apb_timer: Drop unused TSC calibration
Drop the APB-timer TSC calibration, which hasn't been used since the
removal of Moorestown support by commit

  1a8359e411 ("x86/mid: Remove Intel Moorestown").

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200513100944.9171-1-johan@kernel.org
2020-05-27 13:05:59 +02:00
Andy Lutomirski
700d3a5a66 x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
Revert

  45e29d119e ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")

and add a comment to discourage someone else from making the same
mistake again.

It turns out that some user code fails to compile if __X32_SYSCALL_BIT
is unsigned long. See, for example [1] below.

 [ bp: Massage and do the same thing in the respective tools/ header. ]

Fixes: 45e29d119e ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")
Reported-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@kernel.org
Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294
Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
2020-05-26 16:42:43 +02:00
Ingo Molnar
d1343da330 More EFI changes for v5.8:
- Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently
 - Simplify and unify initrd loading
 - Parse the builtin command line on x86 (if provided)
 - Implement printk() support, including support for wide character strings
 - Some fixes for issues introduced by the first batch of v5.8 changes
 - Fix a missing prototypes warning
 - Simplify GDT handling in early mixed mode thunking code
 - Some other minor fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl7Lb8UACgkQwjcgfpV0
 +n3/aAgAkEqqR/BoyzFiyYHujq6bXjESKYr8LrIjNWfnofB6nZqp1yXwFdL0qbj/
 PTZ1qIQAnOMmj11lvy1X894h2ZLqE6XEkqv7Xd2oxkh3fF6amlQUWfMpXUuGLo1k
 C4QGSfA0OOiM0OOi0Aqk1fL7sTmH23/j63dTR+fH8JMuYgjdls/yWNs0miqf8W2H
 ftj8fAKgHIJzFvdTC0vn1DZ6dEKczGLPEcVZ2ns2IJOJ69DsStKPLcD0mlW+EgV2
 EyfRSCQv55RYZRhdUOb+yVLRfU0M0IMDrrCDErHxZHXnQy00tmKXiEL20yuegv3u
 MUtRRw8ocn2/RskjgZkxtMjAAlty9A==
 =AwCh
 -----END PGP SIGNATURE-----

Merge tag 'efi-changes-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core

More EFI changes for v5.8:

 - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently
 - Simplify and unify initrd loading
 - Parse the builtin command line on x86 (if provided)
 - Implement printk() support, including support for wide character strings
 - Some fixes for issues introduced by the first batch of v5.8 changes
 - Fix a missing prototypes warning
 - Simplify GDT handling in early mixed mode thunking code
 - Some other minor fixes and cleanups

Conflicts:
	drivers/firmware/efi/libstub/efistub.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-25 15:11:14 +02:00
Ingo Molnar
a5d8e55b2c Linux 5.7-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7K9iEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGzTAH/0ifZEG4BQ8x/WlB
 8YLSLE6QQTSXYi25nyExuJbFkkKY5Tik8M2HD/36xwY/HnZOlH9jH6m0ntqZxpaA
 3EU9lr1ct79nCBMYhiJssvz8d9AOZXlyogFW9y2y9pmPjlmUtseZ7yGh1xD465cj
 B5Ty2w2W34cs7zF3og2xn5agOJMtWWXLXZ5mRa9EOquKC5zeYyRicmd0T+plYQD6
 hbRYmxFfDfppVnBCBARPNN0+NU5JJD94H+8bOuf1tl48XNrLiZMOicmtohKNQ6+W
 rZNpJNEGEp7KMtqWH0Nl3hmy3yfZHMwe1DXM/AZDqR7jTHZY4mZ0GEpLyfI9AU4n
 34jVHwU=
 =SmJ9
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc7' into efi/core, to refresh the branch and pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-25 15:10:37 +02:00
Nick Desaulniers
c071b0f11e x86: bitops: fix build regression
This is easily reproducible via CC=clang + CONFIG_STAGING=y +
CONFIG_VT6656=m.

It turns out that if your config tickles __builtin_constant_p via
differences in choices to inline or not, these statements produce
invalid assembly:

    $ cat foo.c
    long a(long b, long c) {
      asm("orb	%1, %0" : "+q"(c): "r"(b));
      return c;
    }
    $ gcc foo.c
    foo.c: Assembler messages:
    foo.c:2: Error: `%rax' not allowed with `orb'

Use the `%b` "x86 Operand Modifier" to instead force register allocation
to select a lower-8-bit GPR operand.

The "q" constraint only has meaning on -m32 otherwise is treated as
"r".  Not all GPRs have low-8-bit aliases for -m32.

Fixes: 1651e70066 ("x86: Fix bitops.h warning with a moved cast")
Reported-by: kernelci.org bot <bot@kernelci.org>
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Suggested-by: Brian Gerst <brgerst@gmail.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Ilie Halip <ilie.halip@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>	[build, clang-11]
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-By: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Marco Elver <elver@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20200508183230.229464-1-ndesaulniers@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/961
Link: https://lore.kernel.org/lkml/20200504193524.GA221287@google.com/
Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-23 10:26:31 -07:00
Arvind Sankar
9b47c52756 efi/libstub: Add definitions for console input and events
Add the required typedefs etc for using con_in's simple text input
protocol, and for using the boottime event services.

Also add the prototype for the "stall" boot service.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200518190716.751506-19-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-20 19:09:20 +02:00
Michael Kelley
c55a844f46 x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files
In preparation for adding ARM64 support, split hyperv-tlfs.h into
architecture dependent and architecture independent files, similar
to what has been done with mshyperv.h. Move architecture independent
definitions into include/asm-generic/hyperv-tlfs.h.  The split will
avoid duplicating significant lines of code in the ARM64 version of
hyperv-tlfs.h.  The split has no functional impact.

Some of the common definitions have "X64" in the symbol name.  Change
these to remove the "X64" in the architecture independent version of
hyperv-tlfs.h, but add aliases with the "X64" in the x86 version so
that x86 code will continue to compile.  A later patch set will
change all the references and allow removal of the aliases.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200422195737.10223-4-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-20 09:13:58 +00:00
Michael Kelley
a8a42d0284 x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines
The HV_PROCESSOR_POWER_STATE_C<n> #defines date back to year 2010,
but they are not in the TLFS v6.0 document and are not used anywhere
in Linux.  Remove them.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200422195737.10223-3-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-20 09:13:58 +00:00
Michael Kelley
7357b1df74 KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page
The Hyper-V Reference TSC Page structure is defined twice. struct
ms_hyperv_tsc_page has padding out to a full 4 Kbyte page size. But
the padding is not needed because the declaration includes a union
with HV_HYP_PAGE_SIZE.  KVM uses the second definition, which is
struct _HV_REFERENCE_TSC_PAGE, because it does not have the padding.

Fix the duplication by removing the padding from ms_hyperv_tsc_page.
Fix up the KVM code to use it. Remove the no longer used struct
_HV_REFERENCE_TSC_PAGE.

There is no functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20200422195737.10223-2-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-20 09:13:58 +00:00
Paolo Bonzini
9d5272f5e3 Merge tag 'noinstr-x86-kvm-2020-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD 2020-05-20 03:40:09 -04:00
Benjamin Thiel
0e5e3d4461 x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()
Lift the prototype of ia32_classify_syscall() into its own header.

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200516123816.2680-1-b.thiel@posteo.de
2020-05-19 18:03:07 +02:00
Thomas Gleixner
6bca69ada4 x86/kvm: Sanitize kvm_async_pf_task_wait()
While working on the entry consolidation I stumbled over the KVM async page
fault handler and kvm_async_pf_task_wait() in particular. It took me a
while to realize that the randomly sprinkled around rcu_irq_enter()/exit()
invocations are just cargo cult programming. Several patches "fixed" RCU
splats by curing the symptoms without noticing that the code is flawed 
from a design perspective.

The main problem is that this async injection is not based on a proper
handshake mechanism and only respects the minimal requirement, i.e. the
guest is not in a state where it has interrupts disabled.

Aside of that the actual code is a convoluted one fits it all swiss army
knife. It is invoked from different places with different RCU constraints:

  1) Host side:

     vcpu_enter_guest()
       kvm_x86_ops->handle_exit()
         kvm_handle_page_fault()
           kvm_async_pf_task_wait()

     The invocation happens from fully preemptible context.

  2) Guest side:

     The async page fault interrupted:

         a) user space

	 b) preemptible kernel code which is not in a RCU read side
	    critical section

     	 c) non-preemtible kernel code or a RCU read side critical section
	    or kernel code with CONFIG_PREEMPTION=n which allows not to
	    differentiate between #2b and #2c.

RCU is watching for:

  #1  The vCPU exited and current is definitely not the idle task

  #2a The #PF entry code on the guest went through enter_from_user_mode()
      which reactivates RCU

  #2b There is no preemptible, interrupts enabled code in the kernel
      which can run with RCU looking away. (The idle task is always
      non preemptible).

I.e. all schedulable states (#1, #2a, #2b) do not need any of this RCU
voodoo at all.

In #2c RCU is eventually not watching, but as that state cannot schedule
anyway there is no point to worry about it so it has to invoke
rcu_irq_enter() before running that code. This can be optimized, but this
will be done as an extra step in course of the entry code consolidation
work.

So the proper solution for this is to:

  - Split kvm_async_pf_task_wait() into schedule and halt based waiting
    interfaces which share the enqueueing code.

  - Add comments (condensed form of this changelog) to spare others the
    time waste and pain of reverse engineering all of this with the help of
    uncomprehensible changelogs and code history.

  - Invoke kvm_async_pf_task_wait_schedule() from kvm_handle_page_fault(),
    user mode and schedulable kernel side async page faults (#1, #2a, #2b)

  - Invoke kvm_async_pf_task_wait_halt() for the non schedulable kernel
    case (#2c).

    For this case also remove the rcu_irq_exit()/enter() pair around the
    halt as it is just a pointless exercise:

       - vCPUs can VMEXIT at any random point and can be scheduled out for
         an arbitrary amount of time by the host and this is not any
         different except that it voluntary triggers the exit via halt.

       - The interrupted context could have RCU watching already. So the
	 rcu_irq_exit() before the halt is not gaining anything aside of
	 confusing the reader. Claiming that this might prevent RCU stalls
	 is just an illusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.262701431@linutronix.de
2020-05-19 15:53:58 +02:00
Andy Lutomirski
ef68017eb5 x86/kvm: Handle async page faults directly through do_page_fault()
KVM overloads #PF to indicate two types of not-actually-page-fault
events.  Right now, the KVM guest code intercepts them by modifying
the IDT and hooking the #PF vector.  This makes the already fragile
fault code even harder to understand, and it also pollutes call
traces with async_page_fault and do_async_page_fault for normal page
faults.

Clean it up by moving the logic into do_page_fault() using a static
branch.  This gets rid of the platform trap_init override mechanism
completely.

[ tglx: Fixed up 32bit, removed error code from the async functions and
  	massaged coding style ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.169270470@linutronix.de
2020-05-19 15:53:57 +02:00
Peter Zijlstra
0d00449c7a x86: Replace ist_enter() with nmi_enter()
A few exceptions (like #DB and #BP) can happen at any location in the code,
this then means that tracers should treat events from these exceptions as
NMI-like. The interrupted context could be holding locks with interrupts
disabled for instance.

Similarly, #MC is an actual NMI-like exception.

All of them use ist_enter() which only concerns itself with RCU, but does
not do any of the other setup that NMIs need. This means things like:

	printk()
	  raw_spin_lock_irq(&logbuf_lock);
	  <#DB/#BP/#MC>
	     printk()
	       raw_spin_lock_irq(&logbuf_lock);

are entirely possible (well, not really since printk tries hard to
play nice, but the concept stands).

So replace ist_enter() with nmi_enter(). Also observe that any nmi_enter()
caller must be both notrace and NOKPROBE, or in the noinstr text section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134101.525508608@linutronix.de
2020-05-19 15:51:20 +02:00
Thomas Gleixner
b052df3da8 x86/entry: Get rid of ist_begin/end_non_atomic()
This is completely overengineered and definitely not an interface which
should be made available to anything else than this particular MCE case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.462640294@linutronix.de
2020-05-19 15:51:19 +02:00
Uros Bizjak
3d81b3d1e5 x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.h
Current minimum required version of binutils is 2.23,
which supports RDRAND and RDSEED instruction mnemonics.

Replace the byte-wise specification of RDRAND and
RDSEED with these proper mnemonics.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200508105817.207887-1-ubizjak@gmail.com
2020-05-18 19:50:47 +02:00
Ingo Molnar
7c0577f4e6 Linux 5.7-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7BzV8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg8EH/A2pXMTxtc96RI4S
 sttEsUQqbakFS0Z/2tQPpMGr/qW2e5eHgsTX/a3SiUeZiIXk6f4lMFkMuctzBf7p
 X77cNEDwGOEdbtCXTsMcmKSde7sP2zCXsPB8xTWLyE6rnaFRgikwwkeqgkIKhp1h
 bvOQV0t9HNGvxGAM0iZeOvQAvFl4vd7nS123/MYbir9cugfQUSJRueQ4BiCiJqVE
 6cNA7/vFzDJuFGszzIrJ7HXn/IdQMMWHkvTDjgBw0GZw1mDbGFbfbZwOeTz1ojCt
 smUQ4tIFxBa/VA5zx7dOy2P2keHbSVf4VLkZRPcceT7OqVS65ETmFDp+qt5NdWM5
 vZ8+7/0=
 =CyYH
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc6' into objtool/core, to pick up fixes and resolve semantic conflict

Resolve structural conflict between:

  59566b0b62: ("x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up")

which introduced a new reference to 'ftrace_epilogue', and:

  0298739b79: ("x86,ftrace: Fix ftrace_regs_caller() unwind")

Which renamed it to 'ftrace_caller_end'. Rename the new usage site in the merge commit.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-18 13:09:37 +03:00
Linus Torvalds
43567139f5 A single fix for early boot crashes of kernels built with gcc10 and
stack protector enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7A+q4ACgkQEsHwGGHe
 VUpvtA/+NNPKVGSKZPdDlUm64JEPy7XrbzFJ+zigWGQjUPtZsDkAT4U33eQIvV5f
 ea7vB2u+e7iRZBExgTI1JfyjTenGpBffhubR/ueawtxeTgvZSopFajHQir/VGPlJ
 KQdtqe2wZek3Wux8BsKl8vcbqhgNH/LKgQzoG2y5P1LuA77MpFkMVkAoxKqbTDbt
 Nx7j147ffZBJHfmUHz2/nWD9r0Exu+abeSPJeO4T52ImhVkr+Pd1nFS8S+mRCHMj
 uJjxL/nB/sZmDDX+EX/zA7Du3ibaVa2po9cuhMTwNIPZIpak8Yyopl64fVm/N7jH
 w0DIc1CgEaA1IkG7lwyKSgB/T6Fsg4SQp8gM4V3BkcTgVDuhTH0J/kGrOk2+YFSc
 akk3420XBS4Q54BQ547woOImabxgQXDBvqBq+DhJFwP1qSllUXbZX7rlwZ3VQ160
 sfmItVM0c4J9bgaXqZuwqHxJdgakaIECkXWZwpksQAzVxaOKpZo7drLq6SDhX9HH
 BZdm/5AhIJ5rIGaiMXsZj5cC+H341N5TlaXA+I2b0r/vVOLtbe3it1rbSsvMoZJQ
 7WOesyqFSjSObDUpXZ0riLl1X+rdrCAfzHsm5IMwLAoxmv80973johZKNZIgqIoh
 CbPdyvaJoNK8FK6gT7bw3HNJ1ILGqk53jpWH1Gr1MlfzSzErOdQ=
 =5Xi5
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single fix for early boot crashes of kernels built with gcc10 and
  stack protector enabled"

* tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix early boot crash on gcc-10, third try
2020-05-17 11:08:29 -07:00
Linus Torvalds
5d438e071f A new testcase for guest debugging (gdbstub) that exposed a bunch of
bugs, mostly for AMD processors.  And a few other x86 fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6/0xcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOZuwf/bQZw/SP9awLjOOVsRaSWUmwRGD4q
 6KVq9+JYsPU4CyJ7P+vdsFF39a0ixoAnKWqRe/vsXdXZrdYCDUuQxh+7X+lmjKAb
 dCQBnoqxI0w3yuxrm9Kn6Xs1AGIWibaRlZnXUKbuyn4ecFrh08OfYKGkYsEovhxK
 G4ftY4/xyM7Qvm0fq7ZmzxPrkzd74HDZBvB83R6uiyPiX3w4O9qumqkUogcVXIJX
 l3mnvSPClDDX4FOr8uhnU93varuR7Bek4Fh+Abj4uNks/F3z9ooJO9Hy9E+V5fhY
 g6Oj2IrxDwJ2G6hqyucr1kujukJC1bX2nMZ1O4gNayXsxZEU/JtI0Y26SA==
 =EzBt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A new testcase for guest debugging (gdbstub) that exposed a bunch of
  bugs, mostly for AMD processors. And a few other x86 fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
  KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
  KVM: SVM: Disable AVIC before setting V_IRQ
  KVM: Introduce kvm_make_all_cpus_request_except()
  KVM: VMX: pass correct DR6 for GD userspace exit
  KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
  KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on
  KVM: selftests: Add KVM_SET_GUEST_DEBUG test
  KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG
  KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG
  KVM: x86: fix DR6 delivery for various cases of #DB injection
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
2020-05-16 13:39:22 -07:00
Yu-cheng Yu
eeedf15336 x86/fpu: Introduce copy_supervisor_to_kernel()
The XSAVES instruction takes a mask and saves only the features specified
in that mask.  The kernel normally specifies that all features be saved.

XSAVES also unconditionally uses the "compacted format" which means that
all specified features are saved next to each other in memory.  If a
feature is removed from the mask, all the features after it will "move
up" into earlier locations in the buffer.

Introduce copy_supervisor_to_kernel(), which saves only supervisor states
and then moves those states into the standard location where they are
normally found.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-9-yu-cheng.yu@intel.com
2020-05-16 11:24:14 +02:00
David Matlack
cb953129bf kvm: add halt-polling cpu usage stats
Two new stats for exposing halt-polling cpu usage:
halt_poll_success_ns
halt_poll_fail_ns

Thus sum of these 2 stats is the total cpu time spent polling. "success"
means the VCPU polled until a virtual interrupt was delivered. "fail"
means the VCPU had to schedule out (either because the maximum poll time
was reached or it needed to yield the CPU).

To avoid touching every arch's kvm_vcpu_stat struct, only update and
export halt-polling cpu usage stats if we're on x86.

Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
~500 years, which seems reasonably large.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>

Message-Id: <20200508182240.68440-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:26 -04:00
Jim Mattson
93dff2fed2 KVM: nVMX: Migrate the VMX-preemption timer
The hrtimer used to emulate the VMX-preemption timer must be pinned to
the same logical processor as the vCPU thread to be interrupted if we
want to have any hope of adhering to the architectural specification
of the VMX-preemption timer. Even with this change, the emulated
VMX-preemption timer VM-exit occasionally arrives too late.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200508203643.85477-4-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:26 -04:00
Wanpeng Li
404d5d7bff KVM: X86: Introduce more exit_fastpath_completion enum values
Adds a fastpath_t typedef since enum lines are a bit long, and replace
EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values.

- EXIT_FASTPATH_EXIT_HANDLED  kvm will still go through it's full run loop,
                              but it would skip invoking the exit handler.

- EXIT_FASTPATH_REENTER_GUEST complete fastpath, guest can be re-entered
                              without invoking the exit handler or going
                              back to vcpu_run

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:19 -04:00
Sean Christopherson
2c4c413255 KVM: x86: Print symbolic names of VMX VM-Exit flags in traces
Use __print_flags() to display the names of VMX flags in VM-Exit traces
and strip the flags when printing the basic exit reason, e.g. so that a
failed VM-Entry due to invalid guest state gets recorded as
"INVALID_STATE FAILED_VMENTRY" instead of "0x80000021".

Opportunstically fix misaligned variables in the kvm_exit and
kvm_nested_vmexit_inject tracepoints.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200508235348.19427-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:18 -04:00
Peter Xu
dd03bcaad0 KVM: X86: Force ASYNC_PF_PER_VCPU to be power of two
Forcing the ASYNC_PF_PER_VCPU to be power of two is much easier to be
used rather than calling roundup_pow_of_two() from time to time.  Do
this by adding a BUILD_BUG_ON() inside the hash function.

Another point is that generally async pf does not allow concurrency
over ASYNC_PF_PER_VCPU after all (see kvm_setup_async_pf()), so it
does not make much sense either to have it not a power of two or some
of the entries will definitely be wasted.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155859.267366-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:13 -04:00
Sean Christopherson
3bae0459bc KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums
Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL
with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G.  KVM's
enums are borderline impossible to remember and result in code that is
visually difficult to audit, e.g.

        if (!enable_ept)
                ept_lpage_level = 0;
        else if (cpu_has_vmx_ept_1g_page())
                ept_lpage_level = PT_PDPE_LEVEL;
        else if (cpu_has_vmx_ept_2m_page())
                ept_lpage_level = PT_DIRECTORY_LEVEL;
        else
                ept_lpage_level = PT_PAGE_TABLE_LEVEL;

versus

        if (!enable_ept)
                ept_lpage_level = 0;
        else if (cpu_has_vmx_ept_1g_page())
                ept_lpage_level = PG_LEVEL_1G;
        else if (cpu_has_vmx_ept_2m_page())
                ept_lpage_level = PG_LEVEL_2M;
        else
                ept_lpage_level = PG_LEVEL_4K;

No functional change intended.

Suggested-by: Barret Rhoden <brho@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:11 -04:00
Sean Christopherson
e662ec3e07 KVM: x86/mmu: Move max hugepage level to a separate #define
Rename PT_MAX_HUGEPAGE_LEVEL to KVM_MAX_HUGEPAGE_LEVEL and make it a
separate define in anticipation of dropping KVM's PT_*_LEVEL enums in
favor of the kernel's PG_LEVEL_* enums.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:11 -04:00
Xiaoyao Li
a71936ab46 kvm: x86: Cleanup vcpu->arch.guest_xstate_size
vcpu->arch.guest_xstate_size lost its only user since commit df1daba7d1
("KVM: x86: support XSAVES usage in the host"), so clean it up.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200429154312.1411-1-xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:10 -04:00
Sean Christopherson
68cda40d9f KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure
Use an enum for passing around the failure code for a failed VM-Enter
that results in VM-Exit to provide a level of indirection from the final
resting place of the failure code, vmcs.EXIT_QUALIFICATION.  The exit
qualification field is an unsigned long, e.g. passing around
'u32 exit_qual' throws up red flags as it suggests KVM may be dropping
bits when reporting errors to L1.  This is a red herring because the
only defined failure codes are 0, 2, 3, and 4, i.e. don't come remotely
close to overflowing a u32.

Setting vmcs.EXIT_QUALIFICATION on entry failure is further complicated
by the MSR load list, which returns the (1-based) entry that failed, and
the number of MSRs to load is a 32-bit VMCS field.  At first blush, it
would appear that overflowing a u32 is possible, but the number of MSRs
that can be loaded is hardcapped at 4096 (limited by MSR_IA32_VMX_MISC).

In other words, there are two completely disparate types of data that
eventually get stuffed into vmcs.EXIT_QUALIFICATION, neither of which is
an 'unsigned long' in nature.  This was presumably the reasoning for
switching to 'u32' when the related code was refactored in commit
ca0bde28f2 ("kvm: nVMX: Split VMCS checks from nested_vmx_run()").

Using an enum for the failure code addresses the technically-possible-
but-will-never-happen scenario where Intel defines a failure code that
doesn't fit in a 32-bit integer.  The enum variables and values will
either be automatically sized (gcc 5.4 behavior) or be subjected to some
combination of truncation.  The former case will simply work, while the
latter will trigger a compile-time warning unless the compiler is being
particularly unhelpful.

Separating the failure code from the failed MSR entry allows for
disassociating both from vmcs.EXIT_QUALIFICATION, which avoids the
conundrum where KVM has to choose between 'u32 exit_qual' and tracking
values as 'unsigned long' that have no business being tracked as such.
To cement the split, set vmcs12->exit_qualification directly from the
entry error code or failed MSR index instead of bouncing through a local
variable.

Opportunistically rename the variables in load_vmcs12_host_state() and
vmx_set_nested_state() to call out that they're ignored, set exit_reason
on demand on nested VM-Enter failure, and add a comment in
nested_vmx_load_msr() to call out that returning 'i + 1' can't wrap.

No functional change intended.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200511220529.11402-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:07:31 -04:00
Borislav Petkov
a9a3ed1eff x86: Fix early boot crash on gcc-10, third try
... or the odyssey of trying to disable the stack protector for the
function which generates the stack canary value.

The whole story started with Sergei reporting a boot crash with a kernel
built with gcc-10:

  Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139
  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013
  Call Trace:
    dump_stack
    panic
    ? start_secondary
    __stack_chk_fail
    start_secondary
    secondary_startup_64
  -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary

This happens because gcc-10 tail-call optimizes the last function call
in start_secondary() - cpu_startup_entry() - and thus emits a stack
canary check which fails because the canary value changes after the
boot_init_stack_canary() call.

To fix that, the initial attempt was to mark the one function which
generates the stack canary with:

  __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)

however, using the optimize attribute doesn't work cumulatively
as the attribute does not add to but rather replaces previously
supplied optimization options - roughly all -fxxx options.

The key one among them being -fno-omit-frame-pointer and thus leading to
not present frame pointer - frame pointer which the kernel needs.

The next attempt to prevent compilers from tail-call optimizing
the last function call cpu_startup_entry(), shy of carving out
start_secondary() into a separate compilation unit and building it with
-fno-stack-protector, was to add an empty asm("").

This current solution was short and sweet, and reportedly, is supported
by both compilers but we didn't get very far this time: future (LTO?)
optimization passes could potentially eliminate this, which leads us
to the third attempt: having an actual memory barrier there which the
compiler cannot ignore or move around etc.

That should hold for a long time, but hey we said that about the other
two solutions too so...

Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
2020-05-15 11:48:01 +02:00
Linus Torvalds
f44d5c4890 Various tracing fixes:
- Fix a crash when having function tracing and function stack tracing on
    the command line. The ftrace trampolines are created as executable and
    read only. But the stack tracer tries to modify them with text_poke()
    which expects all kernel text to still be writable at boot.
    Keep the trampolines writable at boot, and convert them to read-only
    with the rest of the kernel.
 
  - A selftest was triggering in the ring buffer iterator code, that
    is no longer valid with the update of keeping the ring buffer
    writable while a iterator is reading. Just bail after three failed
    attempts to get an event and remove the warning and disabling of the
    ring buffer.
 
  - While modifying the ring buffer code, decided to remove all the
    unnecessary BUG() calls.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXr1CDhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsXcAQCoL229SBrtHsn4DUO7eAQRppUT3hNw
 RuKzvQ56+1GccQEAh8VGCeg89uMSK6imrTujEl6VmOUdbgrD5R96yiKoGQw=
 =vi+k
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a crash when having function tracing and function stack tracing
     on the command line.

     The ftrace trampolines are created as executable and read only. But
     the stack tracer tries to modify them with text_poke() which
     expects all kernel text to still be writable at boot. Keep the
     trampolines writable at boot, and convert them to read-only with
     the rest of the kernel.

   - A selftest was triggering in the ring buffer iterator code, that is
     no longer valid with the update of keeping the ring buffer writable
     while a iterator is reading.

     Just bail after three failed attempts to get an event and remove
     the warning and disabling of the ring buffer.

   - While modifying the ring buffer code, decided to remove all the
     unnecessary BUG() calls"

* tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Remove all BUG() calls
  ring-buffer: Don't deactivate the ring buffer on failed iterator reads
  x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
2020-05-14 11:46:52 -07:00
Yu-cheng Yu
c95473e175 x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor states
The function copy_kernel_to_xregs_err() uses XRSTOR which can work with
standard or compacted format without supervisor xstates. However, when
supervisor xstates are present, XRSTORS must be used. Fix it by using
XRSTORS when supervisor state handling is enabled.

I also considered if there were additional cases where XRSTOR might be
mistakenly called instead of XRSTORS.  There are only three XRSTOR sites
in the kernel:

1. copy_kernel_to_xregs_booting(), already switches between XRSTOR and
   XRSTORS based on X86_FEATURE_XSAVES.

2. copy_user_to_xregs(), which *needs* XRSTOR because it is copying from
   userspace and must never copy supervisor state with XRSTORS.

3. copy_kernel_to_xregs_err() mistakenly used XRSTOR only.  Fix it.

 [ bp: Massage commit message. ]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-8-yu-cheng.yu@intel.com
2020-05-14 16:46:43 +02:00
Sean Christopherson
e93fd3b3e8 KVM: x86/mmu: Capture TDP level when updating CPUID
Snapshot the TDP level now that it's invariant (SVM) or dependent only
on host capabilities and guest CPUID (VMX).  This avoids having to call
kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or
calculating the page role, and thus avoids the associated retpoline.

Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is
active is legal, if dodgy.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:14 -04:00
Sean Christopherson
bd31fe495d KVM: VMX: Add proper cache tracking for CR0
Move CR0 caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.
This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0()
is called multiple times in a single VM-Exit, and more importantly
eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading
CR0, and squashes the confusing naming discrepancy of "cache_reg" vs.
"decache_cr0_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:12 -04:00
Sean Christopherson
f98c1e7712 KVM: VMX: Add proper cache tracking for CR4
Move CR4 caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.
This avoids multiple VMREADs and retpolines (when configured) during
nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times
on each transition, e.g. when stuffing CR0 and CR3.

As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline
on SVM when reading CR4, and squashes the confusing naming discrepancy
of "cache_reg" vs. "decache_cr4_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:10 -04:00
Sean Christopherson
56ba77a459 KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'
Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops
hook read_l1_tsc_offset().  This avoids a retpoline (when configured)
when reading L1's effective TSC, which is done at least once on every
VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:04 -04:00
Paolo Bonzini
c300ab9f08 KVM: x86: Replace late check_nested_events() hack with more precise fix
Add an argument to interrupt_allowed and nmi_allowed, to checking if
interrupt injection is blocked.  Use the hook to handle the case where
an interrupt arrives between check_nested_events() and the injection
logic.  Drop the retry of check_nested_events() that hack-a-fixed the
same condition.

Blocking injection is also a bit of a hack, e.g. KVM should do exiting
and non-exiting interrupt processing in a single pass, but it's a more
precise hack.  The old comment is also misleading, e.g. KVM_REQ_EVENT is
purely an optimization, setting it on every run loop (which KVM doesn't
do) should not affect functionality, only performance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
[Extend to SVM, add SMI and NMI.  Even though NMI and SMI cannot come
 asynchronously right now, making the fix generic is easy and removes a
 special case. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:14:49 -04:00
Sean Christopherson
88c604b66e KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of int
Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to
better reflect the return semantics, and to avoid creating an even
bigger mess when the related VMX code is refactored in upcoming patches.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:14:29 -04:00
Sean Christopherson
d2060bd42e KVM: nVMX: Open a window for pending nested VMX preemption timer
Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
use it to effectively open a window for servicing the expired timer.
Like pending SMIs on VMX, opening a window simply means requesting an
immediate exit.

This fixes a bug where an expired VMX preemption timer (for L2) will be
delayed and/or lost if a pending exception is injected into L2.  The
pending exception is rightly prioritized by vmx_check_nested_events()
and injected into L2, with the preemption timer left pending.  Because
no window opened, L2 is free to run uninterrupted.

Fixes: f4124500c2 ("KVM: nVMX: Fully emulate preemption timer")
Reported-by: Jim Mattson <jmattson@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Shier <pshier@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com>
[Check it in kvm_vcpu_has_events too, to ensure that the preemption
 timer is serviced promptly even if the vCPU is halted and L1 is not
 intercepting HLT. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:14:27 -04:00
Paolo Bonzini
4aef2ec902 Merge branch 'kvm-amd-fixes' into HEAD 2020-05-13 12:14:05 -04:00
Babu Moger
37486135d3 KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
resource isn't. It can be read with XSAVE and written with XRSTOR.
So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state),
the guest can read the host value.

In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could
potentially use XRSTOR to change the host PKRU value.

While at it, move pkru state save/restore to common code and the
host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.

Cc: stable@vger.kernel.org
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 11:27:41 -04:00
Fenghua Yu
b860eb8dce x86/fpu/xstate: Define new functions for clearing fpregs and xstates
Currently, fpu__clear() clears all fpregs and xstates.  Once XSAVES
supervisor states are introduced, supervisor settings (e.g. CET xstates)
must remain active for signals; It is necessary to have separate functions:

- Create fpu__clear_user_states(): clear only user settings for signals;
- Create fpu__clear_all(): clear both user and supervisor settings in
   flush_thread().

Also modify copy_init_fpstate_to_fpregs() to take a mask from above two
functions.

Remove obvious side-comment in fpu__clear(), while at it.

 [ bp: Make the second argument of fpu__clear() bool after requesting it
   a bunch of times during review.
  - Add a comment about copy_init_fpstate_to_fpregs() locking needs. ]

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-6-yu-cheng.yu@intel.com
2020-05-13 13:41:50 +02:00
Yu-cheng Yu
524bb73bc1 x86/fpu/xstate: Separate user and supervisor xfeatures mask
Before the introduction of XSAVES supervisor states, 'xfeatures_mask' is
used at various places to determine XSAVE buffer components and XCR0 bits.
It contains only user xstates.  To support supervisor xstates, it is
necessary to separate user and supervisor xstates:

- First, change 'xfeatures_mask' to 'xfeatures_mask_all', which represents
  the full set of bits that should ever be set in a kernel XSAVE buffer.
- Introduce xfeatures_mask_supervisor() and xfeatures_mask_user() to
  extract relevant xfeatures from xfeatures_mask_all.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-4-yu-cheng.yu@intel.com
2020-05-13 10:31:07 +02:00
Steven Rostedt (VMware)
59566b0b62 x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
Booting one of my machines, it triggered the following crash:

 Kernel/User page tables isolation: enabled
 ftrace: allocating 36577 entries in 143 pages
 Starting tracer 'function'
 BUG: unable to handle page fault for address: ffffffffa000005c
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
 Oops: 0003 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 RIP: 0010:text_poke_early+0x4a/0x58
 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
0 41 57 49
 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
 RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
 FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
 Call Trace:
  text_poke_bp+0x27/0x64
  ? mutex_lock+0x36/0x5d
  arch_ftrace_update_trampoline+0x287/0x2d5
  ? ftrace_replace_code+0x14b/0x160
  ? ftrace_update_ftrace_func+0x65/0x6c
  __register_ftrace_function+0x6d/0x81
  ftrace_startup+0x23/0xc1
  register_ftrace_function+0x20/0x37
  func_set_flag+0x59/0x77
  __set_tracer_option.isra.19+0x20/0x3e
  trace_set_options+0xd6/0x13e
  apply_trace_boot_options+0x44/0x6d
  register_tracer+0x19e/0x1ac
  early_trace_init+0x21b/0x2c9
  start_kernel+0x241/0x518
  ? load_ucode_intel_bsp+0x21/0x52
  secondary_startup_64+0xa4/0xb0

I was able to trigger it on other machines, when I added to the kernel
command line of both "ftrace=function" and "trace_options=func_stack_trace".

The cause is the "ftrace=function" would register the function tracer
and create a trampoline, and it will set it as executable and
read-only. Then the "trace_options=func_stack_trace" would then update
the same trampoline to include the stack tracer version of the function
tracer. But since the trampoline already exists, it updates it with
text_poke_bp(). The problem is that text_poke_bp() called while
system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
the page mapping, as it would think that the text is still read-write.
But in this case it is not, and we take a fault and crash.

Instead, lets keep the ftrace trampolines read-write during boot up,
and then when the kernel executable text is set to read-only, the
ftrace trampolines get set to read-only as well.

Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a ("x86/ftrace: Use text_poke()")
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-12 18:24:34 -04:00
Fenghua Yu
8ab22804ef x86/fpu/xstate: Define new macros for supervisor and user xstates
XCNTXT_MASK is 'all supported xfeatures' before introducing supervisor
xstates.  Rename it to XFEATURE_MASK_USER_SUPPORTED to make clear that
these are user xstates.

Replace XFEATURE_MASK_SUPERVISOR with the following:
- XFEATURE_MASK_SUPERVISOR_SUPPORTED: Currently nothing.  ENQCMD and
  Control-flow Enforcement Technology (CET) will be introduced in separate
  series.
- XFEATURE_MASK_SUPERVISOR_UNSUPPORTED: Currently only Processor Trace.
- XFEATURE_MASK_SUPERVISOR_ALL: the combination of above.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-3-yu-cheng.yu@intel.com
2020-05-12 20:34:38 +02:00
Fenghua Yu
5274e6c172 x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
The function validate_xstate_header() validates an xstate header coming
from userspace (PTRACE or sigreturn). To make it clear, rename it to
validate_user_xstate_header().

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-2-yu-cheng.yu@intel.com
2020-05-12 20:20:32 +02:00
Willy Tarreau
38ede90831 floppy: use symbolic register names in the x86 port
Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-9-w@1wt.eu
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:53 +03:00
Willy Tarreau
e72e8bf1c9 floppy: split the base port from the register in I/O accesses
Currently we have architecture-specific fd_inb() and fd_outb() functions
or macros, taking just a port which is in fact made of a base address and
a register. The base address is FDC-specific and derived from the local or
global "fdc" variable through the FD_IOPORT macro used in the base address
calculation.

This change splits this by explicitly passing the FDC's base address and
the register separately to fd_outb() and fd_inb(). It affects the
following archs:
  - x86, alpha, mips, powerpc, parisc, arm, m68k:
    simple remap of port -> base+reg

  - sparc32: use of reg only, since the base address was already masked
    out and the FDC controller is known from a static struct.

  - sparc64: like x86 for PCI, like sparc32 for 82077

Some archs use inline functions and others macros. This was not
unified in order to minimize the number of changes to review. For the
same reason checkpatch still spews a few warnings about things that
were already there before.

The parisc still uses hard-coded register values and could be cleaned up
by taking the register definitions.

The sparc per-controller inb/outb functions could further be refined
to explicitly take an FDC register instead of a port in argument but it
was not needed yet and may be cleaned later.

Link: https://lore.kernel.org/r/20200331094054.24441-2-w@1wt.eu
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:52 +03:00
Uros Bizjak
7e32a9dac9 x86/cpu: Use INVPCID mnemonic in invpcid.h
The current minimum required version of binutils is 2.23, which supports
the INVPCID instruction mnemonic. Replace the byte-wise specification of
INVPCID with the proper mnemonic.

 [ bp: Add symbolic operand names for increased readability and flip
   their order like the insn expects them for the AT&T syntax. ]

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200508092247.132147-1-ubizjak@gmail.com

Signed-off-by: Borislav Petkov <bp@suse.de>
2020-05-12 16:05:30 +02:00
Linus Torvalds
c14cab2688 A set of fixes for x86:
- Ensure that direct mapping alias is always flushed when changing page
    attributes. The optimization for small ranges failed to do so when
    the virtual address was in the vmalloc or module space.
 
  - Unbreak the trace event registration for syscalls without arguments
    caused by the refactoring of the SYSCALL_DEFINE0() macro.
 
  - Move the printk in the TSC deadline timer code to a place where it is
    guaranteed to only be called once during boot and cannot be rearmed by
    clearing warn_once after boot. If it's invoked post boot then lockdep
    rightfully complains about a potential deadlock as the calling context
    is different.
 
  - A series of fixes for objtool and the ORC unwinder addressing variety
    of small issues:
 
      Stack offset tracking for indirect CFAs in objtool ignored subsequent
      pushs and pops
 
      Repair the unwind hints in the register clearing entry ASM code
 
      Make the unwinding in the low level exit to usermode code stop after
      switching to the trampoline stack. The unwind hint is not longer valid
      and the ORC unwinder emits a warning as it can't find the registers
      anymore.
 
      Fix the unwind hints in switch_to_asm() and rewind_stack_do_exit()
      which caused objtool to generate bogus ORC data.
 
      Prevent unwinder warnings when dumping the stack of a non-current
      task as there is no way to be sure about the validity because the
      dumped stack can be a moving target.
 
      Make the ORC unwinder behave the same way as the frame pointer
      unwinder when dumping an inactive tasks stack and do not skip the
      first frame.
 
      Prevent ORC unwinding before ORC data has been initialized
 
      Immediately terminate unwinding when a unknown ORC entry type is
      found.
 
      Prevent premature stop of the unwinder caused by IRET frames.
 
      Fix another infinite loop in objtool caused by a negative offset which
      was not catched.
 
      Address a few build warnings in the ORC unwinder and add missing
      static/ro_after_init annotations
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6363QTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRJHD/4hWjzJLsUZ9xq2NrzhevoeJtxj+wVM
 66x9NM3mlFQ30BN4Aye4EnNEhR0iIvNPWWdfEmaJYfPHPwnUjjcOa426HYxP/WXA
 DWd5F20wGaaPOJ65LJpy/+pfcxAeQynt4I2cDEWHAplswfOWV/Hv8mSeKAKuq400
 lCWaTMkWcO/toexSNn8PVyWi9rHlm+76E1bHkVwuoekGBGt1VloKGlK6OPyElzL2
 w9VtrjSLlYQ0MdfCJKQeg44XQPMbf4hZRfc88x9SwDWB01q7aSvb0pWNl9AJKNXA
 7fFu5T4F4PABPgRM7eJ5yNk0De9jM1y+6eCp66f9UXoNOeSr7Boz9Xc4xWqAraIi
 9Dtx3WliO9CAxwUiD+Cj2iJO5o83AdRK/xhCth2VRnYMS6imfSidEqTC+LhEtkzw
 Yplu7sbrWQDa5JTh8vk60clDvbkU+pfdxJisY+KClRguWfQfR6MJNuQnE0NHr7cH
 H4VXFFHEE6tDdJneQ9RxA4iF20RTgSlJGK0YlsH6QsxPsRgoHVkGUao8fQhrNvRc
 MIdpm9YasWStjJ7ZXbDeStmnLFN3DCj1RC8wmvJ4i/R1sPnBvPvRUt4Lm988a951
 Vyr23VIcVrE7zykiqQZVH7bvIv6ULORqTJbIOF1rO/aIut4W8z0ojoVXC0Z7CiwF
 S5SGj+hlWciIew==
 =0rCi
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Ensure that direct mapping alias is always flushed when changing
     page attributes. The optimization for small ranges failed to do so
     when the virtual address was in the vmalloc or module space.

   - Unbreak the trace event registration for syscalls without arguments
     caused by the refactoring of the SYSCALL_DEFINE0() macro.

   - Move the printk in the TSC deadline timer code to a place where it
     is guaranteed to only be called once during boot and cannot be
     rearmed by clearing warn_once after boot. If it's invoked post boot
     then lockdep rightfully complains about a potential deadlock as the
     calling context is different.

   - A series of fixes for objtool and the ORC unwinder addressing
     variety of small issues:

       - Stack offset tracking for indirect CFAs in objtool ignored
         subsequent pushs and pops

       - Repair the unwind hints in the register clearing entry ASM code

       - Make the unwinding in the low level exit to usermode code stop
         after switching to the trampoline stack. The unwind hint is no
         longer valid and the ORC unwinder emits a warning as it can't
         find the registers anymore.

       - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
         which caused objtool to generate bogus ORC data.

       - Prevent unwinder warnings when dumping the stack of a
         non-current task as there is no way to be sure about the
         validity because the dumped stack can be a moving target.

       - Make the ORC unwinder behave the same way as the frame pointer
         unwinder when dumping an inactive tasks stack and do not skip
         the first frame.

       - Prevent ORC unwinding before ORC data has been initialized

       - Immediately terminate unwinding when a unknown ORC entry type
         is found.

       - Prevent premature stop of the unwinder caused by IRET frames.

       - Fix another infinite loop in objtool caused by a negative
         offset which was not catched.

       - Address a few build warnings in the ORC unwinder and add
         missing static/ro_after_init annotations"

* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
  x86/apic: Move TSC deadline timer debug printk
  ftrace/x86: Fix trace event registration for syscalls without arguments
  x86/mm/cpa: Flush direct map alias during cpa
  objtool: Fix infinite loop in for_offset_range()
  x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
  x86/unwind/orc: Fix error path for bad ORC entry type
  x86/unwind/orc: Prevent unwinding before ORC initialization
  x86/unwind/orc: Don't skip the first frame for inactive tasks
  x86/unwind: Prevent false warnings for non-current tasks
  x86/unwind/orc: Convert global variables to static
  x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
  x86/entry/64: Fix unwind hints in __switch_to_asm()
  x86/entry/64: Fix unwind hints in kernel exit path
  x86/entry/64: Fix unwind hints in register clearing code
  objtool: Fix stack offset tracking for indirect CFAs
2020-05-10 11:59:53 -07:00
Paolo Bonzini
d67668e9dd KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
different handling of DR6 on intercepted #DB exceptions on Intel and AMD.

On Intel, #DB exceptions transmit the DR6 value via the exit qualification
field of the VMCS, and the exit qualification only contains the description
of the precise event that caused a vmexit.

On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception
was to be injected into the guest.  This has two effects when guest debugging
is in use:

* the guest DR6 is clobbered

* the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
than just the last one that happened (the testcase in the next patch covers
this issue).

This patch fixes both issues by emulating, so to speak, the Intel behavior
on AMD processors.  The important observation is that (after the previous
patches) the VMCB value of DR6 is only ever observable from the guest is
KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually set vmcb->save.dr6
to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it
will be if guest debugging is enabled.

Therefore it is possible to enter the guest with an all-zero DR6,
reconstruct the #DB payload from the DR6 we get at exit time, and let
kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
is set, but this is harmless.

This may not be the most optimized way to deal with this, but it is
simple and, being confined within SVM code, it gets rid of the set_dr6
callback and kvm_update_dr6.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:31 -04:00
Paolo Bonzini
5679b803e4 KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
second argument.  Ensure that the VMCB value is synchronized to
vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
that the current value of DR6 is always available in vcpu->arch.dr6.
The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:43:47 -04:00
Linus Torvalds
8c16ec94dc Bugfixes, mostly for ARM and AMD, and more documentation.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6yqbIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroObBQf+NH9DCs6X92YggAoNpJl6uSIOX35X
 ErdWqYj80Xx95QU73aMukjs3Zqxe6WfYI9jPEOD8SDUZzZlVfIA35D8BYlqt1c5R
 A2K2ebTQbZ+j487QTUPbEvEivyxyVSozwvOdKBfL5kv0D9Cn2STyjVjmguUoCp9n
 VztmwbwpSZdOnexRSolwAWuyOriYbvpV12cIZpcMGrjL67yZPv8UyCxxJplDCLlB
 1c8tvGI2Md8apE/YZDqlCFh3H4YBQsact8uOoyY8cXKO/xIAsZOI+Dhm/cQAhGDk
 QIQqv/hkM4HPvOXQluwIau4Cx+Fl05xY/ggtQt4z/8yml2pOw8PKmwziZA==
 =60QX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, mostly for ARM and AMD, and more documentation.

  Slightly bigger than usual because I couldn't send out what was
  pending for rc4, but there is nothing worrisome going on. I have more
  fixes pending for guest debugging support (gdbstub) but I will send
  them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
  KVM: selftests: Fix build for evmcs.h
  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
  docs/virt/kvm: Document configuring and running nested guests
  KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
  KVM: x86: Fixes posted interrupt check for IRQs delivery modes
  KVM: SVM: fill in kvm_run->debug.arch.dr[67]
  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
  KVM: arm64: Fix 32bit PC wrap-around
  KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
  KVM: arm64: Save/restore sp_el0 as part of __guest_enter
  KVM: arm64: Delete duplicated label in invalid_vector
  KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
  KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
  KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
  KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
  KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
  KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
  ...
2020-05-07 09:50:59 -07:00
Kyung Min Park
cec5f268cd x86/delay: Introduce TPAUSE delay
TPAUSE instructs the processor to enter an implementation-dependent
optimized state. The instruction execution wakes up when the time-stamp
counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
The instruction execution also wakes up due to the expiration of
the operating system time-limit or by an external interrupt
or exceptions such as a debug exception or a machine check exception.

TPAUSE offers a choice of two lower power states:
 1. Light-weight power/performance optimized state C0.1
 2. Improved power/performance optimized state C0.2

This way, it can save power with low wake-up latency in comparison to
spinloop based delay. The selection between the two is governed by the
input register.

TPAUSE is available on processors with X86_FEATURE_WAITPKG.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1587757076-30337-4-git-send-email-kyung.min.park@intel.com
2020-05-07 16:06:20 +02:00
Thomas Gleixner
e882489024 x86/delay: Preparatory code cleanup
The naming conventions in the delay code are confusing at best.

All delay variants use a loops argument and or variable which originates
from the original delay_loop() implementation. But all variants except
delay_loop() are based on TSC cycles.

Rename the argument to cycles and make it type u64 to avoid these weird
expansions to u64 in the functions.

Rename MWAITX_MAX_LOOPS to MWAITX_MAX_WAIT_CYCLES for the same reason
and fixup the comment of delay_mwaitx() as well.

Mark the delay_fn function pointer __ro_after_init and fixup the comment
for it.

No functional change and preparation for the upcoming TPAUSE based delay
variant.

[ Kyung Min Park: Added __init to use_tsc_delay() ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1587757076-30337-2-git-send-email-kyung.min.park@intel.com
2020-05-07 16:06:19 +02:00
Christoph Hellwig
2981cf8361 x86/platform/uv: Remove the unused _uv_cpu_blade_processor_id() macro
No users anywhere in the kernel tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-12-hch@lst.de
2020-05-07 15:32:23 +02:00
Christoph Hellwig
fbe1d37866 x86/platform/uv: Remove _uv_hub_info_check()
Neither this functions nor the helpers used to implement it are used
anywhere in the kernel tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-10-hch@lst.de
2020-05-07 15:32:23 +02:00
Christoph Hellwig
8e77554580 x86/platform/uv: Simplify uv_send_IPI_one()
Merge two helpers only used by uv_send_IPI_one() into the main function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-9-hch@lst.de
2020-05-07 15:32:22 +02:00
Christoph Hellwig
e4dd8b8351 x86/platform/uv: Mark is_uv_hubless() static
is_uv_hubless() is only used in x2apic_uv_x.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-7-hch@lst.de
2020-05-07 15:32:21 +02:00
Christoph Hellwig
cc19910587 x86/platform/uv: Remove the UV*_HUB_IS_SUPPORTED macros
All of the macros are always defined to one.  Remove them and the dead
code keyed off them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-6-hch@lst.de
2020-05-07 15:32:21 +02:00
Christoph Hellwig
32988cfd57 x86/platform/uv: Remove the uv_partition_coherence_id() macro
uv_partition_coherence_id() is only used once.  Just open code it in the
only user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-3-hch@lst.de
2020-05-07 15:32:19 +02:00
Christoph Hellwig
30ad8db3a2 x86/platform/uv: Mark uv_bios_call() and uv_bios_call_irqsave() static
Both functions are only used inside of bios_uv.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-2-hch@lst.de
2020-05-07 15:32:19 +02:00
Borislav Petkov
d8422f6bb0 x86/cpu: Add a X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro
... to match Intel family 6 CPUs with steppings.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Link: https://lkml.kernel.org/r/20200506071516.25445-3-bp@alien8.de
2020-05-07 13:48:05 +02:00
Borislav Petkov
51485635eb Merge 'x86/urgent' into x86/cpu
... to resolve conflicting changes to arch/x86/kernel/apic/apic.c

Signed-off-by: Borislav Petkov <bp@suse.de>
2020-05-07 12:27:43 +02:00
Paolo Bonzini
4d5523cfd5 KVM: x86: fix DR6 delivery for various cases of #DB injection
Go through kvm_queue_exception_p so that the payload is correctly delivered
through the exit qualification, and add a kvm_update_dr6 call to
kvm_deliver_exception_payload that is needed on AMD.

Reported-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07 06:13:41 -04:00
Reinette Chatre
f3d44f18b0 x86/resctrl: Support CPUID enumeration of MBM counter width
The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.

Expand the MBM CPUID enumeration properties to include the MBM
counter width. The previously undefined EAX output register contains,
in bits [7:0], the MBM counter width encoded as an offset from
24 bits. Enumerating this property is only specified for Intel
CPUs.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
2020-05-06 18:02:41 +02:00
Reinette Chatre
0118ad82c2 x86/cpu: Move resctrl CPUID code to resctrl/
The function determining a platform's support and properties of cache
occupancy and memory bandwidth monitoring (properties of
X86_FEATURE_CQM_LLC) can be found among the common CPU code. After
the feature's properties is populated in the per-CPU data the resctrl
subsystem is the only consumer (via boot_cpu_data).

Move the function that obtains the CPU information used by resctrl to
the resctrl subsystem and rename it from init_cqm() to
resctrl_cpu_detect(). The function continues to be called from the
common CPU code. This move is done in preparation of the addition of some
vendor specific code.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
2020-05-06 17:51:21 +02:00
Reinette Chatre
8dd97c6518 x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h
asm/resctrl_sched.h is dedicated to the code used for configuration
of the CPU resource control state when a task is scheduled.

Rename resctrl_sched.h to resctrl.h in preparation of additions that
will no longer make this file dedicated to work done during scheduling.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
2020-05-06 17:45:22 +02:00
Christoph Hellwig
c3b3f52476 signal: refactor copy_siginfo_to_user32
Factor out a copy_siginfo_to_external32 helper from
copy_siginfo_to_user32 that fills out the compat_siginfo, but does so
on a kernel space data structure.  With that we can let architectures
override copy_siginfo_to_user32 with their own implementations using
copy_siginfo_to_external32.  That allows moving the x32 SIGCHLD purely
to x86 architecture code.

As a nice side effect copy_siginfo_to_external32 also comes in handy
for avoiding a set_fs() call in the coredump code later on.

Contains improvements from Eric W. Biederman <ebiederm@xmission.com>
and Arnd Bergmann <arnd@arndb.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-05 16:46:09 -04:00
Arvind Sankar
de8c55208c efi/libstub: Fix mixed mode boot issue after macro refactor
Commit

  22090f84bc ("efi/libstub: unify EFI call wrappers for non-x86")

refactored the macros that are used to provide wrappers for mixed-mode
calls on x86, allowing us to boot a 64-bit kernel on 32-bit firmware.

Unfortunately, this broke mixed mode boot due to the fact that
efi_is_native() is not a macro on x86.

All of these macros should go together, so rather than testing each one
to see if it is defined, condition the generic macro definitions on a
new ARCH_HAS_EFISTUB_WRAPPERS, and remove the wrapper definitions on x86
as well if CONFIG_EFI_MIXED is not enabled.

Fixes: 22090f84bc ("efi/libstub: unify EFI call wrappers for non-x86")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200504150248.62482-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-05 09:25:39 +02:00
Suravee Suthikulpanit
637543a8d6 KVM: x86: Fixes posted interrupt check for IRQs delivery modes
Current logic incorrectly uses the enum ioapic_irq_destination_types
to check the posted interrupt destination types. However, the value was
set using APIC_DM_XXX macros, which are left-shifted by 8 bits.

Fixes by using the APIC_DM_FIXED and APIC_DM_LOWEST instead.

Fixes: (fdcf756213 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes')
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04 12:16:51 -04:00
Borislav Petkov
bd1de2a7aa x86/tlb/uv: Add a forward declaration for struct flush_tlb_info
... to fix these build warnings:

  In file included from ./arch/x86/include/asm/uv/uv_hub.h:22,
                   from drivers/misc/sgi-gru/grukdump.c:16:
  ./arch/x86/include/asm/uv/uv.h:39:21: warning: ‘struct flush_tlb_info’ declared \
     inside parameter list will not be visible outside of this definition or declaration
     39 |        const struct flush_tlb_info *info);
        |                     ^~~~~~~~~~~~~~
  In file included from ./arch/x86/include/asm/uv/uv_hub.h:22,
                   from drivers/misc/sgi-gru/grutlbpurge.c:28:
  ./arch/x86/include/asm/uv/uv.h:39:21: warning: ‘struct flush_tlb_info’ declared \
    inside parameter list will not be visible outside of this definition or declaration
     39 |        const struct flush_tlb_info *info);
        |                     ^~~~~~~~~~~~~~

  ...

after

  bfe3d8f631 ("x86/tlb: Restrict access to tlbstate")

restricted access to tlbstate.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200503103107.3419-1-bp@alien8.de
2020-05-03 21:43:47 +02:00
Konstantin Khlebnikov
fdc63ff0e4 ftrace/x86: Fix trace event registration for syscalls without arguments
The refactoring of SYSCALL_DEFINE0() macros removed the ABI stubs and
simply defines __abi_sys_$NAME as alias of __do_sys_$NAME.

As a result kallsyms_lookup() returns "__do_sys_$NAME" which does not match
with the declared trace event name.

See also commit 1c758a2202 ("tracing/x86: Update syscall trace events to
handle new prefixed syscall func names").

Add __do_sys_ to the valid prefixes which are checked in
arch_syscall_match_sym_name().

Fixes: d2b5de495e ("x86/entry: Refactor SYSCALL_DEFINE0 macros")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz
2020-05-01 19:15:40 +02:00
Peter Zijlstra
cc1ac9c792 x86/retpoline: Fix retpoline unwind
Currently objtool cannot understand retpolines, and thus cannot
generate ORC unwind information for them. This means that we cannot
unwind from the middle of a retpoline.

The recent ANNOTATE_INTRA_FUNCTION_CALL and UNWIND_HINT_RET_OFFSET
support in objtool enables it to understand the basic retpoline
construct. A further problem is that the ORC unwind information is
alternative invariant; IOW. every alternative should have the same
ORC, retpolines obviously violate this. This means we need to
out-of-line them.

Since all GCC generated code already uses out-of-line retpolines, this
should not affect performance much, if anything.

This will enable objtool to generate valid ORC data for the
out-of-line copies, which means we can correctly and reliably unwind
through a retpoline.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.210835357@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
34fdce6981 x86: Change {JMP,CALL}_NOSPEC argument
In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line
versions of the retpoline magic, we need to remove the '%' from the
argument, such that we can paste it onto symbol names.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
ca3f0d80dd x86: Simplify retpoline declaration
Because of how KSYM works, we need one declaration per line. Seeing
how we're going to be doubling the amount of retpoline symbols,
simplify the machinery in order to avoid having to copy/paste even
more.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.091696925@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
089dd8e531 x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
Change FILL_RETURN_BUFFER so that objtool groks it and can generate
correct ORC unwind information.

 - Since ORC is alternative invariant; that is, all alternatives
   should have the same ORC entries, the __FILL_RETURN_BUFFER body
   can not be part of an alternative.

   Therefore, move it out of the alternative and keep the alternative
   as a sort of jump_label around it.

 - Use the ANNOTATE_INTRA_FUNCTION_CALL annotation to white-list
   these 'funny' call instructions to nowhere.

 - Use UNWIND_HINT_EMPTY to 'fill' the speculation traps, otherwise
   objtool will consider them unreachable.

 - Move the RSP adjustment into the loop, such that the loop has a
   deterministic stack layout.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.032079304@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
1ff865e343 x86,smap: Fix smap_{save,restore}() alternatives
As reported by objtool:

  lib/ubsan.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack
  lib/ubsan.o: warning: objtool: .altinstr_replacement+0x7: alternative modifies stack

the smap_{save,restore}() alternatives violate (the newly enforced)
rule on stack invariance. That is, due to there only being a single
ORC table it must be valid to any alternative. These alternatives
violate this with the direct result that unwinds will not be correct
when it hits between the PUSH and POP instructions.

Rewrite the functions to only have a conditional jump.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200429101802.GI13592@hirez.programming.kicks-ass.net
2020-04-30 20:14:31 +02:00
Will Deacon
bf60333977 Merge branch 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/asm
As agreed with Boris, merge in the 'x86/asm' branch from -tip so that we
can select the new 'ARCH_USE_SYM_ANNOTATIONS' Kconfig symbol, which is
required by the BTI kernel patches.

* 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Provide a Kconfig symbol for disabling old assembly annotations
  x86/32: Remove CONFIG_DOUBLEFAULT
2020-04-30 17:39:42 +01:00
Linus Torvalds
869997be0e hyperv-fixes for 5.7-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl6mwOETHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrFLB/4yKsrl41WwYRbTKgiir576/LA0vGxQ
 cZjUQwkVv3S5/AfhvpwiGFV4dBV6j81KtNhRE6luaa3FBHObnjrx5tNqMw/P8a0j
 HZGZ68n4qE+OPVtTxj54s81iWIi9vgT/La92GPYhuXoiVPTd5zJ2lwY3so04BSFJ
 p30+RZFKNkTjNYZNZSHcoodr+js4Uws8JSn8OmpCJr8Gt+FJqkujQROG3HMKhJlk
 KlJlCJhV48tj/nlgcbGHBF0Yy5l8DVCaKIz+MiF5F/i+P8r0cErfyihc9Ene0/un
 LNFhIVGn8/MTi0CVrltcnur2qFH1qPCuLolKSpd/FKd6H2UDgK16XgAd
 =NJP/
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Wei Liu:

 - Two patches from Dexuan fixing suspension bugs

 - Three cleanup patches from Andy and Michael

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyper-v: Remove internal types from UAPI header
  hyper-v: Use UUID API for exporting the GUID
  x86/hyperv: Suspend/resume the VP assist page for hibernation
  Drivers: hv: Move AEOI determination to architecture dependent code
  Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM
2020-04-27 13:28:27 -07:00
Thomas Gleixner
bfe3d8f631 x86/tlb: Restrict access to tlbstate
Hide tlbstate, flush_tlb_info and related helpers when tlbflush.h is
included from a module. Modules have absolutely no business with these
internals.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.328438734@linutronix.de
2020-04-26 18:52:33 +02:00
Thomas Gleixner
6c9b7d79a8 x86/tlb: Move PCID helpers where they are used
Aside of the fact that they are used only in the TLB code, especially
having the comment close to the actual implementation makes a lot of
sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.145772183@linutronix.de
2020-04-26 18:49:44 +02:00
Thomas Gleixner
af5c40c6ee x86/tlb: Uninline nmi_uaccess_okay()
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

nmi_access_ok() is the last inline function which requires access to
cpu_tlbstate. Move it into the TLB code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.052543007@linutronix.de
2020-04-26 18:47:05 +02:00
Thomas Gleixner
96f59fe291 x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site
No point in having this exposed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.940978251@linutronix.de
2020-04-26 18:39:48 +02:00
Thomas Gleixner
69de6c1a7f x86/tlb: Move paravirt_tlb_remove_table() to the usage site
Move it where the only user is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.849801011@linutronix.de
2020-04-26 18:19:35 +02:00
Thomas Gleixner
4b04e6c236 x86/tlb: Move __flush_tlb_all() out of line
Reduce the number of required exports to one and make flush_tlb_global()
static to the TLB code.

flush_tlb_local() cannot be confined to the TLB code as the MTRR
handling requires a PGE-less flush.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200421092559.740388137@linutronix.de
2020-04-26 18:17:31 +02:00
Thomas Gleixner
29def599b3 x86/tlb: Move flush_tlb_others() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a last step, move __flush_tlb_others() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.641957686@linutronix.de
2020-04-26 11:10:25 +02:00
Thomas Gleixner
58430c5dba x86/tlb: Move __flush_tlb_one_kernel() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a fourth step, move __flush_tlb_one_kernel() out of line and hide
the native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.535159540@linutronix.de
2020-04-26 11:01:22 +02:00
Thomas Gleixner
127ac915c8 x86/tlb: Move __flush_tlb_one_user() out of line
cpu_tlbstate is exported because various TLB-related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well-contained kernel functions and not be directly exposed to
modules.

As a third step, move _flush_tlb_one_user() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.428213098@linutronix.de
2020-04-26 11:00:29 +02:00
Thomas Gleixner
cd30d26cf3 x86/tlb: Move __flush_tlb_global() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a second step, move __flush_tlb_global() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.336916818@linutronix.de
2020-04-26 11:00:27 +02:00
Thomas Gleixner
2faf153bb7 x86/tlb: Move __flush_tlb() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a first step, move __flush_tlb() out of line and hide the native
function. The latter can be static when CONFIG_PARAVIRT is disabled.

Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.246130908@linutronix.de
2020-04-26 11:00:05 +02:00
Josh Poimboeuf
81b67439d1 x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
The following execution path is possible:

  fsnotify()
    [ realign the stack and store previous SP in R10 ]
    <IRQ>
      [ only IRET regs saved ]
      common_interrupt()
        interrupt_entry()
	  <NMI>
	    [ full pt_regs saved ]
	    ...
	    [ unwind stack ]

When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers.  So the unwind stops
prematurely.

However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.

Handle this case properly.  When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.

Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers.  For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between.  So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:29 +02:00
Linus Torvalds
b9916af776 Kbuild fixes for v5.7
- fix scripts/config to properly handle ':' in string type CONFIG options
 
  - fix unneeded rebuilds of DT schema check rule
 
  - git rid of ordering dependency between <linux/vermagic.h> and
    <linux/module.h> to fix build errors in some network drivers
 
  - clean up generated headers of host arch with 'make ARCH=um mrproper'
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl6jDlAVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGcZEP/2ORmb2mfCCrPU9Cjks+97FsQd7P
 aqOCAX0884prFBMykSYuPJ9DSlMEcMwkA7zdd7dlaEbwI7WJhWzjoQRi+/pnsSMn
 7wG46T1Vj4CXIvHgP8PEeO0wFXXdNOQNxeJ/CDAD99ISmMTg4crJWyPOI5eFTKad
 GqWWY04smfE/2RhAbA3UJASsrO6Ev45Os4CLjXwqDM26oDozyyhrWdGFrE5/pqLp
 /feA5Jky8lUMIRoql3QMCKkjFpwl+A6IGnD2BCMdkkgRoTn+V/v28ENqYEjPEoYw
 fB2wVQ6+3E9DEVS1lBaYG+ZMhyZxJWABd2KCN6OuVZPVvuJEjnCWnT9wy71ZBO3W
 YDUOMsL5yqae4OhpCOHAujsBe23hEbjPXRHiAFXiaFayFb3gYgoiQhKV3HNSY00p
 JEFk9by3RggL/ZOmLhfzX9BLdfaK1q/k4kGuTU3eD4JryfI70tlO/t726LbFKqjP
 dBZjCGjGMZv2QAnLVa9mAnO3OH2jq8dyrlsjNPbTsQ4yBqk+YzXm16M7RCqOBKE1
 w4bI8oEBLUfzMrhQGYRVOMxS8DkxyNg6KUjbXkGBgueuiYT38pjRaj955hN4kCzT
 FxYXHwm6v+VwJp5wt/HN+Gd8A5rzQgGUFutqJ9vAgQUBJJz5y5RP+7AUVBKspIe/
 OF8x1J/O5REkOOCP
 =3L2c
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - fix scripts/config to properly handle ':' in string type CONFIG
   options

 - fix unneeded rebuilds of DT schema check rule

 - git rid of ordering dependency between <linux/vermagic.h> and
   <linux/module.h> to fix build errors in some network drivers

 - clean up generated headers of host arch with 'make ARCH=um mrproper'

* tag 'kbuild-fixes-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  h8300: ignore vmlinux.lds
  Documentation: kbuild: fix the section title format
  um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/
  arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h>
  kbuild: fix DT binding schema rule again to avoid needless rebuilds
  scripts/config: allow colons in option strings for sed
2020-04-24 10:39:32 -07:00
Thomas Gleixner
9020d39563 x86/alternatives: Move temporary_mm helpers into C
The only user of these inlines is the text poke code and this must not be
exposed to the world.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.139069561@linutronix.de
2020-04-24 19:12:56 +02:00
Thomas Gleixner
cb2a02355b x86/cr4: Sanitize CR4.PCE update
load_mm_cr4_irqsoff() is really a strange name for a function which has
only one purpose: Update the CR4.PCE bit depending on the perf state.

Rename it to update_cr4_pce_mm(), move it into the tlb code and provide a
function which can be invoked by the perf smp function calls.

Another step to remove exposure of cpu_tlbstate.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.049499158@linutronix.de
2020-04-24 19:01:17 +02:00
Thomas Gleixner
d8f0b35331 x86/cpu: Uninline CR4 accessors
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache
is located there.

In preparation for unexporting cpu_tlbstate, create a builtin function
for manipulating CR4 and rework the various helpers to use it.

No functional change.

 [ bp: push the export of native_write_cr4() only when CONFIG_LKTDM=m to
   the last patch in the series. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-24 18:46:42 +02:00
Thomas Gleixner
8c5cc19e94 x86/tlb: Uninline __get_current_cr3_fast()
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

In preparation for unexporting cpu_tlbstate move __get_current_cr3_fast()
into the x86 TLB management code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200421092558.848064318@linutronix.de
2020-04-24 15:55:50 +02:00
Ard Biesheuvel
0a75561489 efi/libstub/x86: Avoid getter function for efi_is64
We no longer need to take special care when using global variables
in the EFI stub, so switch to a simple symbol reference for efi_is64.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-04-24 14:52:16 +02:00
Ard Biesheuvel
ccc27ae774 efi/libstub: Drop __pure getter for efi_system_table
The practice of using __pure getter functions to access global
variables in the EFI stub dates back to the time when we had to
carefully prevent GOT entries from being emitted, because we
could not rely on the toolchain to do this for us.

Today, we use the hidden visibility pragma for all EFI stub source
files, which now all live in the same subdirectory, and we apply a
sanity check on the objects, so we can get rid of these getter
functions and simply refer to global data objects directly.

Start with efi_system_table(), and convert it into a global variable.
While at it, make it a pointer-to-const, because we can.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-04-24 14:52:16 +02:00
Andy Shevchenko
4a65ed6562 Immutable branch between MFD, X86, USB and Watchdog due for the v5.7 merge window
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAl6iviIACgkQUa+KL4f8
 d2EdiA/+OSpr4fiy9RTIxXnyROoFmI9ypKyDbXdkW0c2E2ZlH6d3LCFRCiF39DeQ
 7U6uNXyNetUjzH6fXm1BM/nltNbAaAUmNlL1THGQ3wamMayL4S0hSHgy8ZUTcFj4
 yC4TLwHAsXmNjHC0FbUul9LILooBs6MfEHeN+zK7tCI4ZLaygx5d/+ApFlbuU45c
 J4u7a32PEmM9i6BvSAQK2OGWChIRGvCnxekz9C8ebWULzzsOKPLFJccKEdORc8dK
 5Mg3J8zC1IZ+ct6y+DvvbaYROP3vbNabJPHIjV/hLmP0g3qHghHUW265qLVrHu/z
 uCek8lgLTYetJ+Nn4/k7eoUDKkZgbgfukRLyHw/gu4YQaiwhjI2KVP+ytBWkCzA4
 AOJVnMN756C1Rx3XrF/E63RdwuIfJA+VGGW7YzbPuXa0SvECPE//wVpZv6FSkJmA
 N4s1Z7yhuZEjGvuu4l5/ErZSbSN/2OMM4ahdQ4xoCAxYKkKMacS5Gds83VlnM6LQ
 Mwl77aTc84JqI1y4HROd2qj4J8YfE1F8lTRt8lepgeOL/kntM9U1lvBT3TCV3T0e
 xLC6GDCHo6N9vlyn1KU4PCyCHFNUJLD6Wq+H/jLZkcrx4k59caR0+I1+iyaQyo/i
 o+DgYCp6GnPFd3XuVLppg45n6qZ7wB67WOJ6iOpcO7+kMldg4Ck=
 =4M2c
 -----END PGP SIGNATURE-----

Merge branch 'ib-mfd-x86-usb-watchdog-v5.7'

Merge branch 'ib-mfd-x86-usb-watchdog-v5.7' of
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
to avoid conflicts in PDx86.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-04-24 13:56:46 +03:00
Mika Westerberg
25f1ca31e2 platform/x86: intel_pmc_ipc: Convert to MFD
This driver only creates a bunch of platform devices sharing resources
belonging to the PMC device. This is pretty much what MFD subsystem is
for so move the driver there, renaming it to intel_pmc_bxt.c which
should be more clear what it is.

MFD subsystem provides nice helper APIs for subdevice creation so
convert the driver to use those. Unfortunately the ACPI device includes
separate resources for most of the subdevices so we cannot simply call
mfd_add_devices() to create all of them but instead we need to call it
separately for each device.

The new MFD driver continues to expose two sysfs attributes that allow
userspace to send IPC commands to the PMC/SCU to avoid breaking any
existing applications that may use these. Generally this is bad idea so
document this in the ABI documentation.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:44 +01:00
Mika Westerberg
0759a8730c platform/x86: intel_telemetry: Add telemetry_get_pltdata()
Add new function that allows telemetry modules to get pointer to the
platform specific configuration. This is needed to allow the telemetry
debugfs module to fetch PMC IPC instance in the subsequent patch.

This also allows us to replace telemetry_pltconfig_valid() with
telemetry_get_pltdata() as well.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:30 +01:00
Mika Westerberg
781adff21c x86/platform/intel-mid: Add empty stubs for intel_scu_devices_[create|destroy]()
This allows to call the functions even when CONFIG_X86_INTEL_MID is not
enabled.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:21 +01:00
Mika Westerberg
7713f9180c platform/x86: intel_pmc_ipc: Drop intel_pmc_ipc_command()
Now that all callers have been converted over to the SCU IPC API we can
drop intel_pmc_ipc_command().

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:16 +01:00
Mika Westerberg
68c73fb224 platform/x86: intel_telemetry: Convert to use new SCU IPC API
Convert the Intel Apollo Lake telemetry driver to use the new SCU IPC
API. This allows us to get rid of the duplicate PMC IPC implementation
which is now covered in SCU IPC driver.

Also move telemetry specific IPC message constant to the telemetry
driver where it belongs.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:08 +01:00
Mika Westerberg
4181bc8f6f mfd: intel_soc_pmic_bxtwc: Convert to use new SCU IPC API
Convert the Intel Broxton Whiskey Cover PMIC driver to use the new SCU
IPC API. This allows us to get rid of the PMC IPC implementation which
is now covered in SCU IPC driver. We drop the error log if the IPC
command fails because intel_scu_ipc_dev_command() does that already.

Also move PMIC specific IPC message constants to the PMIC driver from
the intel_pmc_ipc.h header.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:58 +01:00
Mika Westerberg
7e18c89d6e platform/x86: intel_scu_ipc: Add managed function to register SCU IPC
Drivers such as intel_pmc_ipc.c can be unloaded as well so in order to
support those in this driver add a new function that can be called to
unregister the SCU IPC when it is not needed anymore.

We also add a managed version of the intel_scu_ipc_register() that takes
care of calling intel_scu_ipc_unregister() automatically when the driver
is unbound.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:44 +01:00
Mika Westerberg
f57fa18583 platform/x86: intel_scu_ipc: Introduce new SCU IPC API
The current SCU IPC API has been operating on a single instance and
there has been no way to pin the providing module in place when the SCU
IPC is in use.

This implements a new API that takes the SCU IPC instance as first
parameter (NULL means the single instance is being used). The SCU IPC
instance can be retrieved by calling new function intel_scu_ipc_dev_get()
that take care of pinning the providing module in place as long as
intel_scu_ipc_dev_put() is not called.

The old API is updated to call the new API and is is left there in the
legacy API header to support the existing users that cannot be converted
easily.

Subsequent patches will convert most of the users over to the new API.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:28 +01:00
Mika Westerberg
dd88564937 platform/x86: intel_scu_ipc: Move legacy SCU IPC API to a separate header
In preparation for introducing a new API for SCU IPC, move the legacy
API and constants to a separate header that is is subject to be removed
eventually.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:24 +01:00
Mika Westerberg
54b34aa0a7 platform/x86: intel_scu_ipc: Split out SCU IPC functionality from the SCU driver
The SCU IPC functionality is usable outside of Intel MID devices. For
example modern Intel CPUs include the same thing but now it is called
PMC (Power Management Controller) instead of SCU. To make the IPC
available for those split the driver into core part (intel_scu_ipc.c)
and the SCU PCI driver part (intel_scu_pcidrv.c) which then calls the
former before it goes and creates rest of the SCU devices. The SCU IPC
will also register a new class that gets assigned to the device that is
created under the parent PCI device.

We also split the Kconfig symbols so that INTEL_SCU_IPC enables the SCU
IPC library and INTEL_SCU_PCI the SCU driver and convert the users
accordingly. While there remove default y from the INTEL_SCU_PCI symbol
as it is already selected by X86_INTEL_MID.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:05 +01:00
Arvind Sankar
b4b89a0272 efi/gop: Add prototypes for query_mode and set_mode
Add prototypes and argmap for the Graphics Output Protocol's QueryMode
and SetMode functions.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200320020028.1936003-11-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-04-23 20:15:06 +02:00
Paolo Bonzini
33b2217245 KVM: x86: move nested-related kvm_x86_ops to a separate struct
Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to
nested virtualization into a separate struct.

As a result, these ops will always be non-NULL on VMX.  This is not a problem:

* check_nested_events is only called if is_guest_mode(vcpu) returns true

* get_nested_state treats VMXOFF state the same as nested being disabled

* set_nested_state fails if you attempt to set nested state while
  nesting is disabled

* nested_enable_evmcs could already be called on a CPU without VMX enabled
  in CPUID.

* nested_get_evmcs_version was fixed in the previous patch

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-23 09:04:57 -04:00
Christoph Hellwig
325518e9b7 x86/mm: Use pgprotval_t in protval_4k_2_large() and protval_large_2_4k()
Use the proper type for "raw" page table values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200422170116.GA28345@lst.de
2020-04-23 11:38:42 +02:00
Christoph Hellwig
de17a37896 x86/mm: Unexport __cachemode2pte_tbl
Exporting the raw data for a table is generally a bad idea. Move
cachemode2protval() out of line given that it isn't really used in the
fast path, and then mark __cachemode2pte_tbl static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-5-hch@lst.de
2020-04-23 11:34:31 +02:00
Christoph Hellwig
d073569363 x86/mm: Cleanup pgprot_4k_2_large() and pgprot_large_2_4k()
Make use of lower level helpers that operate on the raw protection
values to make the code a little easier to understand, and to also
avoid extra conversions in a few callers.

[ Qian: Fix a wrongly placed bracket in the original submission.
  Reported and fixed by Qian Cai <cai@lca.pw>. Details in second
  Link: below. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-4-hch@lst.de
Link: https://lkml.kernel.org/r/1ED37D02-125F-4919-861A-371981581D9E@lca.pw
2020-04-23 11:31:52 +02:00
Masahiro Yamada
62d0fd591d arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h>
As the bug report [1] pointed out, <linux/vermagic.h> must be included
after <linux/module.h>.

I believe we should not impose any include order restriction. We often
sort include directives alphabetically, but it is just coding style
convention. Technically, we can include header files in any order by
making every header self-contained.

Currently, arch-specific MODULE_ARCH_VERMAGIC is defined in
<asm/module.h>, which is not included from <linux/vermagic.h>.

Hence, the straight-forward fix-up would be as follows:

|--- a/include/linux/vermagic.h
|+++ b/include/linux/vermagic.h
|@@ -1,5 +1,6 @@
| /* SPDX-License-Identifier: GPL-2.0 */
| #include <generated/utsrelease.h>
|+#include <linux/module.h>
|
| /* Simply sanity version stamp for modules. */
| #ifdef CONFIG_SMP

This works enough, but for further cleanups, I split MODULE_ARCH_VERMAGIC
definitions into <asm/vermagic.h>.

With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal,
and the location of MODULE_ARCH_VERMAGIC definitions will be consistent.

For arc and ia64, MODULE_PROC_FAMILY is only used for defining
MODULE_ARCH_VERMAGIC. I squashed it.

For hexagon, nds32, and xtensa, I removed <asm/modules.h> entirely
because they contained nothing but MODULE_ARCH_VERMAGIC definition.
Kbuild will automatically generate <asm/modules.h> at build-time,
wrapping <asm-generic/module.h>.

[1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
2020-04-23 10:50:26 +09:00
Al Viro
2a89b674fd get rid of csum_partial_copy_to_user()
For historical reasons some architectures call their csum_and_copy_to_user()
csum_partial_copy_to_user() instead (and supply a macro defining the
former as the latter).  That's the last remnants of old experiment that
went nowhere; time to bury them.  Rename those to csum_and_copy_to_user()
and get rid of the macros.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-22 14:37:50 -04:00
Peter Zijlstra
c536ed2fff objtool: Remove SAVE/RESTORE hints
The SAVE/RESTORE hints are now unused; remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.926738768@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Peter Zijlstra
e25eea89bb objtool: Introduce HINT_RET_OFFSET
Normally objtool ensures a function keeps the stack layout invariant.
But there is a useful exception, it is possible to stuff the return
stack in order to 'inject' a 'call':

	push $fun
	ret

In this case the invariant mentioned above is violated.

Add an objtool HINT to annotate this and allow a function exit with a
modified stack frame.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.690601403@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Peter Zijlstra
b746046238 objtool: Better handle IRET
Teach objtool a little more about IRET so that we can avoid using the
SAVE/RESTORE annotation. In particular, make the weird corner case in
insn->restore go away.

The purpose of that corner case is to deal with the fact that
UNWIND_HINT_RESTORE lands on the instruction after IRET, but that
instruction can end up being outside the basic block, consider:

	if (cond)
		sync_core()
	foo();

Then the hint will land on foo(), and we'll encounter the restore
hint without ever having seen the save hint.

By teaching objtool about the arch specific exception frame size, and
assuming that any IRET in an STT_FUNC symbol is an exception frame
sized POP, we can remove the use of save/restore hints for this code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.631224674@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Paolo Bonzini
3bda03865f KVM: s390: Fix for 5.7 and maintainer update
- Silence false positive lockdep warning
 - add Claudio as reviewer
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJenY6AAAoJEBF7vIC1phx8bykQAK+QZyD+H/zGNuqeUVn0sh8e
 yKUVMR+kuE+l57q77nt2AYVxqpCD9xSKRR+SOSLzhVH/HJf625nm+Ny/WOWMebwJ
 EA/KK+v15T5rga8gFza+4cPg4v/pHwjHhSbjTb1JWg+8cJR1BTj6OxRuTtWr5+25
 GF4RhkJOit/VhNbCo1aIgs7/7F1pPALstdPAUsHYe1PeULdRMVqSVluXT2KTPhpi
 /kzDw8sKKcYgv/eaVdcNoHv+VX1AWIRDAKEttCywyocfbu0ESwadmR7C0qlm1446
 HqowP6F0xCF0Whi/65aN4ZOv7wjO/qrV08DZ7JLA3/oKlXtZ1ieyiE2q/P1frSo1
 gvmuHiH5/UI6t6a/BSCpJwqcilxKYArqAAYBKoGiJhTbsJStqw0wl41klWTKXlTq
 VrCvjoUxQ9JMjFCQ1GXOU+ODNyX2IwZYptJ5vF24HYzBJwUBe3HPG9/BA8YcodzG
 qGQ5IKv0Q1IFTwOqnt557H0MjcBtNIEx54aLJrPy3wldsiNSj39Ft0cuvnbR+Q4F
 QhKk88dHtd7NW1IirfgYmLGe0rB1ANKM7wUGEdM5w2y5Eg8wCs8/P4KeGh0YyFI9
 xPqZDfwof6KkDjOGFXr/CeD/thi+km0/FpePb7cL5Ow4a+JmrCvqQiXrf0TbnFpv
 t5ZlHnGzoSHsEaRgmJ+X
 =d46L
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: Fix for 5.7 and maintainer update

- Silence false positive lockdep warning
- add Claudio as reviewer
2020-04-21 09:37:13 -04:00
Wanpeng Li
a9ab13ff6e KVM: X86: Improve latency for single target IPI fastpath
IPI and Timer cause the main MSRs write vmexits in cloud environment
observation, let's optimize virtual IPI latency more aggressively to
inject target IPI as soon as possible.

Running kvm-unit-tests/vmexit.flat IPI testing on SKX server, disable
adaptive advance lapic timer and adaptive halt-polling to avoid the
interference, this patch can give another 7% improvement.

w/o fastpath   -> x86.c fastpath      4238 -> 3543  16.4%
x86.c fastpath -> vmx.c fastpath      3543 -> 3293     7%
w/o fastpath   -> vmx.c fastpath      4238 -> 3293  22.3%

Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:13:10 -04:00
Sean Christopherson
8791585837 KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags
Introduce a new "extended register" type, EXIT_INFO_2 (to pair with the
nomenclature in .get_exit_info()), and use it to cache VMX's
vmcs.EXIT_INTR_INFO.  Drop a comment in vmx_recover_nmi_blocking() that
is obsoleted by the generic caching mechanism.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:13:07 -04:00
Sean Christopherson
5addc23519 KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags
Introduce a new "extended register" type, EXIT_INFO_1 (to pair with the
nomenclature in .get_exit_info()), and use it to cache VMX's
vmcs.EXIT_QUALIFICATION.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:13:07 -04:00
Sean Christopherson
be01e8e2c6 KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code
Rename functions and variables in kvm_mmu_new_cr3() and related code to
replace "cr3" with "pgd", i.e. continue the work started by commit
727a7e27cf ("KVM: x86: rename set_cr3 callback and related flags to
load_mmu_pgd").  kvm_mmu_new_cr3() and company are not always loading a
new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-37-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:59 -04:00
Sean Christopherson
4a632ac6ca KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch
Add a separate "skip" override for MMU sync, a future change to avoid
TLB flushes on nested VMX transitions may need to sync the MMU even if
the TLB flush is unnecessary.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-32-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:56 -04:00
Sean Christopherson
a4148b7ca2 KVM: VMX: Retrieve APIC access page HPA only when necessary
Move the retrieval of the HPA associated with L1's APIC access page into
VMX code to avoid unnecessarily calling gfn_to_page(), e.g. when the
vCPU is in guest mode (L2).  Alternatively, the optimization logic in
VMX could be mirrored into the common x86 code, but that will get ugly
fast when further optimizations are introduced.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-29-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:55 -04:00
Sean Christopherson
eeeb4f67a6 KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID
Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's
EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g.
to flush L2's context during nested VM-Enter.

Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where
the flush is directly associated with vCPU-scoped instruction emulation,
i.e. MOV CR3 and INVPCID.

Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to
make it clear that it deliberately requests a flush of all contexts.

Service any pending flush request on nested VM-Exit as it's possible a
nested VM-Exit could occur after requesting a flush for L2.  Add the
same logic for nested VM-Enter even though it's _extremely_ unlikely
for flush to be pending on nested VM-Enter, but theoretically possible
(in the future) due to RSM (SMM) emulation.

[*] Intel also has an Address Space Identifier (ASID) concept, e.g.
    EPTP+VPID+PCID == ASID, it's just not documented in the SDM because
    the rules of invalidation are different based on which piece of the
    ASID is being changed, i.e. whether the EPTP, VPID, or PCID context
    must be invalidated.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:53 -04:00
Sean Christopherson
7780938cc7 KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()
Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a
new hook to flush only the current ASID/context.

Opportunstically replace the comment in vmx_flush_tlb() that explains
why it flushes all EPTP/VPID contexts with a comment explaining why it
unconditionally uses INVEPT when EPT is enabled.  I.e. rely on the "all"
part of the name to clarify why it does global INVEPT/INVVPID.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-23-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:52 -04:00
Sean Christopherson
f55ac304ca KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()
Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now
that all callers pass %true for said param, or ignore the param (SVM has
an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat
arbitrarily passes %false).

Remove __vmx_flush_tlb() as it is no longer used.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-17-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:49 -04:00
Vitaly Kuznetsov
0baedd7927 KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()
Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest
so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest()
(just like KVM PV TLB flush mechanism) instead. Introduce
KVM_REQ_HV_TLB_FLUSH to support the change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:48 -04:00
Michael Kelley
2ddddd0b4e Drivers: hv: Move AEOI determination to architecture dependent code
Hyper-V on ARM64 doesn't provide a flag for the AEOI recommendation
in ms_hyperv.hints, so having the test in architecture independent
code doesn't work. Resolve this by moving the check of the flag
to an architecture dependent helper function. No functionality is
changed.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200420164926.24471-1-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-04-21 10:02:38 +01:00
Sean Christopherson
e64419d991 KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
Add a dedicated hook to handle flushing TLB entries on behalf of the
guest, i.e. for a paravirtualized TLB flush, and use it directly instead
of bouncing through kvm_vcpu_flush_tlb().

For VMX, change the effective implementation implementation to never do
INVEPT and flush only the current context, i.e. to always flush via
INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
@invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
flush guest-physical mappings; linear and combined mappings are flushed
by VM-Enter when VPID is disabled, and changes in the guest pages tables
do not affect guest-physical mappings.

When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
architecture) to invalidate guest-physical mappings, i.e. TLB entries
that cache guest-physical mappings can live across INVVPID as the
mappings are associated with an EPTP, not a VPID.  The intent of
@invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
VPID handling, which now calls vpid_sync_context() directly, the only
scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
enabled) is if KVM is flushing TLB entries from the guest's perspective,
i.e. is only required to invalidate linear mappings.

For SVM, flushing TLB entries from the guest's perspective can be done
by flushing the current ASID, as changes to the guest's page tables are
associated only with the current ASID.

Adding a dedicated ->tlb_flush_guest() paves the way toward removing
@invalidate_gpa, which is a potentially dangerous control flag as its
meaning is not exactly crystal clear, even for those who are familiar
with the subtleties of what mappings Intel CPUs are/aren't allowed to
keep across various invalidation scenarios.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:10 -04:00
Paolo Bonzini
5efac0741c KVM: x86: introduce kvm_mmu_invalidate_gva
Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva
into a new function.  This function also lets us specify the host PGD to
invalidate and also the MMU, both of which will be useful in fixing and
simplifying kvm_inject_emulated_page_fault.

A nested guest's MMU however has g_context->invlpg == NULL.  Instead of
setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only
entry point to mmu->invlpg and make a NULL invlpg pointer equivalent
to nonpaging_invlpg, saving a retpoline.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:25:55 -04:00
Kees Cook
9fccc5c0c9 x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit
With modern x86 64-bit environments, there should never be a need for
automatic READ_IMPLIES_EXEC, as the architecture is intended to always
be execute-bit aware (as in, the default memory protection should be NX
unless a region explicitly requests to be executable).

There were very old x86_64 systems that lacked the NX bit, but for those,
the NX bit is, obviously, unenforceable, so these changes should have
no impact on them.

Suggested-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20200327064820.12602-4-keescook@chromium.org
2020-04-20 19:24:33 +02:00
Kees Cook
122306117a x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
The READ_IMPLIES_EXEC workaround was designed for old toolchains that
lacked the ELF PT_GNU_STACK marking under the assumption that toolchains
that couldn't specify executable permission flags for the stack may not
know how to do it correctly for any memory region.

This logic is sensible for having ancient binaries coexist in a system
with possibly NX memory, but was implemented in a way that equated having
a PT_GNU_STACK marked executable as being as "broken" as lacking the
PT_GNU_STACK marking entirely. Things like unmarked assembly and stack
trampolines may cause PT_GNU_STACK to need an executable bit, but they
do not imply all mappings must be executable.

This confusion has led to situations where modern programs with explicitly
marked executable stacks are forced into the READ_IMPLIES_EXEC state when
no such thing is needed. (And leads to unexpected failures when mmap()ing
regions of device driver memory that wish to disallow VM_EXEC[1].)

In looking for other reasons for the READ_IMPLIES_EXEC behavior, Jann
Horn noted that glibc thread stacks have always been marked RWX (until
2003 when they started tracking the PT_GNU_STACK flag instead[2]). And
musl doesn't support executable stacks at all[3]. As such, no breakage
for multithreaded applications is expected from this change.

[1] https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
[2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=54ee14b3882
[3] https://lkml.kernel.org/r/20190423192534.GN23599@brightrain.aerifal.cx

Suggested-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20200327064820.12602-3-keescook@chromium.org
2020-04-20 19:09:38 +02:00
Kees Cook
9d9e435f3f x86/elf: Add table to document READ_IMPLIES_EXEC
Add a table to document the current behavior of READ_IMPLIES_EXEC in
preparation for changing the behavior.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20200327064820.12602-2-keescook@chromium.org
2020-04-20 15:28:40 +02:00
Christoph Hellwig
7fa3e10f0f x86/mm: Move pgprot2cachemode out of line
This helper is only used by x86 low-level MM code.  Also remove the
entirely pointless __pte2cachemode_tbl export as that symbol can be
marked static now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-3-hch@lst.de
2020-04-20 12:39:17 +02:00
Christoph Hellwig
1f6f655e01 x86/mm: Add a x86_has_pat_wp() helper
Abstract the ioremap code away from the caching mode internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-2-hch@lst.de
2020-04-20 12:39:11 +02:00
Mark Gross
7e5b3c267d x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
SRBDS is an MDS-like speculative side channel that can leak bits from the
random number generator (RNG) across cores and threads. New microcode
serializes the processor access during the execution of RDRAND and
RDSEED. This ensures that the shared buffer is overwritten before it is
released for reuse.

While it is present on all affected CPU models, the microcode mitigation
is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
cases where TSX is not supported or has been disabled with TSX_CTRL.

The mitigation is activated by default on affected processors and it
increases latency for RDRAND and RDSEED instructions. Among other
effects this will reduce throughput from /dev/urandom.

* Enable administrator to configure the mitigation off when desired using
  either mitigations=off or srbds=off.

* Export vulnerability status via sysfs

* Rename file-scoped macros to apply for non-whitelist table initializations.

 [ bp: Massage,
   - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
   - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
   - flip check in cpu_set_bug_bits() to save an indentation level,
   - reflow comments.
   jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
   tglx: Dropped the fused off magic for now
 ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-04-20 12:19:22 +02:00
Mark Gross
e9d7144597 x86/cpu: Add a steppings field to struct x86_cpu_id
Intel uses the same family/model for several CPUs. Sometimes the
stepping must be checked to tell them apart.

On x86 there can be at most 16 steppings. Add a steppings bitmask to
x86_cpu_id and a X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro
and support for matching against family/model/stepping.

 [ bp: Massage. ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-20 12:19:21 +02:00
Linus Torvalds
0fe5f9ca22 A set of fixes for x86 and objtool:
objtool:
 
   - Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP
     is enabled.
 
   - Support clang non-section symbols in objtool ORC dump
 
   - Fix switch table detection in .text.unlikely
 
   - Make the BP scratch register warning more robust.
 
  x86:
 
   - Increase microcode maximum patch size for AMD to cope with new CPUs
     which have a larger patch size.
 
   - Fix a crash in the resource control filesystem when the removal of the
     default resource group is attempted.
 
   - Preserve Code and Data Prioritization enabled state accross CPU
     hotplug.
 
   - Update split lock cpu matching to use the new X86_MATCH macros.
 
   - Change the split lock enumeration as Intel finaly decided that the
     IA32_CORE_CAPABILITIES bits are not architectural contrary to what
     the SDM claims. !@#%$^!
 
   - Add Tremont CPU models to the split lock detection cpu match.
 
   - Add a missing static attribute to make sparse happy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cWGsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYod2jD/4kZqz+nEzAvx8RC/7zfLr1S6mDYcLb
 kqWEblLRfPofFNO3W/1Ri7xUs2VCyBcOJeG9JIugI8YV/b/5LY9j2nW30unXi84y
 8DHLWgM7OG+EiNDMvdQwgnjNb9Pdl4F1e9yTTD6IRg0bHOjvtHVyq9bNg7f3iaED
 ZE4X5Hh5u4qFK/jmcsTF5HA/wIjELdmT32F4RxceAlmvpa5SUGlOfVVo1cSZpCbx
 XkrvUvEzyZhbzY+Gy1q3SHTt+fvzx1++LsnJD0Dyfe5Q47PA1Iy6Zo2+Epn3FnCu
 XuQKLaiDhidpkPzTGULZUsubavXbrSEu5/yhFJHyUqMy5WNOmvXBN8eVC4j1I9Ga
 tnt43s3AS8noz4qIb7bpoVgETFtoCfWfqwhtZmALPzrfutwxe2Ujtsi9FUca6HtA
 T5dKuNwc8G+Q5ZiNi+rPjcV/QGGncZFwtwwRwUl/YKgQ2VgrTgfsPc431tfSl3Q8
 hVQIOhQNHCKqe3uGhiCsI29pNMDXVijZcI8w2SSmxnPyrMRXD7bTfLWnPav7SGFO
 aSSi9HWtghkU/MsmRgRcZc9PI5bNs6w5IkfQqfXjd/lJwea2yQg1cn1KdmGi3Q33
 BNj9FudNMe4K8ITaNWiLdt5rYCDIvWEzmbwawAhevstbKrjVtrAYgNAjvgJEnXAt
 mZwTu+Hpd6d+JA==
 =raUm
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 and objtool fixes from Thomas Gleixner:
 "A set of fixes for x86 and objtool:

  objtool:

   - Ignore the double UD2 which is emitted in BUG() when
     CONFIG_UBSAN_TRAP is enabled.

   - Support clang non-section symbols in objtool ORC dump

   - Fix switch table detection in .text.unlikely

   - Make the BP scratch register warning more robust.

  x86:

   - Increase microcode maximum patch size for AMD to cope with new CPUs
     which have a larger patch size.

   - Fix a crash in the resource control filesystem when the removal of
     the default resource group is attempted.

   - Preserve Code and Data Prioritization enabled state accross CPU
     hotplug.

   - Update split lock cpu matching to use the new X86_MATCH macros.

   - Change the split lock enumeration as Intel finaly decided that the
     IA32_CORE_CAPABILITIES bits are not architectural contrary to what
     the SDM claims. !@#%$^!

   - Add Tremont CPU models to the split lock detection cpu match.

   - Add a missing static attribute to make sparse happy"

* tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Add Tremont family CPU models
  x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
  x86/resctrl: Preserve CDP enable over CPU hotplug
  x86/resctrl: Fix invalid attempt at removing the default resource group
  x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
  x86/umip: Make umip_insns static
  x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
  objtool: Make BP scratch register warning more robust
  objtool: Fix switch table detection in .text.unlikely
  objtool: Support Clang non-section symbols in ORC generation
  objtool: Support Clang non-section symbols in ORC dump
  objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
2020-04-19 11:58:32 -07:00
Tony Luck
f82cdff1aa x86/mce: Drop bogus comment about mce.kflags
The bit definitions for kflags are for internal use only. A
late edit moved them from uapi/asm/mce.h to the internal
x86 <asm/mce.h>, but the comment saying "See below" was
accidentally left here.

Delete "See below". Just labelling this field as internal
kernel use is sufficient.

Fixes: 1de08dccd3 ("x86/mce: Add a struct mce.kflags field")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200415195826.GA13681@agluck-desk2.amr.corp.intel.com
2020-04-17 11:12:21 +02:00
Sean Christopherson
53b3d8e9d5 KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)
Export the page fault propagation helper so that VMX can use it to
correctly emulate TLB invalidation on page faults in an upcoming patch.

In the (hopefully) not-too-distant future, SGX virtualization will also
want access to the helper for injecting page faults to the correct level
(L1 vs. L2) when emulating ENCLS instructions.

Rename the function to kvm_inject_emulated_page_fault() to clarify that
it is (a) injecting a fault and (b) only for page faults.  WARN if it's
invoked with an exception other than PF_VECTOR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:50 -04:00
John Allen
bdf89df3c5 x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
Future AMD CPUs will have microcode patches that exceed the default 4K
patch size. Raise our limit.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v4.14..
Link: https://lkml.kernel.org/r/20200409152931.GA685273@mojo.amd.com
2020-04-14 17:34:46 +02:00
Borislav Petkov
1df73b2131 x86/mce: Fixup exception only for the correct MCEs
The severity grading code returns IN_KERNEL_RECOV error context for
errors which have happened in kernel space but from which the kernel can
recover. Whether the recovery can happen is determined by the exception
table entry having as handler ex_handler_fault() and which has been
declared at build time using _ASM_EXTABLE_FAULT().

IN_KERNEL_RECOV is used in mce_severity_intel() to lookup the
corresponding error severity in the severities table.

However, the mapping back from error severity to whether the error is
IN_KERNEL_RECOV is ambiguous and in the very paranoid case - which
might not be possible right now - but be better safe than sorry later,
an exception fixup could be attempted for another MCE whose address
is in the exception table and has the proper severity. Which would be
unfortunate, to say the least.

Therefore, mark such MCEs explicitly as MCE_IN_KERNEL_RECOV so that the
recovery attempt is done only for them.

Document the whole handling, while at it, as it is not trivial.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200407163414.18058-10-bp@alien8.de
2020-04-14 16:01:49 +02:00
Tony Luck
1de08dccd3 x86/mce: Add a struct mce.kflags field
There can be many different subsystems register on the mce handler
chain. Add a new bitmask field and define values so that handlers can
indicate whether they took any action to log or otherwise handle an
error.

The default handler at the end of the chain can use this information to
decide whether to print to the console log.

Boris suggested a generic name and leaving plenty of spare bits for
possible future use.

 [ bp: Move flag bits to the internal mce.h header and use BIT_ULL(). ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-4-tony.luck@intel.com
2020-04-14 15:58:43 +02:00
Tony Luck
c9c6d216ed x86/mce: Rename "first" function as "early"
It isn't going to be first on the notifier chain when the CEC is moved
to be a normal user of the notifier chain.

Fix the enum for the MCE_PRIO symbols to list them in reverse order so
that the compiler can give them numbers from low to high priority. Add
an entry for MCE_PRIO_CEC as the highest priority.

 [ bp: Use passive voice, add comments. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-2-tony.luck@intel.com
2020-04-14 15:55:01 +02:00
Borislav Petkov
3e0fdec858 x86/mce/amd, edac: Remove report_gart_errors
... because no one should be interested in spurious MCEs anyway. Make
the filtering unconditional and move it to amd_filter_mce().

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de
2020-04-14 15:53:46 +02:00
Thomas Gleixner
f26d2580a7 x86/mce/amd: Cleanup threshold device remove path
Pass in the bank pointer directly to the cleaning up functions,
obviating the need for per-CPU accesses. Make the clean up path
interrupt-safe by cleaning the bank pointer first so that the rest of
the teardown happens safe from the thresholding interrupt.

No functional changes.

 [ bp: Write commit message and reverse bank->shared test to save an
   indentation level in threshold_remove_bank(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-7-bp@alien8.de
2020-04-14 15:49:51 +02:00
Borislav Petkov
593309423c x86/32: Remove CONFIG_DOUBLEFAULT
Make the doublefault exception handler unconditional on 32-bit. Yes,
it is important to be able to catch #DF exceptions instead of silent
reboots. Yes, the code size increase is worth every byte. And one less
CONFIG symbol is just the cherry on top.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200404083646.8897-1-bp@alien8.de
2020-04-14 14:24:05 +02:00
Paolo Bonzini
f14eec0a32 KVM: SVM: move more vmentry code to assembly
Manipulate IF around vmload/vmsave to remove the confusing usage of
local_irq_enable where interrupts are actually disabled via GIF.
And stuff the RSB immediately without waiting for a RET to avoid
Spectre-v2 attacks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14 04:21:21 -04:00
Ard Biesheuvel
a088b858f1 efi/x86: Revert struct layout change to fix kexec boot regression
Commit

  0a67361dcd ("efi/x86: Remove runtime table address from kexec EFI setup data")

removed the code that retrieves the non-remapped UEFI runtime services
pointer from the data structure provided by kexec, as it was never really
needed on the kexec boot path: mapping the runtime services table at its
non-remapped address is only needed when calling SetVirtualAddressMap(),
which never happens during a kexec boot in the first place.

However, dropping the 'runtime' member from struct efi_setup_data was a
mistake. That struct is shared ABI between the kernel and the kexec tooling
for x86, and so we cannot simply change its layout. So let's put back the
removed field, but call it 'unused' to reflect the fact that we never look
at its contents. While at it, add a comment to remind our future selves
that the layout is external ABI.

Fixes: 0a67361dcd ("efi/x86: Remove runtime table address from kexec EFI setup data")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-14 08:32:17 +02:00
Ingo Molnar
3b02a051d2 Linux 5.7-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6TbaUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgkH/iWpiKvosA20HJjC
 rBqYeJPxQsgZTuBieWJ+MeVxbpcF7RlM4c+glyvg3QJhHwIEG58dl6LBrQbAyBAR
 aFHNojr1iAYOruVCGnU3pA008YZiwUIDv/ZQ4DF8fmIU2vI2mJ6qHBv3XDl4G2uR
 Nwz8Eu9AgIwZM5coomVOSmoWyFy7Vxmb7W+3t5VmKsvOWx4ib9kyQtOIkvQDEl7j
 XCbWfI0xDQr6LFOm4jnCi5R/LhJ2LIqqIvHHrunbpszM8IwK797jCXz4im+dmd5Y
 +km46N7a8pDqri36xXz1gdBAU3eG7Pt1NyvfjwRVTdX4GquQ2MT0GoojxbLxUP3y
 3pEsQuE=
 =whbL
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh

Resolve these conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/Makefile

Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines
in the Kconfig to reduce the probability of future conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-13 09:44:39 +02:00
Linus Torvalds
4f8a3cc118 A set of three patches to fix the fallout of the newly added split lock
detection feature.
 
 It addressed the case where a KVM guest triggers a split lock #AC and KVM
 reinjects it into the guest which is not prepared to handle it.
 
 Adds proper sanity checks which prevent the unconditional injection into
 the guest and handles the #AC on the host side in the same way as user
 space detections are handled. Depending on the detection mode it either
 warns and disables detection for the task or kills the task if the mode is
 set to fatal.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TFtMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoSHuD/4gUKV4BefhiUulmk++jAyq1Hq91IOg
 wIZEALyP53AcxXDoASAzkbNfyDuBufszzM6lSgd0lFMN4uaDvW/7jTR9fxyG7xMT
 uTS86WZRKpMjb+NWnU5OuO7jmYSKnV/yjXqyg+z9G7nO/JdQ7b4GekdPQobGOZ0f
 NGGttSjCHvMUt5JF6tUggpl8IgomEP0HNes80IHIoJbE1KOm9qiP0NYZtlrlwYWT
 J0Xuz4b/I1IeT2Dr4X7B4LYmzYqyXpG/8khTapFyBdLqdNLBhoEUPotnEjrL3u/S
 I4h+U5N1hCu3AjC0iatjedD2etB8GWrOWhBYPmdh9LTdhHIVUIDKWDoUYl3YeESK
 Kvu5b3tyCbT86YKu2WxDtwi67yN6MM390M2JU3TLzzbbmjxMTG2dzWQaWzKeDFcx
 NwoxQU08c1/dVheODe2lCsI+RaMY3uWMpHoRrJkm105CaOGrBMpTFfHyMJsY8zgZ
 vgpUZeXylx0IrgteWyD6UrkA6LqtBukc/zOb9YL8vQmSh2I3URhUQ8O+TQG9VtbR
 e/KekJ2Ij7gXSOSu65bcpNY3q7BtVi+7ev5KFYhVMT3QuMMdc7l+TkMX5lDhAG+a
 lYdn9mxuNahyTxylGt7Sy0U1bRyn1n7fIG4azwuCiNhXtimV0urTG6PtfQTf+j4L
 bXJfa4C4BykF5A==
 =8FbL
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of three patches to fix the fallout of the newly added split
  lock detection feature.

  It addressed the case where a KVM guest triggers a split lock #AC and
  KVM reinjects it into the guest which is not prepared to handle it.

  Add proper sanity checks which prevent the unconditional injection
  into the guest and handles the #AC on the host side in the same way as
  user space detections are handled. Depending on the detection mode it
  either warns and disables detection for the task or kills the task if
  the mode is set to fatal"

* tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest
  KVM: x86: Emulate split-lock access as a write in emulator
  x86/split_lock: Provide handle_guest_split_lock()
2020-04-12 10:17:16 -07:00
Linus Torvalds
b753101a4a Kbuild updates for v5.7 (2nd)
- raise minimum supported binutils version to 2.23
 
  - remove old CONFIG_AS_* macros that we know binutils >= 2.23 supports
 
  - move remaining CONFIG_AS_* tests to Kconfig from Makefile
 
  - enable -Wtautological-compare warnings to catch more issues
 
  - do not support GCC plugins for GCC <= 4.7
 
  - fix various breakages of 'make xconfig'
 
  - include the linker version used for linking the kernel into
    LINUX_COMPILER, which is used for the banner, and also exposed to
    /proc/version
 
  - link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y,
    which allows us to remove the lib-ksyms.o workaround, and to
    solve the last known issue of the LLVM linker
 
  - add dummy tools in scripts/dummy-tools/ to enable all compiler
    tests in Kconfig, which will be useful for distro maintainers
 
  - support the single switch, LLVM=1 to use Clang and all LLVM utilities
    instead of GCC and Binutils.
 
  - support LLVM_IAS=1 to enable the integrated assembler, which is still
    experimental
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl6RNqEVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGZPEP/3affmzIWJuKGF1RErOHK3KCe/uX
 PmLjoRZ7im7V+J4b3W+p+re6BOXIXhW+rtKoP/Ijuys9g80WeeAb2nB4h0ESOtff
 3NgN97v28mh4tVtbluJambFDXItei+UwDp1sgg2sZ7ehaSBVny9hgNmPRn5YcyoS
 O3Juy85q70l8awBWThjEHgSxEw2Rzh9PLE6YmMt40rHTxVEDjMOPSuBlp/+TWj3X
 ugF/wInp+J5mCAKCwJI4L6PavdwIwf9hg3Cv/DpoOw60TxwH+7Rq6RueDKBgHhe3
 UEPHrXyPCsF/JQwwSFxN7k481RV2PjkXFwA3U5vH+3WIRb4ETX0+fmBIrLPSAX4z
 6rZiEvdrGS4TVvW2i8mrkJUrLPHNyQ90q/FU0V18A1k77Cv7mWJjSebTAVYNvz/v
 f/DxApaepwprdtHcNYJMN/TVnwxNexJK+U+bkuXsmDggvZYCxwLQUjtI3Sab1Rv9
 C6Y8WgqKx8yP6NbqVtUMkwXdEhBiHgybVxkl9hseUEbhUElIViuq5rlrHa0FVt2Q
 w4orgFXOd7k5iuDr7ka+wa3p20KLQQuB+vwLaCpi35+4vepQ7P0i2tFNwSclo7lO
 +iNy4Bq20W0/cmQeUJIzctJGibwro1I3HPN1UJ7gp0fZ2WVGzV0SKpwQ0tLOVuuU
 y9yPsL1ciDpKQKMh
 =jpyF
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - raise minimum supported binutils version to 2.23

 - remove old CONFIG_AS_* macros that we know binutils >= 2.23 supports

 - move remaining CONFIG_AS_* tests to Kconfig from Makefile

 - enable -Wtautological-compare warnings to catch more issues

 - do not support GCC plugins for GCC <= 4.7

 - fix various breakages of 'make xconfig'

 - include the linker version used for linking the kernel into
   LINUX_COMPILER, which is used for the banner, and also exposed to
   /proc/version

 - link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y, which
   allows us to remove the lib-ksyms.o workaround, and to solve the last
   known issue of the LLVM linker

 - add dummy tools in scripts/dummy-tools/ to enable all compiler tests
   in Kconfig, which will be useful for distro maintainers

 - support the single switch, LLVM=1 to use Clang and all LLVM utilities
   instead of GCC and Binutils.

 - support LLVM_IAS=1 to enable the integrated assembler, which is still
   experimental

* tag 'kbuild-v5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (36 commits)
  kbuild: fix comment about missing include guard detection
  kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
  kbuild: replace AS=clang with LLVM_IAS=1
  kbuild: add dummy toolchains to enable all cc-option etc. in Kconfig
  kbuild: link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y
  MIPS: fw: arc: add __weak to prom_meminit and prom_free_prom_memory
  kbuild: remove -I$(srctree)/tools/include from scripts/Makefile
  kbuild: do not pass $(KBUILD_CFLAGS) to scripts/mkcompile_h
  Documentation/llvm: fix the name of llvm-size
  kbuild: mkcompile_h: Include $LD version in /proc/version
  kconfig: qconf: Fix a few alignment issues
  kconfig: qconf: remove some old bogus TODOs
  kconfig: qconf: fix support for the split view mode
  kconfig: qconf: fix the content of the main widget
  kconfig: qconf: Change title for the item window
  kconfig: qconf: clean deprecated warnings
  gcc-plugins: drop support for GCC <= 4.7
  kbuild: Enable -Wtautological-compare
  x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2
  crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'
  ...
2020-04-11 09:46:12 -07:00
Thomas Gleixner
d7e94dbdac x86/split_lock: Provide handle_guest_split_lock()
Without at least minimal handling for split lock detection induced #AC,
VMX will just run into the same problem as the VMWare hypervisor, which
was reported by Kenneth.

It will inject the #AC blindly into the guest whether the guest is
prepared or not.

Provide a function for guest mode which acts depending on the host
SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a
warning, disable SLD and mark the task accordingly. Otherwise force
SIGBUS.

 [ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de
Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de
2020-04-11 16:39:30 +02:00
Logan Gunthorpe
30796e18c2 x86/mm: introduce __set_memory_prot()
For use in the 32bit arch_add_memory() to set the pgprot type of the
memory to add.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-5-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Logan Gunthorpe
c164fbb40c x86/mm: thread pgprot_t through init_memory_mapping()
In preparation to support a pgprot_t argument for arch_add_memory().

It's required to move the prototype of init_memory_mapping() seeing the
original location came before the definition of pgprot_t.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Anshuman Khandual
c62da0c35d mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS.  While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms.  Apart from simplification, this
reduces code duplication as well.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Arjun Roy
c97078bd21 mm: define pte_index as macro for x86
pte_index() is either defined as a macro (e.g.  sparc64) or as an
inlined function (e.g.  x86).  vm_insert_pages() depends on pte_index
but it is not defined on all platforms (e.g.  m68k).

To fix compilation of vm_insert_pages() on architectures not providing
pte_index(), we perform the following fix:

0. For platforms where it is meaningful, and defined as a macro, no
    change is needed.
1. For platforms where it is meaningful and defined as an inlined
    function, and we want to use it with vm_insert_pages(), we define
    a degenerate macro of the form:  #define pte_index pte_index
2. vm_insert_pages() checks for the existence of a pte_index macro
   definition. If found, it implements a batched insert. If not found,
   it devolves to calling vm_insert_page() in a loop.

This patch implements step 1 for x86.

v3 of this patch fixes a compilation warning for an unused method.
v2 of this patch moved a macro definition to a more readable location.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200228054714.204424-1-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Masahiro Yamada
42251572c4 x86: remove always-defined CONFIG_AS_AVX
CONFIG_AS_AVX was introduced by commit ea4d26ae24 ("raid5: add AVX
optimized RAID5 checksumming").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_AVX, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
48e24723d0 x86: remove always-defined CONFIG_AS_CFI_SECTIONS
CONFIG_AS_CFI_SECTIONS was introduced by commit 9e56529227 ("x86:
Use .cfi_sections for assembly code").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_CFI_SECTIONS, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
46427f658e x86: remove unneeded (CONFIG_AS_)CFI_SIGNAL_FRAME
Commit 131484c8da ("x86/debug: Remove perpetually broken,
unmaintainable dwarf annotations") removes all the users of
CFI_SIGNAL_FRAME.

Remove the CFI_SIGNAL_FRAME and CONFIG_AS_CFI_SIGNAL_FRAME.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
0f2661c4b9 x86: remove always-defined CONFIG_AS_CFI
CONFIG_AS_CFI was introduced by commit e2414910f2 ("[PATCH] x86:
Detect CFI support in the assembler at runtime"), and extended by
commit f0f12d85af ("x86_64: Check for .cfi_rel_offset in CFI probe").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_CFI, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
418d6e295e x86: remove unneeded defined(__ASSEMBLY__) check from asm/dwarf2.h
This header file has the following check at the top:

  #ifndef __ASSEMBLY__
  #warning "asm/dwarf2.h should be only included in pure assembly files"
  #endif

So, we expect defined(__ASSEMBLY__) is always true.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:58 +09:00
Peter Xu
2e3d5dc508 userfaultfd: wp: add pmd_swp_*uffd_wp() helpers
Adding these missing helpers for uffd-wp operations with pmd
swap/migration entries.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-10-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
5a281062af userfaultfd: wp: add WP pagetable tracking to x86
Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland.  We can't relay
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap.  If we were to relay on that, we'd
need to stay on the safe side and generate false positive wp faults for
every swapped out page.

[peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Linus Torvalds
86f26a77cb pci-v5.7-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
 z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
 thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
 x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
 JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
 0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
 sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
 3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
 m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
 LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
 F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
 dNggvW052d7BHes=
 =deWu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)

   - Add more 32 GT/s link speed decoding and improve the implementation
     (Yicong Yang)

  Resource management:

   - Add support for sizing programmable host bridge apertures and fix a
     related alpha Nautilus regression (Ivan Kokshaysky)

  Interrupts:

   - Add boot interrupt quirk mechanism for Xeon chipsets and document
     boot interrupts (Sean V Kelley)

  PCIe native device hotplug:

   - When possible, disable in-band presence detect and use PDS
     (Alexandru Gagniuc)

   - Add DMI table for devices that don't use in-band presence detection
     but don't advertise that correctly (Stuart Hayes)

   - Fix hang when powering slots up/down via sysfs (Lukas Wunner)

   - Fix an MSI interrupt race (Stuart Hayes)

  Virtualization:

   - Add ACS quirks for Zhaoxin devices (Raymond Pang)

  Error handling:

   - Add Error Disconnect Recover (EDR) support so firmware can report
     devices disconnected via DPC and we can try to recover (Kuppuswamy
     Sathyanarayanan)

  Peer-to-peer DMA:

   - Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
     Maier)

  ASPM:

   - Reduce severity of common clock config message (Chris Packham)

   - Clear the correct bits when enabling L1 substates, so we don't go
     to the wrong state (Yicong Yang)

  Endpoint framework:

   - Replace EPF linkup ops with notifier call chain and improve locking
     (Kishon Vijay Abraham I)

   - Fix concurrent memory allocation in OB address region (Kishon Vijay
     Abraham I)

   - Move PF function number assignment to EPC core to support multiple
     function creation methods (Kishon Vijay Abraham I)

   - Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)

   - Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
     Offset (Kishon Vijay Abraham I)

   - Add support for testing DMA transfers (Kishon Vijay Abraham I)

   - Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)

   - Add support for tests to clear IRQ (Kishon Vijay Abraham I)

   - Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)

  Amlogic Meson PCIe controller driver:

   - Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
     Pommarel)

   - Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
     Pommarel)

  Cadence PCIe controller driver:

   - Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
     Abraham I)

  Intel VMD host bridge driver:

   - Add two VMD Device IDs that require bus restriction mode (Sushma
     Kalakota)

  Mobiveil PCIe controller driver:

   - Refactor and modularize mobiveil driver (Hou Zhiqiang)

   - Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)

  Microsoft Hyper-V host bridge driver:

   - Add support for Hyper-V PCI protocol version 1.3 and
     PCI_BUS_RELATIONS2 (Long Li)

   - Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
     Feng)

   - Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)

  NVIDIA Tegra PCIe controller driver:

   - Use pci_parse_request_of_pci_ranges() (Rob Herring)

   - Add support for endpoint mode and related DT updates (Vidya Sagar)

   - Reduce -EPROBE_DEFER error message log level (Thierry Reding)

  Qualcomm PCIe controller driver:

   - Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)

  Synopsys DesignWare PCIe controller driver:

   - Refactor core initialization code for endpoint mode (Vidya Sagar)

   - Fix endpoint MSI-X to use correct table address (Kishon Vijay
     Abraham I)

  TI DRA7xx PCIe controller driver:

   - Fix MSI IRQ handling (Vignesh Raghavendra)

  TI Keystone PCIe controller driver:

   - Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)

  Miscellaneous:

   - Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
     Feng)

   - Use ioremap(), not phys_to_virt(), for platform ROM to fix video
     ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"

* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
  misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
  PCI: tegra: Print -EPROBE_DEFER error message at debug level
  misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
  misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
  tools: PCI: Add 'e' to clear IRQ
  misc: pci_endpoint_test: Add ioctl to clear IRQ
  misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
  PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
  PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
  PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
  misc: pci_endpoint_test: Add support to get DMA option from userspace
  tools: PCI: Add 'd' command line option to support DMA
  misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
  PCI: endpoint: functions/pci-epf-test: Print throughput information
  PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
  PCI: pciehp: Fix MSI interrupt race
  PCI: pciehp: Fix indefinite wait on sysfs requests
  PCI: endpoint: Fix clearing start entry in configfs
  PCI: tegra: Add support for PCIe endpoint mode in Tegra194
  PCI: sysfs: Revert "rescan" file renames
  ...
2020-04-03 14:25:02 -07:00
Linus Torvalds
8c1b724ddb ARM:
* GICv4.1 support
 * 32bit host removal
 
 PPC:
 * secure (encrypted) using under the Protected Execution Framework
 ultravisor
 
 s390:
 * allow disabling GISA (hardware interrupt injection) and protected
 VMs/ultravisor support.
 
 x86:
 * New dirty bitmap flag that sets all bits in the bitmap when dirty
 page logging is enabled; this is faster because it doesn't require bulk
 modification of the page tables.
 * Initial work on making nested SVM event injection more similar to VMX,
 and less buggy.
 * Various cleanups to MMU code (though the big ones and related
 optimizations were delayed to 5.8).  Instead of using cr3 in function
 names which occasionally means eptp, KVM too has standardized on "pgd".
 * A large refactoring of CPUID features, which now use an array that
 parallels the core x86_features.
 * Some removal of pointer chasing from kvm_x86_ops, which will also be
 switched to static calls as soon as they are available.
 * New Tigerlake CPUID features.
 * More bugfixes, optimizations and cleanups.
 
 Generic:
 * selftests: cleanups, new MMU notifier stress test, steal-time test
 * CSV output for kvm_stat.
 
 KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
 by MIPS maintainers.  I had already prepared a fix, but the MIPS maintainers
 prefer to fix it in generic code rather than KVM so they are taking care of it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
 3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
 oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
 SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
 Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
 nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
 =6fGt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) using under the Protected Execution Framework
     ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
2020-04-02 15:13:15 -07:00
Linus Torvalds
f14a9532ee A single fix addressing Sparse warnings. <asm/bitops.h> is changed non-trivially
to avoid the warnings, but generated code is not supposed to be affected.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl6Fs/QRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iOLQ//R3i1QAIvDasdG9EMbSkJAZP2cliK8r3B
 Vc+3quYv8wjJcgj5LGPrlVeoV2X96jopZiN3YeWWJg0rA30ZyZMPiVkQWtYVUEtU
 VjGvX5RNw0ShcWjzbetcPXhyczCpJKFwFVv2fEVPwAvI3OyqGuL044aFQhgksra+
 RE4n8eYWB9pastFeJGn1WPWdJOw40fOcC7YbAF3USo7e8aO/Wv3KJiZxahhGFnPt
 5spBnZHSPbvZp9O8pgdYVJ09mExK2wBxk/GClQw/E4i7d/TLcHEzBIOAekS98H0F
 9lNgCnFLVmEK5DA4TXMPhz+aYfEb5VFoBgz4wA4VOiwcPrTJKa0IukcG+oWXWPrB
 PRb8StNB3IHU0pqKPHRemyPNzl9d4DMm22NMfRBCVUrPrDlYkOb1tCANgcyHOyMf
 G/w2nbcNDgzi9m2L38gWCFIY5AP1AKW+0X8MdsvyESlTXIC6lsBFsjsLE69nbv7c
 dBYYxwEKb41bjXpWIxbdCEyW9kNZTSt5RZP+Md/2DGoeWLHba4iHmmXjhJKGF1F3
 pf1yJZDVoaQkwX+mLgDyC9681UzDA0lRMrSBIhQOpw2OuCkvBRTifwvH0efbLjtN
 cXxeCZvK8O1Zmc/BTtdRRPWybItjtZmkfm2iVviFUxY566i/vAQdmQBST4Uqq4qv
 2V9nnZVEJas=
 =Vx32
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "A single fix addressing Sparse warnings. <asm/bitops.h> is changed
  non-trivially to avoid the warnings, but generated code is not
  supposed to be affected"

* tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix bitops.h warning with a moved cast
2020-04-02 14:52:12 -07:00
Linus Torvalds
6cad420cc6 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "A large amount of MM, plenty more to come.

  Subsystems affected by this patch series:
   - tools
   - kthread
   - kbuild
   - scripts
   - ocfs2
   - vfs
   - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
         sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
         hugetlbfs, hugetlb"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
  include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
  mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
  selftests/vm: fix map_hugetlb length used for testing read and write
  mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
  mm/hugetlb.c: clean code by removing unnecessary initialization
  hugetlb_cgroup: add hugetlb_cgroup reservation docs
  hugetlb_cgroup: add hugetlb_cgroup reservation tests
  hugetlb: support file_region coalescing again
  hugetlb_cgroup: support noreserve mappings
  hugetlb_cgroup: add accounting for shared mappings
  hugetlb: disable region_add file_region coalescing
  hugetlb_cgroup: add reservation accounting for private mappings
  mm/hugetlb_cgroup: fix hugetlb_cgroup migration
  hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
  hugetlb_cgroup: add hugetlb_cgroup reservation counter
  hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  mm/memblock.c: remove redundant assignment to variable max_addr
  mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
  mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
  ...
2020-04-02 13:55:34 -07:00
Anshuman Khandual
7969f2264f mm/vma: make vma_is_foreign() available for general use
Idea of a foreign VMA with respect to the present context is very generic.
But currently there are two identical definitions for this in powerpc and
x86 platforms.  Lets consolidate those redundant definitions while making
vma_is_foreign() available for general use later.  This should not cause
any functional change.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/1582782965-3274-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Masahiro Yamada
630f289b71 asm-generic: make more kernel-space headers mandatory
Change a header to mandatory-y if both of the following are met:

[1] At least one architecture (except um) specifies it as generic-y in
    arch/*/include/asm/Kbuild

[2] Every architecture (except um) either has its own implementation
    (arch/*/include/asm/*.h) or specifies it as generic-y in
    arch/*/include/asm/Kbuild

This commit was generated by the following shell script.

----------------------------------->8-----------------------------------

arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')

tmpfile=$(mktemp)

grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile

find arch -path 'arch/*/include/asm/Kbuild' |
	xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
	mandatory=yes

	for arch in $arches
	do
		if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
			! [ -f arch/$arch/include/asm/$header ]; then
			mandatory=no
			break
		fi
	done

	if [ "$mandatory" = yes ]; then
		echo "mandatory-y += $header" >> $tmpfile

		for arch in $arches
		do
			sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
		done
	fi

done

sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild

LANG=C sort $tmpfile >> include/asm-generic/Kbuild

----------------------------------->8-----------------------------------

One obvious benefit is the diff stat:

 25 files changed, 52 insertions(+), 557 deletions(-)

It is tedious to list generic-y for each arch that needs it.

So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.

See the following commits:

def3f7cefe
a1b39bae16

It is tedious to convert headers one by one, so I processed by a shell
script.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:25 -07:00
Linus Torvalds
890f0b0d27 x86: start using named parameters for low-level uaccess asms
This is partly for readability - using named arguments instead of
numbered ones makes it muchmore obvious just what is going on.  Using
"%[efault]" instead of "%4" for the special -EFAULT constant just means
that you don't have to count the arguments to see what's up.

But the motivation for all this cleanup is that when we'll start to
conditionally use "asm goto" even for the __get_user_asm() case, the
argument numbers will depend on whether we have an error output, or an
error label we can just directly jump to.

So this moves us towards named arguments for the same reason that we
have to use named arguments for the asms that use SET_CC(): numbering
will eventually become similarly unreliable and depends on whether we
can use particular compiler features or not.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01 13:23:14 -07:00
Linus Torvalds
7da63b3d54 x86: get rid of 'rtype' argument to __get_user_asm() macro
This is the exact same thing as 3680785692 ("x86: get rid of 'rtype'
argument to __put_user_goto() macro") except it's about __get_user_asm()
rather than __put_user_goto().

The reasons are the same: having the low-level asm access the argument
with a different size than the compiler thinks it does is fundamentally
wrong.

But unlike the __put_user_goto() case, we actually did tell the compiler
that we used a bigger variable (either long or long long), and then only
filled in the low bits, and ended up "fixing" this by casting the result
to the proper pointer type.

That's because we needed to use a non-qualified type (the user pointer
might be a const pointer!), and that makes this a bit more painful.  Our
'__inttype()' macro used to be lazy and only differentiate between "fits
in a register" or "needs two registers".

So this fix had to also make that '__inttype()' macro more precise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01 12:41:50 -07:00
Linus Torvalds
3680785692 x86: get rid of 'rtype' argument to __put_user_goto() macro
The 'rtype' argument goes back to pre-git (and pre-BK) times, and comes
from the fact that we used to not necessarily have the same type sizes
for the arguments of the inline asm as we did for the actual accesses we
did.

So 'rtype' is the 'register type' - the override of the register size in
the inline asm when it doesn't match the actual size of the variable we
use as the output argument (for when you used "put_user()" on an "int"
value that was assigned to a byte-sized user space access etc).

That mismatch doesn't actually exist any more, and should probably never
have existed in the first place.  It's a horrid bug just waiting to
happen (using more - or less - of the variable that the compiler
expected us to use).

I think we had some odd casting going on to hide the effects of that
oddity after-the-fact, but those are long gone, and these days we should
always have the right size value in the first place, using things like

        __typeof__(*(ptr)) __pu_val = (x);

and gcc should thus have the right register size without any manual
'rtype' games.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01 10:52:01 -07:00
Linus Torvalds
1a323ea535 x86: get rid of 'errret' argument to __get_user_xyz() macross
Every remaining user just has the error case returning -EFAULT.

In fact, the exception was __get_user_asm_nozero(), which was removed in
commit 4b842e4e25 ("x86: get rid of small constant size cases in
raw_copy_{to,from}_user()"), and the other __get_user_xyz() macros just
followed suit for consistency.

Fix up some macro whitespace while at it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31 18:23:47 -07:00
Linus Torvalds
ab33eb494c x86: remove __put_user_asm() infrastructure
The last user was removed by commit 4b842e4e25 ("x86: get rid of small
constant size cases in raw_copy_{to,from}_user()").  Get rid of the
left-overs before somebody tries to use it again.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31 18:11:18 -07:00
Linus Torvalds
d9d7677892 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "A handful of changes:

   - two memory encryption related fixes

   - don't display the kernel's virtual memory layout plaintext on
     32-bit kernels either

   - two simplifications"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove the now redundant N_MEMORY check
  dma-mapping: Fix dma_pgprot() for unencrypted coherent pages
  x86: Don't let pgprot_modify() change the page encryption bit
  x86/mm/kmmio: Use this_cpu_ptr() instead get_cpu_var() for kmmio_ctx
  x86/mm/init/32: Stop printing the virtual memory layout
2020-03-31 11:51:05 -07:00
Linus Torvalds
d0be2d53c7 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "Misc changes:

   - add a pkey sanity check

   - three commits to improve and future-proof xstate/xfeature handling
     some more"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkeys: Add check for pkey "overflow"
  x86/fpu/xstate: Warn when checking alignment of disabled xfeatures
  x86/fpu/xstate: Fix XSAVES offsets in setup_xstate_comp()
  x86/fpu/xstate: Fix last_good_offset in setup_xstate_features()
2020-03-31 11:26:22 -07:00
Linus Torvalds
fdf5563a72 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "This topic tree contains more commits than usual:

   - most of it are uaccess cleanups/reorganization by Al

   - there's a bunch of prototype declaration (--Wmissing-prototypes)
     cleanups

   - misc other cleanups all around the map"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/mm/set_memory: Fix -Wmissing-prototypes warnings
  x86/efi: Add a prototype for efi_arch_mem_reserve()
  x86/mm: Mark setup_emu2phys_nid() static
  x86/jump_label: Move 'inline' keyword placement
  x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
  kill uaccess_try()
  x86: unsafe_put-style macro for sigmask
  x86: x32_setup_rt_frame(): consolidate uaccess areas
  x86: __setup_rt_frame(): consolidate uaccess areas
  x86: __setup_frame(): consolidate uaccess areas
  x86: setup_sigcontext(): list user_access_{begin,end}() into callers
  x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
  x86: ia32_setup_rt_frame(): consolidate uaccess areas
  x86: ia32_setup_frame(): consolidate uaccess areas
  x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
  x86/alternatives: Mark text_poke_loc_init() static
  x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()
  x86/mm: Drop pud_mknotpresent()
  x86: Replace setup_irq() by request_irq()
  x86/configs: Slightly reduce defconfigs
  ...
2020-03-31 11:04:05 -07:00
Linus Torvalds
9589351ccf Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Misc cleanups and small enhancements all around the map"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed: Fix debug_puthex() parameter type
  x86/setup: Fix static memory detection
  x86/vmlinux: Drop unneeded linker script discard of .eh_frame
  x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
  x86/boot/compressed: Remove .eh_frame section from bzImage
  x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
  x86/boot/compressed/64: Use 32-bit (zero-extended) MOV for z_output_len
  x86/boot/compressed/64: Use LEA to initialize boot stack pointer
2020-03-31 10:28:35 -07:00
Sean Christopherson
6e4fd06f3e KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
Remove the __exit annotation from VMX hardware_unsetup(), the hook
can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
if failure occurs at various points during initialization.

Removing the annotation also lets us annotate vmx_x86_ops and svm_x86_ops
with __initdata; otherwise, objtool complains because it doesn't
understand that the vendor specific __initdata is being copied by value
to a non-__initdata instance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-8-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:09 -04:00
Sean Christopherson
afaf0b2f9b KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
Replace the kvm_x86_ops pointer in common x86 with an instance of the
struct to save one pointer dereference when invoking functions.  Copy the
struct by value to set the ops during kvm_init().

Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the
ops have been initialized, i.e. a vendor KVM module has been loaded.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:08 -04:00
Sean Christopherson
d008dfdb0e KVM: x86: Move init-only kvm_x86_ops to separate struct
Move the kvm_x86_ops functions that are used only within the scope of
kvm_init() into a separate struct, kvm_x86_init_ops.  In addition to
identifying the init-only functions without restorting to code comments,
this also sets the stage for waiting until after ->hardware_setup() to
set kvm_x86_ops.  Setting kvm_x86_ops after ->hardware_setup() is
desirable as many of the hooks are not usable until ->hardware_setup()
completes.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:04 -04:00
Paolo Bonzini
cf39d37539 KVM/arm updates for Linux 5.7
- GICv4.1 support
 - 32bit host removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl6DKKIPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDDe0P/30Oda6HJdcUY+g0dnHkH8N7t+VKjPPnihlX
 WBaT0Y4SzMsfAtG5lQqS48A50dXKWW70QvwkZjxu7abQhYFWGd2SGtTQxwqJXT8J
 I6MBh4r9xrIfiqzVT2BXslA6id5H6wCyyFI6vKm/IFkIu1J6JtwnKakQ0CIddS1d
 Blbgj5jcxGw+2xOppHCQXbWwwDdmYWkMZEBZjmhkezddqLDK+oaAUiUhHHHizTsB
 kLjgqYBVENpR1zDIsGpQAJloKXAiHfBQshQAmnhnBNzXE60LZ0n0/iODU9U5FDEO
 5j0DRWccKvsIMsUh7JpPr5xerGJ0rqk1IwPC2JcyzfRbvRLMpK1IOWfhI5Tg5lbP
 4Ev96QLEMBnKOWMSE0MqnMdq6JPzDLA6WZ28HZe2nc3/oWNgsSDtlXigx4xFFxTX
 zfc2YpAgFu3xJkPf8PtWTFvItm0AvFNFynPg0Rr/NsGf/FGeszYR4cLcHmv5NlWS
 IiV4+lgnlmr2LZr3VjUaumbtWIpuVF4Db5Al2K2E/PCN7ObfEkyCweDic8ophkH8
 sMS9TI38aH1Efy+I2Nfxxqpy8BcElZAMrAWt9R27A4JRLHdr7j5DsGnyRigXHgRe
 pFgbqtk/EjWkHwjaJVg8kPxf2+2P05VZsQeGG721nbKAIKDetM3RA2BflexdsptY
 kXplNsVr
 =eILh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.7

- GICv4.1 support
- 32bit host removal
2020-03-31 10:44:53 -04:00
Linus Torvalds
2853d5fafb Support for "split lock" detection:
- Atomic operations (lock prefixed instructions) which span two cache
     lines have to acquire the global bus lock. This is at least 1k cycles
     slower than an atomic operation within a cache line and disrupts
     performance on other cores. Aside of performance disruption this is
     a unpriviledged form of DoS.
 
     Some newer CPUs have the capability to raise an #AC trap when such an
     operation is attempted. The detection is by default enabled in warning
     mode which will warn once when a user space application is caught. A
     command line option allows to disable the detection or to select fatal
     mode which will terminate offending applications with SIGBUS.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/uMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocsAD/9yqpw+XlPKNPsfbm9sbirBDfTrENcL
 F44iwn4WnrjoW/gnnZCYmPxJFsTtGVPqxHdUf4eyGemg9r9ZEO0DQftmUHC5Z6KX
 aa/b5JoeM61wp9HlpVlD4D1jVt4pWyQODQeZnUXE4DEzmRc3cD/5lSU+/VeaIwwz
 lxwUemqmXK7ucH2KA7smOGsl2nU6ED84q3mdOB1b4Cw+gWYMUnPJnuS/ipriBRx4
 BYbMItcxsFvtdO9Hx8PvGd5LUK0wW8JOWrYQICD2kLpZtHtGeaHpBzFzL0+nMU7d
 1epyDqJQDmX+PAzvj+EYyn3HTfobZlckn+tbxMQkkS+oDk1ywOZd+BancClvn5/5
 jMfPIQJF5bGASVnzGMWhzVdwthTZiMG4d1iKsUWOA/hN0ch0+rm1BqraToabsEFg
 Sv7/rvl9KtSOtMJTeAmMhlZUMBj9m8BtPFjniDwp6nw/upGgJdST5mrKFNYZvqOj
 JnXsEMr/nJVW6bnUvT6LF66xbHlzHdxtodkQWqF+IEsyRaOz1zAGpQamP98KxNLc
 dq/XYoEe1KqIFbg4BkNP+GeDL3FQDxjFNwPQnnjQEzWRbjkHlfmq1uKCsR2r8mBO
 fYNJ1X8lTyGV0kx/ERpWGazzabpzh+8Lr1yMhnoA3EWvlzUjmpN2PFI4oTpTrtzT
 c/q16SCxim3NWA==
 =D9x8
 -----END PGP SIGNATURE-----

Merge tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 splitlock updates from Thomas Gleixner:
 "Support for 'split lock' detection:

  Atomic operations (lock prefixed instructions) which span two cache
  lines have to acquire the global bus lock. This is at least 1k cycles
  slower than an atomic operation within a cache line and disrupts
  performance on other cores. Aside of performance disruption this is a
  unpriviledged form of DoS.

  Some newer CPUs have the capability to raise an #AC trap when such an
  operation is attempted. The detection is by default enabled in warning
  mode which will warn once when a user space application is caught. A
  command line option allows to disable the detection or to select fatal
  mode which will terminate offending applications with SIGBUS"

* tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  x86/split_lock: Rework the initialization flow of split lock detection
  x86/split_lock: Enable split lock detection by kernel
2020-03-30 19:35:52 -07:00
Linus Torvalds
d5f744f9a2 x86 entry code updates:
- Convert the 32bit syscalls to be pt_regs based which removes the
       requirement to push all 6 potential arguments onto the stack and
       consolidates the interface with the 64bit variant
 
     - The first small portion of the exception and syscall related entry
       code consolidation which aims to address the recently discovered
       issues vs. RCU, int3, NMI and some other exceptions which can
       interrupt any context. The bulk of the changes is still work in
       progress and aimed for 5.8.
 
     - A few lockdep namespace cleanups which have been applied into this
       branch to keep the prerequisites for the ongoing work confined.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/TMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYA6EAC7r/bCMxBelljT3b7LkBbiJcocJ+zK
 OSzWU9miJGTAvYqn4/ciLKg4dA424b/1rBFlF1hBTCQ0HL5Cv4lajxdKEZCO5WCC
 WWTCz+MC60aWFaH3VNoywiLGb39H2IbqWbS9yNPd/wBkLHiMAD6NPQntOvcPaD4j
 1lyrMtLzfrWlrHxvxdI3kt5ZpFLYNXr2xk61xQjTz0ROFQBhf2sDsuhHhiYVLPj7
 JwYktpbBiPeaw2+I18NPymNPY+VfY8LCTgLl5M+rbKyCqebKaedZQJ7QXFhAEqKC
 Y2f+gJsKWtTDzGP2mk/5kF0uP7cd0vJK35ZCXtLZ9BbcNtFZU6w+ADqRo4pJBHRY
 QRzo/AWrdkuTJF0CrP6mcneNC7NwWLSdKrE1z77RQCHUPVvhHhRDZsgdLcZ/KKwx
 y1ji22trwNB+7LmI2fUOU5RRHZBIuNvQT+mPt24febJuHpZKul62dd3cqTGeSTC+
 MYVknYDSg/+jk+83DhuZnTyb9lWTbq/0Q1HRDu6l2LrMIH7YMPpY5Ea64ZFYzWXy
 s0+iHEM4mUzltwNauHIntjbwXi3C0l2k1WQyG0gun2eS6SXfu0lb93V4msFj/N1+
 oHavH2n2A4XrRr+Ob87fsl7nfXJibWP7R9xPblrWP2sNdqfjSyGd49rnsvpWqWMK
 Fj0d7tQ78+/SwA==
 =tWXS
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry code updates from Thomas Gleixner:

 - Convert the 32bit syscalls to be pt_regs based which removes the
   requirement to push all 6 potential arguments onto the stack and
   consolidates the interface with the 64bit variant

 - The first small portion of the exception and syscall related entry
   code consolidation which aims to address the recently discovered
   issues vs. RCU, int3, NMI and some other exceptions which can
   interrupt any context. The bulk of the changes is still work in
   progress and aimed for 5.8.

 - A few lockdep namespace cleanups which have been applied into this
   branch to keep the prerequisites for the ongoing work confined.

* tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
  lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}()
  lockdep: Rename trace_softirqs_{on,off}()
  lockdep: Rename trace_hardirq_{enter,exit}()
  x86/entry: Rename ___preempt_schedule
  x86: Remove unneeded includes
  x86/entry: Drop asmlinkage from syscalls
  x86/entry/32: Enable pt_regs based syscalls
  x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments
  x86/entry/32: Rename 32-bit specific syscalls
  x86/entry/32: Clean up syscall_32.tbl
  x86/entry: Remove ABI prefixes from functions in syscall tables
  x86/entry/64: Add __SYSCALL_COMMON()
  x86/entry: Remove syscall qualifier support
  x86/entry/64: Remove ptregs qualifier from syscall table
  x86/entry: Move max syscall number calculation to syscallhdr.sh
  x86/entry/64: Split X32 syscall table into its own file
  x86/entry/64: Move sys_ni_syscall stub to common.c
  x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
  x86/entry: Refactor SYS_NI macros
  ...
2020-03-30 19:14:28 -07:00
Linus Torvalds
dbb381b619 timekeeping and timer updates:
Core:
 
   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.
 
     This is achieved by splitting out header content into separate
     headers. which contain only the minimaly required information which is
     necessary to build the vDSO. These new headers are included from the
     kernel headers and the vDSO specific files.
 
   - Enhancements to the generic vDSO library allowing more fine grained
     control over the compiled in code, further reducing architecture
     specific storage and preparing for adopting the generic library by PPC.
 
   - Cleanup and consolidation of the exit related code in posix CPU timers.
 
   - Small cleanups and enhancements here and there
 
  Drivers:
 
   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support
 
   - Correct the clock rate of PIT64b global clock
 
   - setup_irq() cleanup
 
   - Preparation for PWM and suspend support for the TI DM timer
 
   - Expand the fttmr010 driver to support ast2600 systems
 
   - The usual small fixes, enhancements and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+QETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofJ5D/94s5fpaqiuNcaAsLq2D3DRIrTnqxx7
 yEeAOPcbYV1bM1SgY/M83L5yGc2S8ny787e26abwRTCZhZV3eAmRTphIFFIZR0Xk
 xS+i67odscbdJTRtztKj3uQ9rFxefszRuphyaa89pwSY9nnyMWLcahGSQOGs0LJK
 hvmgwPjyM1drNfPxgPiaFg7vDr2XxNATpQr/FBt+BhelvVan8TlAfrkcNPiLr++Y
 Axz925FP7jMaRRbZ1acji34gLiIAZk0jLCUdbix7YkPrqDB4GfO+v8Vez+fGClbJ
 uDOYeR4r1+Be/BtSJtJ2tHqtsKCcAL6agtaE2+epZq5HbzaZFRvBFaxgFNF8WVcn
 3FFibdEMdsRNfZTUVp5wwgOLN0UIqE/7LifE12oLEL2oFB5H2PiNEUw3E02XHO11
 rL3zgHhB6Ke1sXKPCjSGdmIQLbxZmV5kOlQFy7XuSeo5fmRapVzKNffnKcftIliF
 1HNtZbgdA+3tdxMFCqoo1QX+kotl9kgpslmdZ0qHAbaRb3xqLoSskbqEjFRMuSCC
 8bjJrwboD9T5GPfwodSCgqs/58CaSDuqPFbIjCay+p90Fcg6wWAkZtyG04ZLdPRc
 GgNNdN4gjTD9bnrRi8cH47z1g8OO4vt4K4SEbmjo8IlDW+9jYMxuwgR88CMeDXd7
 hu7aKsr2I2q/WQ==
 =5o9G
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timekeeping and timer updates from Thomas Gleixner:
 "Core:

   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.

     This is achieved by splitting out header content into separate
     headers. which contain only the minimaly required information which
     is necessary to build the vDSO. These new headers are included from
     the kernel headers and the vDSO specific files.

   - Enhancements to the generic vDSO library allowing more fine grained
     control over the compiled in code, further reducing architecture
     specific storage and preparing for adopting the generic library by
     PPC.

   - Cleanup and consolidation of the exit related code in posix CPU
     timers.

   - Small cleanups and enhancements here and there

  Drivers:

   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support

   - Correct the clock rate of PIT64b global clock

   - setup_irq() cleanup

   - Preparation for PWM and suspend support for the TI DM timer

   - Expand the fttmr010 driver to support ast2600 systems

   - The usual small fixes, enhancements and cleanups all over the
     place"

* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
  vdso: Fix clocksource.h macro detection
  um: Fix header inclusion
  arm64: vdso32: Enable Clang Compilation
  lib/vdso: Enable common headers
  arm: vdso: Enable arm to use common headers
  x86/vdso: Enable x86 to use common headers
  mips: vdso: Enable mips to use common headers
  arm64: vdso32: Include common headers in the vdso library
  arm64: vdso: Include common headers in the vdso library
  arm64: Introduce asm/vdso/processor.h
  arm64: vdso32: Code clean up
  linux/elfnote.h: Replace elf.h with UAPI equivalent
  scripts: Fix the inclusion order in modpost
  common: Introduce processor.h
  linux/ktime.h: Extract common header for vDSO
  linux/jiffies.h: Extract common header for vDSO
  linux/time64.h: Extract common header for vDSO
  linux/time32.h: Extract common header for vDSO
  linux/time.h: Extract common header for vDSO
  ...
2020-03-30 18:51:47 -07:00
Linus Torvalds
336622e9fc NOHZ full updates:
- Remove TIF_NOHZ from 3 architectures
 
     These architectures use a static key to decide whether context tracking
     needs to be invoked and the TIF_NOHZ flag just causes a pointless
     slowpath execution for nothing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+bITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZjpD/9PkXE/zQoVmPLhOOcEBXB4i0rQQV41
 mR8F83aswch+qtT1g7A00G5j49CWkLh/hj5PX7ajS9nSTQCHOQ9jdZuxPrjW8CGZ
 gMHCyd5o9C98sKOORylR2nuCKhVdOq0/HleRjBBDsqcO0T5KlhVPUrtuJ878kX8d
 1SnoZnZMx+Ro0+4+Ehp39CmZJ0pV6o5ypT469esa2MB1xw389AQCmLt4rk99FNMo
 LDbKAB+7XBwNAu/rqD0hIv7YyvaSlcdlWBAXBLeCrwVIKQG3VfT9CpgwTtGoNFhY
 9KBkzr0z+lvHS9eKWyWzpXYrgVU1u28gUVvpaavv+Ma5V8STqNunoMBs7hKanJqV
 mPh+4ABACtFieKlwkj2PwUrGEgH+y/SAfStliOFsimVz/w2udC0S777/EjjzfKaN
 NS13mP19s5/P1q3y/6BSrOxYD0inicROO+UfetHNPOgMePY+Gp/xzluefPnhTagX
 CnJxndA3Fbjh9rXFbSZ5TMlf97kTxVVJE+qtrh5Upw1AWpo/qvkLsIFsamgyW2jR
 7t3MbHzKYnLkUJlwOLPJimvZeN4hZOx05ra/RZOkVaxri7xtVsDCkaEvhgEqLWYj
 Gbt2mGnNccawwN0bVPd2hgkKmUBqO8u5llhQcM2BBG4CJgZMaB8LjIkS+F6FsSME
 xMnY+tS3c7Q8TQ==
 =0lHP
 -----END PGP SIGNATURE-----

Merge tag 'timers-nohz-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull NOHZ update from Thomas Gleixner:
 "Remove TIF_NOHZ from three architectures

  These architectures use a static key to decide whether context
  tracking needs to be invoked and the TIF_NOHZ flag just causes a
  pointless slowpath execution for nothing"

* tag 'timers-nohz-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64: Remove TIF_NOHZ
  arm: Remove TIF_NOHZ
  x86: Remove TIF_NOHZ
  context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ
  x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
2020-03-30 18:29:05 -07:00
Linus Torvalds
642e53ead6 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Various NUMA scheduling updates: harmonize the load-balancer and
     NUMA placement logic to not work against each other. The intended
     result is better locality, better utilization and fewer migrations.

   - Introduce Thermal Pressure tracking and optimizations, to improve
     task placement on thermally overloaded systems.

   - Implement frequency invariant scheduler accounting on (some) x86
     CPUs. This is done by observing and sampling the 'recent' CPU
     frequency average at ~tick boundaries. The CPU provides this data
     via the APERF/MPERF MSRs. This hopefully makes our capacity
     estimates more precise and keeps tasks on the same CPU better even
     if it might seem overloaded at a lower momentary frequency. (As
     usual, turbo mode is a complication that we resolve by observing
     the maximum frequency and renormalizing to it.)

   - Add asymmetric CPU capacity wakeup scan to improve capacity
     utilization on asymmetric topologies. (big.LITTLE systems)

   - PSI fixes and optimizations.

   - RT scheduling capacity awareness fixes & improvements.

   - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

   - Misc fixes, cleanups and optimizations - see the changelog for
     details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  threads: Update PID limit comment according to futex UAPI change
  sched/fair: Fix condition of avg_load calculation
  sched/rt: cpupri_find: Trigger a full search as fallback
  kthread: Do not preempt current task if it is going to call schedule()
  sched/fair: Improve spreading of utilization
  sched: Avoid scale real weight down to zero
  psi: Move PF_MEMSTALL out of task->flags
  MAINTAINERS: Add maintenance information for psi
  psi: Optimize switching tasks inside shared cgroups
  psi: Fix cpu.pressure for cpu.max and competing cgroups
  sched/core: Distribute tasks within affinity masks
  sched/fair: Fix enqueue_task_fair warning
  thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
  sched/rt: Remove unnecessary push for unfit tasks
  sched/rt: Allow pulling unfitting task
  sched/rt: Optimize cpupri_find() on non-heterogenous systems
  sched/rt: Re-instate old behavior in select_task_rq_rt()
  sched/rt: cpupri_find: Implement fallback mechanism for !fit case
  sched/fair: Fix reordering of enqueue/dequeue_task_fair()
  sched/fair: Fix runnable_avg for throttled cfs
  ...
2020-03-30 17:01:51 -07:00
Linus Torvalds
9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Linus Torvalds
4b9fd8a829 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued user-access cleanups in the futex code.

   - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
     instead of an embedded rwsem. This addresses a couple of
     weaknesses, but the primary motivation was complications on the -rt
     kernel.

   - Introduce raw lock nesting detection on lockdep
     (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
     lock differences. This too originates from -rt.

   - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
     footprint on distro-ish kernels running into the "BUG:
     MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
     chain-entries pool.

   - Misc cleanups, smaller fixes and enhancements - see the changelog
     for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
  thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
  Documentation/locking/locktypes: Minor copy editor fixes
  Documentation/locking/locktypes: Further clarifications and wordsmithing
  m68knommu: Remove mm.h include from uaccess_no.h
  x86: get rid of user_atomic_cmpxchg_inatomic()
  generic arch_futex_atomic_op_inuser() doesn't need access_ok()
  x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
  x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
  objtool: whitelist __sanitizer_cov_trace_switch()
  [parisc, s390, sparc64] no need for access_ok() in futex handling
  sh: no need of access_ok() in arch_futex_atomic_op_inuser()
  futex: arch_futex_atomic_op_inuser() calling conventions change
  completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  lockdep: Add posixtimer context tracing bits
  lockdep: Annotate irq_work
  lockdep: Add hrtimer context tracing bits
  lockdep: Introduce wait-type checks
  completion: Use simple wait queues
  sched/swait: Prepare usage in completions
  ...
2020-03-30 16:17:15 -07:00
Linus Torvalds
a776c270a0 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The EFI changes in this cycle are much larger than usual, for two
  (positive) reasons:

   - The GRUB project is showing signs of life again, resulting in the
     introduction of the generic Linux/UEFI boot protocol, instead of
     x86 specific hacks which are increasingly difficult to maintain.
     There's hope that all future extensions will now go through that
     boot protocol.

   - Preparatory work for RISC-V EFI support.

  The main changes are:

   - Boot time GDT handling changes

   - Simplify handling of EFI properties table on arm64

   - Generic EFI stub cleanups, to improve command line handling, file
     I/O, memory allocation, etc.

   - Introduce a generic initrd loading method based on calling back
     into the firmware, instead of relying on the x86 EFI handover
     protocol or device tree.

   - Introduce a mixed mode boot method that does not rely on the x86
     EFI handover protocol either, and could potentially be adopted by
     other architectures (if another one ever surfaces where one
     execution mode is a superset of another)

   - Clean up the contents of 'struct efi', and move out everything that
     doesn't need to be stored there.

   - Incorporate support for UEFI spec v2.8A changes that permit
     firmware implementations to return EFI_UNSUPPORTED from UEFI
     runtime services at OS runtime, and expose a mask of which ones are
     supported or unsupported via a configuration table.

   - Partial fix for the lack of by-VA cache maintenance in the
     decompressor on 32-bit ARM.

   - Changes to load device firmware from EFI boot service memory
     regions

   - Various documentation updates and minor code cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  efi/libstub/arm: Fix spurious message that an initrd was loaded
  efi/libstub/arm64: Avoid image_base value from efi_loaded_image
  partitions/efi: Fix partition name parsing in GUID partition entry
  efi/x86: Fix cast of image argument
  efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
  efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
  efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
  efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
  efi/x86: Ignore the memory attributes table on i386
  efi/x86: Don't relocate the kernel unless necessary
  efi/x86: Remove extra headroom for setup block
  efi/x86: Add kernel preferred address to PE header
  efi/x86: Decompress at start of PE image load address
  x86/boot/compressed/32: Save the output address instead of recalculating it
  efi/libstub/x86: Deal with exit() boot service returning
  x86/boot: Use unsigned comparison for addresses
  efi/x86: Avoid using code32_start
  efi/x86: Make efi32_pe_entry() more readable
  efi/x86: Respect 32-bit ABI in efi32_pe_entry()
  efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
  ...
2020-03-30 16:13:08 -07:00
Linus Torvalds
ff7b862a4c * Do not report spurious MCEs on some Intel platforms caused by errata;
by Prarit Bhargava.
 
 * Change dev-mcelog's hardcoded limit of 32 error records to a dynamic
 one, controlled by the number of logical CPUs, by Tony Luck.
 
 * Add support for the processor identification number (PPIN) on AMD, by
 Wei Huang.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl6BseQACgkQEsHwGGHe
 VUqIMg/+KtZsFOHRKZD1dc0Jyo8O0BTzqMIif5J7AzRWv6DPLzfEFBjGFmVY10gN
 aovhRIF1TrUI8Em5as4FlczH8l328n1ZQhhy6YoCHcrT03LsKHXE46bcvm5msj9n
 0s0uZyDei6ly4k6hnNn5NPMjlkpNKS4/A1dkT3Ir25zlS+3Agds4nj5iNzFfOE19
 67bFSVw+KuEt4iihfX/uT0HtmcW5T5byDlwrxgMUC3s0EzMLIx4y+hqROzrJfIau
 NI3edpD0olhfkT9vz5NyZI7hNVAUOoWfYhoxZEJlAxjC+0MRKwR2A539YGsqzgJ9
 kFN5h6400xDmG5C5FUVULAEHG8O/AV+0AzMoH0c4xamalB64CJe6BehYJggFbyXB
 bH9bSZKasesZUSTP+v92dOrMK2ZtJnvhU5hhEDYbtRL4ERyIb/q9/AsJfpb299HJ
 JD1t4lMhURYr5qu/nck48yVnsHw0yqPju1qRDxqkbmRCkKNDi2t1ph7XUb7okSba
 AekWUomTliTm83rsX/lH6OJQ1uCtM7QOp6YULr8Zjb4TJcSAfuEsbAcnulUSrxan
 hreIKqC2A2RMpRVnX9IflKDHAGNWmT5Ag6tLpQ0/TfeaazxT2gdEw8YS4EU18cq6
 mMiJyIKmH2nGT7Mf65A0Lg0uJXFPFrtnKfFoSlb0kDsGlx3PEic=
 =3/4h
 -----END PGP SIGNATURE-----

Merge tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Do not report spurious MCEs on some Intel platforms caused by errata;
   by Prarit Bhargava.

 - Change dev-mcelog's hardcoded limit of 32 error records to a dynamic
   one, controlled by the number of logical CPUs, by Tony Luck.

 - Add support for the processor identification number (PPIN) on AMD, by
   Wei Huang.

* tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/amd: Add PPIN support for AMD MCE
  x86/mce/dev-mcelog: Dynamically allocate space for machine check records
  x86/mce: Do not log spurious corrected mce errors
2020-03-30 13:17:50 -07:00
Thomas Gleixner
cf226c42b2 Merge branch 'uaccess.futex' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into locking/core
Pull uaccess futex cleanups for Al Viro:

     Consolidate access_ok() usage and the futex uaccess function zoo.
2020-03-28 11:59:24 +01:00
Thomas Gleixner
a215032725 Merge branch 'next.uaccess-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/cleanups
Pull uaccess cleanups from Al Viro:

  Consolidate the user access areas and get rid of uaccess_try(), user_ex()
  and other warts.
2020-03-28 11:57:02 +01:00
Al Viro
f5544ba712 x86: get rid of user_atomic_cmpxchg_inatomic()
Only one user left; the thing had been made polymorphic back in 2013
for the sake of MPX.  No point keeping it now that MPX is gone.
Convert futex_atomic_cmpxchg_inatomic() to user_access_{begin,end}()
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:55 -04:00
Al Viro
8aef36dacb x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
lock cmpxchg leaves the current value in eax; no need to reload it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:54 -04:00
Al Viro
0ec33c0171 x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
Lift stac/clac pairs from __futex_atomic_op{1,2} into arch_futex_atomic_op_inuser(),
fold them with access_ok() in there.  The switch in arch_futex_atomic_op_inuser()
is what has required the previous (objtool) commit...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:53 -04:00
Al Viro
a08971e948 futex: arch_futex_atomic_op_inuser() calling conventions change
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:51 -04:00
Benjamin Thiel
5bacdc0982 x86/mm/set_memory: Fix -Wmissing-prototypes warnings
Add missing includes and move prototypes into the header set_memory.h in
order to fix -Wmissing-prototypes warnings.

 [ bp: Add ifdeffery around arch_invalidate_pmem() ]

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200320145028.6013-1-b.thiel@posteo.de
2020-03-27 11:26:06 +01:00
Benjamin Thiel
01bd18624d x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
... in order to fix a -Wmissing-prototypes warning:

  arch/x86/platform/uv/tlb_uv.c:1275:6: warning:
  no previous prototype for ‘uv_bau_message_interrupt’ [-Wmissing-prototypes] \
	  void uv_bau_message_interrupt(struct pt_regs *regs)

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200327072621.2255-1-b.thiel@posteo.de
2020-03-27 10:54:52 +01:00
Al Viro
cf122cfba5 kill uaccess_try()
finally

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-26 15:02:14 -04:00
Ingo Molnar
629b3df7ec Merge branch 'x86/cpu' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/uncore.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25 15:20:44 +01:00
Brian Gerst
290a4474d0 x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
Add missing semicolon.

Fixes: a74d187c2d ("x86/entry: Refactor SYS_NI macros")
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200324143520.898733-1-brgerst@gmail.com
2020-03-25 10:06:20 +01:00
Thomas Gleixner
1826d56bce x86/cpu: Cleanup the now unused CPU match macros
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131510.900226233@linutronix.de
2020-03-24 21:37:23 +01:00
Thomas Gleixner
20d437447c x86/cpu: Add consistent CPU match macros
Finding all places which build x86_cpu_id match tables is tedious and the
logic is hidden in lots of differently named macro wrappers.

Most of these initializer macros use plain C89 initializers which rely on
the ordering of the struct members. So new members could only be added at
the end of the struct, but that's ugly as hell and C99 initializers are
really the right thing to use.

Provide a set of macros which:

  - Have a proper naming scheme, starting with X86_MATCH_

  - Use C99 initializers

The set of provided macros are all subsets of the base macro

    X86_MATCH_VENDOR_FAM_MODEL_FEATURE()

which allows to supply all possible selection criteria:

      vendor, family, model, feature

The other macros shorten this to avoid typing all arguments when they are
not needed and would require one of the _ANY constants. They have been
created due to the requirements of the existing usage sites.

Also add a few model constants for Centaur CPUs and QUARK.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.826011988@linutronix.de
2020-03-24 21:17:50 +01:00
Thomas Gleixner
ba5bade4cc x86/devicetable: Move x86 specific macro out of generic code
There is no reason that this gunk is in a generic header file. The wildcard
defines need to stay as they are required by file2alias.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.736205164@linutronix.de
2020-03-24 21:02:47 +01:00
Vincenzo Frascino
1c1a18b00d um: Fix header inclusion
User Mode Linux is a flavor of x86 that from the vDSO prospective always
falls back on system calls. This implies that it does not require any
of the unified vDSO definitions and their inclusion causes side effects
like this:

  In file included from include/vdso/processor.h:10:0,
                      from include/vdso/datapage.h:17,
                      from arch/x86/include/asm/vgtod.h:7,
                      from arch/x86/um/../kernel/sys_ia32.c:49:
  >> arch/x86/include/asm/vdso/processor.h:11:29: error: redefinition of 'rep_nop'
      static __always_inline void rep_nop(void)
                                  ^~~~~~~
     In file included from include/linux/rcupdate.h:30:0,
                      from include/linux/rculist.h:11,
                      from include/linux/pid.h:5,
                      from include/linux/sched.h:14,
                      from arch/x86/um/../kernel/sys_ia32.c:25:
     arch/x86/um/asm/processor.h:24:20: note: previous definition of 'rep_nop' was here
      static inline void rep_nop(void)

Make sure that the unnecessary headers are not included when um is built
to address the problem.

Fixes: abc22418db ("x86/vdso: Enable x86 to use common headers")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200323124109.7104-1-vincenzo.frascino@arm.com
2020-03-23 18:45:14 +01:00
Anshuman Khandual
31a9122058 x86/mm: Drop pud_mknotpresent()
There is an inconsistency between PMD and PUD-based THP page table helpers
like the following, as pud_present() does not test for _PAGE_PSE.

pmd_present(pmd_mknotpresent(pmd)) : True
pud_present(pud_mknotpresent(pud)) : False

Drop pud_mknotpresent() as there are no current users. If/when needed
back later, pud_present() will also have to be fixed to accommodate
_PAGE_PSE.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/1584925542-13034-1-git-send-email-anshuman.khandual@arm.com
2020-03-23 09:11:48 +01:00
Wei Huang
077168e241 x86/mce/amd: Add PPIN support for AMD MCE
Newer AMD CPUs support a feature called protected processor
identification number (PPIN). This feature can be detected via
CPUID_Fn80000008_EBX[23].

However, CPUID alone is not enough to read the processor identification
number - MSR_AMD_PPIN_CTL also needs to be configured properly. If, for
any reason, MSR_AMD_PPIN_CTL[PPIN_EN] can not be turned on, such as
disabled in BIOS, the CPU capability bit X86_FEATURE_AMD_PPIN needs to
be cleared.

When the X86_FEATURE_AMD_PPIN capability is available, the
identification number is issued together with the MCE error info in
order to keep track of the source of MCE errors.

 [ bp: Massage. ]

Co-developed-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Signed-off-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200321193800.3666964-1-wei.huang2@amd.com
2020-03-22 11:03:47 +01:00
Peter Zijlstra
46db36abc3 x86/entry: Rename ___preempt_schedule
Because moar '_' isn't always moar readable.

git grep -l "___preempt_schedule\(_notrace\)*" | while read file;
do
	sed -ie 's/___preempt_schedule\(_notrace\)*/preempt_schedule\1_thunk/g' $file;
done

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115858.995685950@infradead.org
2020-03-21 16:03:53 +01:00
Brian Gerst
ffd75b373f x86: Remove unneeded includes
Clean up includes of and in <asm/syscalls.h>

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-19-brgerst@gmail.com
2020-03-21 16:03:25 +01:00
Brian Gerst
0f78ff1711 x86/entry: Drop asmlinkage from syscalls
asmlinkage is no longer required since the syscall ABI is now fully under
x86 architecture control.  This makes the 32-bit native syscalls a bit more
effecient by passing in regs via EAX instead of on the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-18-brgerst@gmail.com
2020-03-21 16:03:25 +01:00
Brian Gerst
25c619e59b x86/entry/32: Enable pt_regs based syscalls
Enable pt_regs based syscalls for 32-bit.  This makes the 32-bit native
kernel consistent with the 64-bit kernel, and improves the syscall
interface by not needing to push all 6 potential arguments onto the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-17-brgerst@gmail.com
2020-03-21 16:03:24 +01:00
Brian Gerst
0872098804 x86/entry: Move max syscall number calculation to syscallhdr.sh
Instead of using an array in asm-offsets to calculate the max syscall
number, calculate it when writing out the syscall headers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-9-brgerst@gmail.com
2020-03-21 16:03:21 +01:00
Brian Gerst
cc42c045af x86/entry/64: Move sys_ni_syscall stub to common.c
so it can be available to multiple syscall tables.  Also directly return
-ENOSYS instead of bouncing to the generic sys_ni_syscall().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-7-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
27dd84fafc x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
Add missing syscall wrapper for x32_rt_sigreturn().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-6-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
a74d187c2d x86/entry: Refactor SYS_NI macros
Pull the common code out from the SYS_NI macros into a new __SYS_NI macro.
Also conditionalize the X64 version in preparation for enabling syscall
wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-5-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
6cc8d2b286 x86/entry: Refactor COND_SYSCALL macros
Pull the common code out from the COND_SYSCALL macros into a new
__COND_SYSCALL macro.  Also conditionalize the X64 version in preparation
for enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-4-brgerst@gmail.com
2020-03-21 16:03:19 +01:00
Brian Gerst
d2b5de495e x86/entry: Refactor SYSCALL_DEFINE0 macros
Pull the common code out from the SYSCALL_DEFINE0 macros into a new
__SYS_STUB0 macro.  Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-3-brgerst@gmail.com
2020-03-21 16:03:19 +01:00
Brian Gerst
4399e0cf49 x86/entry: Refactor SYSCALL_DEFINEx macros
Pull the common code out from the SYSCALL_DEFINEx macros into a new
__SYS_STUBx macro.  Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-2-brgerst@gmail.com
2020-03-21 16:03:18 +01:00
Vincenzo Frascino
abc22418db x86/vdso: Enable x86 to use common headers
Enable x86 to use only the common headers in the implementation
of the vDSO library.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-24-vincenzo.frascino@arm.com
2020-03-21 15:24:02 +01:00
Vincenzo Frascino
659a9faa3f x86: Introduce asm/vdso/clocksource.h
The vDSO library should only include the necessary headers required for
a userspace library (UAPI and a minimal set of kernel headers). To make
this possible it is necessary to isolate from the kernel headers the
common parts that are strictly necessary to build the library.

Introduce asm/vdso/clocksource.h to contain all the arm64 specific
functions that are suitable for vDSO inclusion.

This header will be required by a future patch that will generalize
vdso/clocksource.h.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-5-vincenzo.frascino@arm.com
2020-03-21 15:23:54 +01:00
Ingo Molnar
df10846ff2 Merge branch 'linus' into locking/kcsan, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-21 09:35:44 +01:00
Ingo Molnar
a4654e9bde Merge branch 'x86/kdump' into locking/kcsan, to resolve conflicts
Conflicts:
	arch/x86/purgatory/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-21 09:24:41 +01:00
Peter Zijlstra
d8a7386897 x86/optprobe: Fix OPTPROBE vs UACCESS
While looking at an objtool UACCESS warning, it suddenly occurred to me
that it is entirely possible to have an OPTPROBE right in the middle of
an UACCESS region.

In this case we must of course clear FLAGS.AC while running the KPROBE.
Luckily the trampoline already saves/restores [ER]FLAGS, so all we need
to do is inject a CLAC. Unfortunately we cannot use ALTERNATIVE() in the
trampoline text, so we have to frob that manually.

Fixes: ca0bbc70f147 ("sched/x86_64: Don't save flags on context switch")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200305092130.GU2596@hirez.programming.kicks-ass.net
2020-03-20 13:06:22 +01:00
Ingo Molnar
409e1a3140 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-19 15:01:45 +01:00
Guenter Roeck
bac59d18c7 x86/setup: Fix static memory detection
When booting x86 images in qemu, the following warning is seen randomly
if DEBUG_LOCKDEP is enabled.

  WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:1119
	  lockdep_register_key+0xc0/0x100

static_obj() returns true if an address is between _stext and _end.

On x86, this includes the brk memory space. Problem is that this memory
block is not static on x86; its unused portions are released after init
and can be allocated. This results in the observed warning if a lockdep
object is allocated from this memory.

Solve the problem by implementing arch_is_kernel_initmem_freed() for
x86 and have it return true if an address is within the released memory
range.

The same problem was solved for s390 with commit

  7a5da02de8 ("locking/lockdep: check for freed initmem in static_obj()"),

which introduced arch_is_kernel_initmem_freed().

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200131021159.9178-1-linux@roeck-us.net
2020-03-19 11:58:13 +01:00
Al Viro
9f855c085f x86: switch setup_sigcontext() to unsafe_put_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:39:02 -04:00
Al Viro
77f3c6166d x86: kill get_user_{try,catch,ex}
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:35:35 -04:00
Al Viro
4b842e4e25 x86: get rid of small constant size cases in raw_copy_{to,from}_user()
Very few call sites where that would be triggered remain, and none
of those is anywhere near hot enough to bother.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 15:53:25 -04:00
Al Viro
71c3313a38 x86: switch sigframe sigset handling to explict __get_user()/__put_user()
... and consolidate the definition of sigframe_ia32->extramask - it's
always a 1-element array of 32bit unsigned.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 15:29:54 -04:00
Jesse Brandeburg
1651e70066 x86: Fix bitops.h warning with a moved cast
Fix many sparse warnings when building with C=1. These are useless noise
from the bitops.h file and getting rid of them helps developers make
more use of the tools and possibly find real bugs.

When the kernel is compiled with C=1, there are lots of messages like:

  arch/x86/include/asm/bitops.h:77:37: warning: cast truncates bits from constant value (ffffff7f becomes 7f)

CONST_MASK() is using a signed integer "1" to create the mask which is
later cast to (u8), in order to yield an 8-bit value for the assembly
instructions to use. Simplify the expressions used to clearly indicate
they are working on 8-bit values only, which still keeps sparse happy
without an accidental promotion to a 32 bit integer.

The warning was occurring because certain bitmasks that end with a bit
set next to a natural boundary like 7, 15, 23, 31, end up with a mask
like 0x7f, which then results in sign extension due to the integer type
promotion rules[1]. It was really only clear_bit() that was having
problems, and it was only on some bit checks that resulted in a mask
like 0xffffff7f being generated after the inversion.

Verify with a test module (see next patch) and assembly inspection that
the fix doesn't introduce any change in generated code.

 [ bp: Massage. ]

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules [1]
Link: https://lkml.kernel.org/r/20200310221747.2848474-1-jesse.brandeburg@intel.com
2020-03-18 12:30:19 +01:00
Kim Phillips
e48667b865 perf/amd/uncore: Add support for Family 19h L3 PMU
Family 19h introduces change in slice, core and thread specification in
its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is
incompatible with Family 17h's version of the register.

Introduce a new path in l3_thread_slice_mask() to do things differently
for Family 19h vs. Family 17h, otherwise the new hardware doesn't get
programmed correctly.

Instead of a linear core--thread bitmask, Family 19h takes an encoded
core number, and a separate thread mask. There are new bits that are set
for all cores and all slices, of which only the latter is used, since
the driver counts events for all slices on behalf of the specified CPU.

Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode
decision based on Family 17h or above, not just 17h and 18h: the Family
19h Data Fabric PMC is compatible with the Family 17h DF PMC.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
2020-03-17 13:01:03 +01:00
Thomas Hellstrom
6db73f17c5 x86: Don't let pgprot_modify() change the page encryption bit
When SEV or SME is enabled and active, vm_get_page_prot() typically
returns with the encryption bit set. This means that users of
pgprot_modify(, vm_get_page_prot()) (mprotect_fixup(), do_mmap()) end up
with a value of vma->vm_pg_prot that is not consistent with the intended
protection of the PTEs.

This is also important for fault handlers that rely on the VMA
vm_page_prot to set the page protection. Fix this by not allowing
pgprot_modify() to change the encryption bit, similar to how it's done
for PAT bits.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20200304114527.3636-2-thomas_os@shipmail.org
2020-03-17 11:48:31 +01:00
Borislav Petkov
19d33357ec x86/amd_nb, char/amd64-agp: Use amd_nb_num() accessor
... to find whether there are northbridges present on the
system. Convert the last forgotten user and therefore, unexport
amd_nb_misc_ids[] too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20200316150725.925-1-bp@alien8.de
2020-03-17 10:25:58 +01:00
Paolo Bonzini
727a7e27cf KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd
The set_cr3 callback is not setting the guest CR3, it is setting the
root of the guest page tables, either shadow or two-dimensional.
To make this clearer as well as to indicate that the MMU calls it
via kvm_mmu_load_cr3, rename it to load_mmu_pgd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:52 +01:00
Paolo Bonzini
689f3bf216 KVM: x86: unify callbacks to load paging root
Similar to what kvm-intel.ko is doing, provide a single callback that
merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3.

This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops.
I'm doing that in this same patch because splitting it adds quite a bit
of churn due to the need for forward declarations.  For the same reason
the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu
from init_kvm_softmmu and nested_svm_init_mmu_context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:51 +01:00
Sean Christopherson
257038745c KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop
->set_supported_cpuid() now that both VMX and SVM implementations are
empty.  Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf
0x8000000A is is (obviously) vendor specific but can be queried in
common code while respecting SVM's wishes by querying kvm_cpu_cap_has().

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
91661989d1 KVM: x86: Move VMX's host_efer to common x86 code
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to
avoid constantly re-reading the MSR.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:42 +01:00
Sean Christopherson
703c335d06 KVM: x86/mmu: Configure max page level during hardware setup
Configure the max page level during hardware setup to avoid a retpoline
in the page fault handler.  Drop ->get_lpage_level() as the page fault
handler was the last user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:40 +01:00
Sean Christopherson
bde7723559 KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common function
Combine kvm_enable_tdp() and kvm_disable_tdp() into a single function,
kvm_configure_mmu(), in preparation for doing additional configuration
during hardware setup.  And because having separate helpers is silly.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:39 +01:00
Sean Christopherson
a1bead2aba KVM: VMX: Directly query Intel PT mode when refreshing PMUs
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of
bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported()
as the PMU code was the last remaining user.

Opportunistically clean up the wording of a comment that referenced
kvm_x86_ops->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:38 +01:00
Sean Christopherson
139085101f KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt support
Check for MSR_TSC_AUX virtualization via kvm_cpu_cap_has() and drop
->rdtscp_supported().

Note, vmx_rdtscp_supported() needs to hang around a tiny bit longer due
other usage in VMX code.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:35 +01:00
Sean Christopherson
90d2f60f41 KVM: x86: Use KVM cpu caps to track UMIP emulation
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the
bit will effectively be dropped by do_host_cpuid().  This allows
checking for UMIP emulation via kvm_cpu_caps instead of a dedicated
kvm_x86_ops callback.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:28 +01:00
Sean Christopherson
b3d895d5c4 KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update
Move the clearing of the XSAVES CPUID bit into VMX, which has a separate
VMCS control to enable XSAVES in non-root, to eliminate the last ugly
renmant of the undesirable "unsigned f_* = *_supported ? F(*) : 0"
pattern in the common CPUID handling code.

Drop ->xsaves_supported(), CPUID adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:25 +01:00
Sean Christopherson
d64d83d1e0 KVM: x86: Handle PKU CPUID adjustment in VMX code
Move the setting of the PKU CPUID bit into VMX to eliminate an instance
of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in
the common CPUID handling code.  Drop ->pku_supported(), CPUID
adjustment was the only user.

Note, some AMD CPUs now support PKU, but SVM doesn't yet support
exposing it to a guest.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:19 +01:00
Sean Christopherson
5ffec6f910 KVM: x86: Handle INVPCID CPUID adjustment in VMX code
Move the INVPCID CPUID adjustments into VMX to eliminate an instance of
the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the
common CPUID handling code.  Drop ->invpcid_supported(), CPUID
adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:17 +01:00
Sean Christopherson
160b486f65 KVM: x86: Drop explicit @func param from ->set_supported_cpuid()
Drop the explicit @func param from ->set_supported_cpuid() and instead
pull the CPUID function from the relevant entry.  This sets the stage
for hardening guest CPUID updates in future patches, e.g. allows adding
run-time assertions that the CPUID feature being changed is actually
a bit in the referenced CPUID entry.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:12 +01:00
Sean Christopherson
7f5581f592 KVM: x86: Use supported_xcr0 to detect MPX support
Query supported_xcr0 when checking for MPX support instead of invoking
->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was
its last user.  Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to
better align with VMX/VMCS nomenclature.

Modify VMX's adjustment of xcr0 to call cpus_has_vmx_mpx() (renamed from
vmx_mpx_supported()) directly to avoid reading supported_xcr0 before
it's fully configured.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Test that *all* bits are set. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:10 +01:00
Sean Christopherson
2f728d66e8 KVM: x86: Move kvm_emulate.h into KVM's private directory
Now that the emulation context is dynamically allocated and not embedded
in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public
asm directory and into KVM's private x86 directory.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:52 +01:00
Sean Christopherson
c9b8b07cde KVM: x86: Dynamically allocate per-vCPU emulation context
Allocate the emulation context instead of embedding it in struct
kvm_vcpu_arch.

Dynamic allocation provides several benefits:

  - Shrinks the size x86 vcpus by ~2.5k bytes, dropping them back below
    the PAGE_ALLOC_COSTLY_ORDER threshold.
  - Allows for dropping the include of kvm_emulate.h from asm/kvm_host.h
    and moving kvm_emulate.h into KVM's private directory.
  - Allows a reducing KVM's attack surface by shrinking the amount of
    vCPU data that is exposed to usercopy.
  - Allows a future patch to disable the emulator entirely, which may or
    may not be a realistic endeavor.

Mark the entire struct as valid for usercopy to maintain existing
behavior with respect to hardened usercopy.  Future patches can shrink
the usercopy range to cover only what is necessary.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:52 +01:00
Sean Christopherson
21f1b8f29e KVM: x86: Explicitly pass an exception struct to check_intercept
Explicitly pass an exception struct when checking for intercept from
the emulator, which eliminates the last reference to arch.emulate_ctxt
in vendor specific code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:50 +01:00
Sean Christopherson
d8dd54e063 KVM: x86/mmu: Rename kvm_mmu->get_cr3() to ->get_guest_pgd()
Rename kvm_mmu->get_cr3() to call out that it is retrieving a guest
value, as opposed to kvm_mmu->set_cr3(), which sets a host value, and to
note that it will return something other than CR3 when nested EPT is in
use.  Hopefully the new name will also make it more obvious that L1's
nested_cr3 is returned in SVM's nested NPT case.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:46 +01:00
Sean Christopherson
bb1fcc70d9 KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT
Add support for 5-level nested EPT, and advertise said support in the
EPT capabilities MSR.  KVM's MMU can already handle 5-level legacy page
tables, there's no reason to force an L1 VMM to use shadow paging if it
wants to employ 5-level page tables.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:44 +01:00
Sean Christopherson
8053f924ca KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack
Drop kvm_mmu_extended_role.cr4_la57 now that mmu_role doesn't mask off
level, which already incorporates the guest's CR4.LA57 for a shadow MMU
by querying is_la57_mode().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:43 +01:00
Sean Christopherson
a1c77abb8d KVM: nVMX: Properly handle userspace interrupt window request
Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has
external interrupt exiting enabled.  IRQs are never blocked in hardware
if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger
VM-Exit.

The new check percolates up to kvm_vcpu_ready_for_interrupt_injection()
and thus vcpu_run(), and so KVM will exit to userspace if userspace has
requested an interrupt window (to inject an IRQ into L1).

Remove the @external_intr param from vmx_check_nested_events(), which is
actually an indicator that userspace wants an interrupt window, e.g.
it's named @req_int_win further up the stack.  Injecting a VM-Exit into
L1 to try and bounce out to L0 userspace is all kinds of broken and is
no longer necessary.

Remove the hack in nested_vmx_vmexit() that attempted to workaround the
breakage in vmx_check_nested_events() by only filling interrupt info if
there's an actual interrupt pending.  The hack actually made things
worse because it caused KVM to _never_ fill interrupt info when the
LAPIC resides in userspace (kvm_cpu_has_interrupt() queries
interrupt.injected, which is always cleared by prepare_vmcs12() before
reaching the hack in nested_vmx_vmexit()).

Fixes: 6550c4df7e ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:40 +01:00
Wanpeng Li
4abaffce4d KVM: LAPIC: Recalculate apic map in batch
In the vCPU reset and set APIC_BASE MSR path, the apic map will be recalculated
several times, each time it will consume 10+ us observed by ftrace in my
non-overcommit environment since the expensive memory allocate/mutex/rcu etc
operations. This patch optimizes it by recaluating apic map in batch, I hope
this can benefit the serverless scenario which can frequently create/destroy
VMs.

Before patch:

kvm_lapic_reset  ~27us

After patch:

kvm_lapic_reset  ~14us

Observed by ftrace, improve ~48%.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:39 +01:00
Jay Zhou
3c9bd4006b KVM: x86: enable dirty log gradually in small chunks
It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:

1. Initialize all the bits of the dirty bitmap to 1 when enabling
   dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
   SPTEs gradually in small chunks

Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G windows VM and counted the time taken
of memory_global_dirty_log_start, here is the numbers:

VM Size        Before    After optimization
128G           460ms     10ms

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:37 +01:00
Oliver Upton
cc7f5577ad KVM: SVM: Inhibit APIC virtualization for X2APIC guest
The AVIC does not support guest use of the x2APIC interface. Currently,
KVM simply chooses to squash the x2APIC feature in the guest's CPUID
If the AVIC is enabled. Doing so prevents KVM from running a guest
with greater than 255 vCPUs, as such a guest necessitates the use
of the x2APIC interface.

Instead, inhibit AVIC enablement on a per-VM basis whenever the x2APIC
feature is set in the guest's CPUID.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:35 +01:00
Sean Christopherson
562b6b089d KVM: x86: Consolidate VM allocation and free for VMX and SVM
Move the VM allocation and free code to common x86 as the logic is
more or less identical across SVM and VMX.

Note, although hyperv.hv_pa_pg is part of the common kvm->arch, it's
(currently) only allocated by VMX VMs.  But, since kfree() plays nice
when passed a NULL pointer, the superfluous call for SVM is harmless
and avoids future churn if SVM gains support for HyperV's direct TLB
flush.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Make vm_size a field instead of a function. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:33 +01:00
Sean Christopherson
e96c81ee89 KVM: Simplify kvm_free_memslot() and all its descendents
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove
the param from the top-level routine and all arch's implementations.

No functional change intended.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:22 +01:00
Sean Christopherson
744e699c7e KVM: x86: Move gpa_val and gpa_available into the emulator context
Move the GPA tracking into the emulator context now that the context is
guaranteed to be initialized via __init_emulate_ctxt() prior to
dereferencing gpa_{available,val}, i.e. now that seeing a stale
gpa_available will also trigger a WARN due to an invalid context.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:12 +01:00
Sean Christopherson
92daa48b34 KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page fault
Add a new emulation type flag to explicitly mark emulation related to a
page fault.  Move the propation of the GPA into the emulator from the
page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an
indicator that cr2 is valid.  Similarly, don't propagate cr2 into the
exception.address when it's *not* valid.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:12 +01:00
Linus Torvalds
6693075e0f Bugfixes, x86+s390.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJebMXnAAoJEL/70l94x66D3fYIAJ1r+o2qgzadwEqoXTvlihjB
 ujX1jOs20EJJ56VhTtXF/wZQc+7VeKCjpIqNv4WaeSYPUhzFGyL9t5tw1YdRDCwY
 u6gklxruIzZodgp+vCoTkPyyUylVmY50sY/yBIJ4F8qOaMxhTEE1aXzGuaOrYqVO
 MmIlAltEKQzdXPO1SVPD7triGPgUTj+DRxrlyRrGt2ItiMUincCz9K6TDyXFib0r
 SSCVFNYtYmzu/bV/E4/Sphi2BxCQEem5DIFWLcngzN8Wy5oCoRVzPGugT4Q9eXWt
 ZtWIDh473JGiXBLYmDq4REJsRSca+7s/YiiLSiQwYfByhIPJpVEoy54fcdaZflo=
 =T4AD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes for x86 and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
  KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
  KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
  KVM: s390: Also reset registers in sync regs for initial cpu reset
  KVM: fix Kconfig menu text for -Werror
  KVM: x86: remove stale comment from struct x86_emulate_ctxt
  KVM: x86: clear stale x86_emulate_ctxt->intercept value
  KVM: SVM: Fix the svm vmexit code for WRMSR
  KVM: X86: Fix dereference null cpufreq policy
2020-03-14 15:45:26 -07:00
Kim Phillips
753039ef8b x86/cpu/amd: Call init_amd_zn() om Family 19h processors too
Family 19h CPUs are Zen-based and still share most architectural
features with Family 17h CPUs, and therefore still need to call
init_amd_zn() e.g., to set the RECLAIM_DISTANCE override.

init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used
in amd_set_core_ssb_state(), which isn't called on some late
model Family 17h CPUs, nor on any Family 19h CPUs:
X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those
later model CPUs, where the SSBD mitigation is done via the
SPEC_CTRL MSR instead of the LS_CFG MSR.

Family 19h CPUs also don't have the erratum where the CPB feature
bit isn't set, but that code can stay unchanged and run safely
on Family 19h.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
2020-03-12 12:13:44 +01:00
Tony Luck
d8ecca4043 x86/mce/dev-mcelog: Dynamically allocate space for machine check records
We have had a hard coded limit of 32 machine check records since the
dawn of time.  But as numbers of cores increase, it is possible for
more than 32 errors to be reported before a user process reads from
/dev/mcelog. In this case the additional errors are lost.

Keep 32 as the minimum. But tune the maximum value up based on the
number of processors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
2020-03-10 10:25:14 +01:00
Boqun Feng
1cf106d932 PCI: hv: Introduce hv_msi_entry
Add a new structure (hv_msi_entry), which is also defined in the TLFS,
to describe the msi entry for HVCALL_RETARGET_INTERRUPT. The structure
is needed because its layout may be different from architecture to
architecture.

Also add a new generic interface hv_set_msi_entry_from_desc() to allow
different archs to set the msi entry from msi_desc.

No functional change, only preparation for the future support of virtual
PCI on non-x86 architectures.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09 14:51:31 +00:00
Boqun Feng
61bfd920ab PCI: hv: Move retarget related structures into tlfs header
Currently, retarget_msi_interrupt and other structures it relys on are
defined in pci-hyperv.c. However, those structures are actually defined
in Hypervisor Top-Level Functional Specification [1] and may be
different in sizes of fields or layout from architecture to
architecture. Let's move those definitions into x86's tlfs header file
to support virtual PCI on non-x86 architectures in the future. Note that
"__packed" attribute is added to these structures during the movement
for the same reason as we use the attribute for other TLFS structures in
the header file: make sure the structures meet the specification and
avoid anything unexpected from the compilers.

Additionally, rename struct retarget_msi_interrupt to
hv_retarget_msi_interrupt for the consistent naming convention, also
mirroring the name in TLFS.

[1]: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09 14:50:53 +00:00
Boqun Feng
b00f80fcfa PCI: hv: Move hypercall related definitions into tlfs header
Currently HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF are defined
in pci-hyperv.c. However, similar to other hypercall related
definitions, it makes more sense to put them in the tlfs header file.

Besides, these definitions are arch-dependent, so for the support of
virtual PCI on non-x86 archs in the future, move them into arch-specific
tlfs header file.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09 14:50:39 +00:00
Ingo Molnar
6120681bdf Merge branch 'efi/urgent' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-08 09:57:58 +01:00
Ingo Molnar
1b10d388d0 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-06 12:49:56 +01:00
Vitaly Kuznetsov
d718fdc3e7 KVM: x86: remove stale comment from struct x86_emulate_ctxt
Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") did some field shuffling and instead of
[opcode_len, _regs) started clearing [has_seg_override, modrm).
The comment about clearing fields altogether is not true anymore.

Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03 17:38:22 +01:00