Commit Graph

10378 Commits

Author SHA1 Message Date
Sean Christopherson
e662ec3e07 KVM: x86/mmu: Move max hugepage level to a separate #define
Rename PT_MAX_HUGEPAGE_LEVEL to KVM_MAX_HUGEPAGE_LEVEL and make it a
separate define in anticipation of dropping KVM's PT_*_LEVEL enums in
favor of the kernel's PG_LEVEL_* enums.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:11 -04:00
Xiaoyao Li
a71936ab46 kvm: x86: Cleanup vcpu->arch.guest_xstate_size
vcpu->arch.guest_xstate_size lost its only user since commit df1daba7d1
("KVM: x86: support XSAVES usage in the host"), so clean it up.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200429154312.1411-1-xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:26:10 -04:00
Sean Christopherson
68cda40d9f KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure
Use an enum for passing around the failure code for a failed VM-Enter
that results in VM-Exit to provide a level of indirection from the final
resting place of the failure code, vmcs.EXIT_QUALIFICATION.  The exit
qualification field is an unsigned long, e.g. passing around
'u32 exit_qual' throws up red flags as it suggests KVM may be dropping
bits when reporting errors to L1.  This is a red herring because the
only defined failure codes are 0, 2, 3, and 4, i.e. don't come remotely
close to overflowing a u32.

Setting vmcs.EXIT_QUALIFICATION on entry failure is further complicated
by the MSR load list, which returns the (1-based) entry that failed, and
the number of MSRs to load is a 32-bit VMCS field.  At first blush, it
would appear that overflowing a u32 is possible, but the number of MSRs
that can be loaded is hardcapped at 4096 (limited by MSR_IA32_VMX_MISC).

In other words, there are two completely disparate types of data that
eventually get stuffed into vmcs.EXIT_QUALIFICATION, neither of which is
an 'unsigned long' in nature.  This was presumably the reasoning for
switching to 'u32' when the related code was refactored in commit
ca0bde28f2 ("kvm: nVMX: Split VMCS checks from nested_vmx_run()").

Using an enum for the failure code addresses the technically-possible-
but-will-never-happen scenario where Intel defines a failure code that
doesn't fit in a 32-bit integer.  The enum variables and values will
either be automatically sized (gcc 5.4 behavior) or be subjected to some
combination of truncation.  The former case will simply work, while the
latter will trigger a compile-time warning unless the compiler is being
particularly unhelpful.

Separating the failure code from the failed MSR entry allows for
disassociating both from vmcs.EXIT_QUALIFICATION, which avoids the
conundrum where KVM has to choose between 'u32 exit_qual' and tracking
values as 'unsigned long' that have no business being tracked as such.
To cement the split, set vmcs12->exit_qualification directly from the
entry error code or failed MSR index instead of bouncing through a local
variable.

Opportunistically rename the variables in load_vmcs12_host_state() and
vmx_set_nested_state() to call out that they're ignored, set exit_reason
on demand on nested VM-Enter failure, and add a comment in
nested_vmx_load_msr() to call out that returning 'i + 1' can't wrap.

No functional change intended.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200511220529.11402-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 12:07:31 -04:00
Borislav Petkov
a9a3ed1eff x86: Fix early boot crash on gcc-10, third try
... or the odyssey of trying to disable the stack protector for the
function which generates the stack canary value.

The whole story started with Sergei reporting a boot crash with a kernel
built with gcc-10:

  Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139
  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013
  Call Trace:
    dump_stack
    panic
    ? start_secondary
    __stack_chk_fail
    start_secondary
    secondary_startup_64
  -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary

This happens because gcc-10 tail-call optimizes the last function call
in start_secondary() - cpu_startup_entry() - and thus emits a stack
canary check which fails because the canary value changes after the
boot_init_stack_canary() call.

To fix that, the initial attempt was to mark the one function which
generates the stack canary with:

  __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)

however, using the optimize attribute doesn't work cumulatively
as the attribute does not add to but rather replaces previously
supplied optimization options - roughly all -fxxx options.

The key one among them being -fno-omit-frame-pointer and thus leading to
not present frame pointer - frame pointer which the kernel needs.

The next attempt to prevent compilers from tail-call optimizing
the last function call cpu_startup_entry(), shy of carving out
start_secondary() into a separate compilation unit and building it with
-fno-stack-protector, was to add an empty asm("").

This current solution was short and sweet, and reportedly, is supported
by both compilers but we didn't get very far this time: future (LTO?)
optimization passes could potentially eliminate this, which leads us
to the third attempt: having an actual memory barrier there which the
compiler cannot ignore or move around etc.

That should hold for a long time, but hey we said that about the other
two solutions too so...

Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
2020-05-15 11:48:01 +02:00
Linus Torvalds
f44d5c4890 Various tracing fixes:
- Fix a crash when having function tracing and function stack tracing on
    the command line. The ftrace trampolines are created as executable and
    read only. But the stack tracer tries to modify them with text_poke()
    which expects all kernel text to still be writable at boot.
    Keep the trampolines writable at boot, and convert them to read-only
    with the rest of the kernel.
 
  - A selftest was triggering in the ring buffer iterator code, that
    is no longer valid with the update of keeping the ring buffer
    writable while a iterator is reading. Just bail after three failed
    attempts to get an event and remove the warning and disabling of the
    ring buffer.
 
  - While modifying the ring buffer code, decided to remove all the
    unnecessary BUG() calls.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXr1CDhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsXcAQCoL229SBrtHsn4DUO7eAQRppUT3hNw
 RuKzvQ56+1GccQEAh8VGCeg89uMSK6imrTujEl6VmOUdbgrD5R96yiKoGQw=
 =vi+k
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a crash when having function tracing and function stack tracing
     on the command line.

     The ftrace trampolines are created as executable and read only. But
     the stack tracer tries to modify them with text_poke() which
     expects all kernel text to still be writable at boot. Keep the
     trampolines writable at boot, and convert them to read-only with
     the rest of the kernel.

   - A selftest was triggering in the ring buffer iterator code, that is
     no longer valid with the update of keeping the ring buffer writable
     while a iterator is reading.

     Just bail after three failed attempts to get an event and remove
     the warning and disabling of the ring buffer.

   - While modifying the ring buffer code, decided to remove all the
     unnecessary BUG() calls"

* tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Remove all BUG() calls
  ring-buffer: Don't deactivate the ring buffer on failed iterator reads
  x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
2020-05-14 11:46:52 -07:00
Yu-cheng Yu
c95473e175 x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor states
The function copy_kernel_to_xregs_err() uses XRSTOR which can work with
standard or compacted format without supervisor xstates. However, when
supervisor xstates are present, XRSTORS must be used. Fix it by using
XRSTORS when supervisor state handling is enabled.

I also considered if there were additional cases where XRSTOR might be
mistakenly called instead of XRSTORS.  There are only three XRSTOR sites
in the kernel:

1. copy_kernel_to_xregs_booting(), already switches between XRSTOR and
   XRSTORS based on X86_FEATURE_XSAVES.

2. copy_user_to_xregs(), which *needs* XRSTOR because it is copying from
   userspace and must never copy supervisor state with XRSTORS.

3. copy_kernel_to_xregs_err() mistakenly used XRSTOR only.  Fix it.

 [ bp: Massage commit message. ]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-8-yu-cheng.yu@intel.com
2020-05-14 16:46:43 +02:00
Sean Christopherson
e93fd3b3e8 KVM: x86/mmu: Capture TDP level when updating CPUID
Snapshot the TDP level now that it's invariant (SVM) or dependent only
on host capabilities and guest CPUID (VMX).  This avoids having to call
kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or
calculating the page role, and thus avoids the associated retpoline.

Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is
active is legal, if dodgy.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:14 -04:00
Sean Christopherson
bd31fe495d KVM: VMX: Add proper cache tracking for CR0
Move CR0 caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.
This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0()
is called multiple times in a single VM-Exit, and more importantly
eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading
CR0, and squashes the confusing naming discrepancy of "cache_reg" vs.
"decache_cr0_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:12 -04:00
Sean Christopherson
f98c1e7712 KVM: VMX: Add proper cache tracking for CR4
Move CR4 caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.
This avoids multiple VMREADs and retpolines (when configured) during
nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times
on each transition, e.g. when stuffing CR0 and CR3.

As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline
on SVM when reading CR4, and squashes the confusing naming discrepancy
of "cache_reg" vs. "decache_cr4_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:10 -04:00
Sean Christopherson
56ba77a459 KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'
Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops
hook read_l1_tsc_offset().  This avoids a retpoline (when configured)
when reading L1's effective TSC, which is done at least once on every
VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:15:04 -04:00
Paolo Bonzini
c300ab9f08 KVM: x86: Replace late check_nested_events() hack with more precise fix
Add an argument to interrupt_allowed and nmi_allowed, to checking if
interrupt injection is blocked.  Use the hook to handle the case where
an interrupt arrives between check_nested_events() and the injection
logic.  Drop the retry of check_nested_events() that hack-a-fixed the
same condition.

Blocking injection is also a bit of a hack, e.g. KVM should do exiting
and non-exiting interrupt processing in a single pass, but it's a more
precise hack.  The old comment is also misleading, e.g. KVM_REQ_EVENT is
purely an optimization, setting it on every run loop (which KVM doesn't
do) should not affect functionality, only performance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
[Extend to SVM, add SMI and NMI.  Even though NMI and SMI cannot come
 asynchronously right now, making the fix generic is easy and removes a
 special case. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:14:49 -04:00
Sean Christopherson
88c604b66e KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of int
Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to
better reflect the return semantics, and to avoid creating an even
bigger mess when the related VMX code is refactored in upcoming patches.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:14:29 -04:00
Sean Christopherson
d2060bd42e KVM: nVMX: Open a window for pending nested VMX preemption timer
Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
use it to effectively open a window for servicing the expired timer.
Like pending SMIs on VMX, opening a window simply means requesting an
immediate exit.

This fixes a bug where an expired VMX preemption timer (for L2) will be
delayed and/or lost if a pending exception is injected into L2.  The
pending exception is rightly prioritized by vmx_check_nested_events()
and injected into L2, with the preemption timer left pending.  Because
no window opened, L2 is free to run uninterrupted.

Fixes: f4124500c2 ("KVM: nVMX: Fully emulate preemption timer")
Reported-by: Jim Mattson <jmattson@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Shier <pshier@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com>
[Check it in kvm_vcpu_has_events too, to ensure that the preemption
 timer is serviced promptly even if the vCPU is halted and L1 is not
 intercepting HLT. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 12:14:27 -04:00
Paolo Bonzini
4aef2ec902 Merge branch 'kvm-amd-fixes' into HEAD 2020-05-13 12:14:05 -04:00
Babu Moger
37486135d3 KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
resource isn't. It can be read with XSAVE and written with XRSTOR.
So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state),
the guest can read the host value.

In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could
potentially use XRSTOR to change the host PKRU value.

While at it, move pkru state save/restore to common code and the
host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.

Cc: stable@vger.kernel.org
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 11:27:41 -04:00
Fenghua Yu
b860eb8dce x86/fpu/xstate: Define new functions for clearing fpregs and xstates
Currently, fpu__clear() clears all fpregs and xstates.  Once XSAVES
supervisor states are introduced, supervisor settings (e.g. CET xstates)
must remain active for signals; It is necessary to have separate functions:

- Create fpu__clear_user_states(): clear only user settings for signals;
- Create fpu__clear_all(): clear both user and supervisor settings in
   flush_thread().

Also modify copy_init_fpstate_to_fpregs() to take a mask from above two
functions.

Remove obvious side-comment in fpu__clear(), while at it.

 [ bp: Make the second argument of fpu__clear() bool after requesting it
   a bunch of times during review.
  - Add a comment about copy_init_fpstate_to_fpregs() locking needs. ]

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-6-yu-cheng.yu@intel.com
2020-05-13 13:41:50 +02:00
Yu-cheng Yu
524bb73bc1 x86/fpu/xstate: Separate user and supervisor xfeatures mask
Before the introduction of XSAVES supervisor states, 'xfeatures_mask' is
used at various places to determine XSAVE buffer components and XCR0 bits.
It contains only user xstates.  To support supervisor xstates, it is
necessary to separate user and supervisor xstates:

- First, change 'xfeatures_mask' to 'xfeatures_mask_all', which represents
  the full set of bits that should ever be set in a kernel XSAVE buffer.
- Introduce xfeatures_mask_supervisor() and xfeatures_mask_user() to
  extract relevant xfeatures from xfeatures_mask_all.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-4-yu-cheng.yu@intel.com
2020-05-13 10:31:07 +02:00
Steven Rostedt (VMware)
59566b0b62 x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
Booting one of my machines, it triggered the following crash:

 Kernel/User page tables isolation: enabled
 ftrace: allocating 36577 entries in 143 pages
 Starting tracer 'function'
 BUG: unable to handle page fault for address: ffffffffa000005c
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
 Oops: 0003 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 RIP: 0010:text_poke_early+0x4a/0x58
 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
0 41 57 49
 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
 RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
 FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
 Call Trace:
  text_poke_bp+0x27/0x64
  ? mutex_lock+0x36/0x5d
  arch_ftrace_update_trampoline+0x287/0x2d5
  ? ftrace_replace_code+0x14b/0x160
  ? ftrace_update_ftrace_func+0x65/0x6c
  __register_ftrace_function+0x6d/0x81
  ftrace_startup+0x23/0xc1
  register_ftrace_function+0x20/0x37
  func_set_flag+0x59/0x77
  __set_tracer_option.isra.19+0x20/0x3e
  trace_set_options+0xd6/0x13e
  apply_trace_boot_options+0x44/0x6d
  register_tracer+0x19e/0x1ac
  early_trace_init+0x21b/0x2c9
  start_kernel+0x241/0x518
  ? load_ucode_intel_bsp+0x21/0x52
  secondary_startup_64+0xa4/0xb0

I was able to trigger it on other machines, when I added to the kernel
command line of both "ftrace=function" and "trace_options=func_stack_trace".

The cause is the "ftrace=function" would register the function tracer
and create a trampoline, and it will set it as executable and
read-only. Then the "trace_options=func_stack_trace" would then update
the same trampoline to include the stack tracer version of the function
tracer. But since the trampoline already exists, it updates it with
text_poke_bp(). The problem is that text_poke_bp() called while
system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
the page mapping, as it would think that the text is still read-write.
But in this case it is not, and we take a fault and crash.

Instead, lets keep the ftrace trampolines read-write during boot up,
and then when the kernel executable text is set to read-only, the
ftrace trampolines get set to read-only as well.

Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a ("x86/ftrace: Use text_poke()")
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-12 18:24:34 -04:00
Fenghua Yu
8ab22804ef x86/fpu/xstate: Define new macros for supervisor and user xstates
XCNTXT_MASK is 'all supported xfeatures' before introducing supervisor
xstates.  Rename it to XFEATURE_MASK_USER_SUPPORTED to make clear that
these are user xstates.

Replace XFEATURE_MASK_SUPERVISOR with the following:
- XFEATURE_MASK_SUPERVISOR_SUPPORTED: Currently nothing.  ENQCMD and
  Control-flow Enforcement Technology (CET) will be introduced in separate
  series.
- XFEATURE_MASK_SUPERVISOR_UNSUPPORTED: Currently only Processor Trace.
- XFEATURE_MASK_SUPERVISOR_ALL: the combination of above.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-3-yu-cheng.yu@intel.com
2020-05-12 20:34:38 +02:00
Fenghua Yu
5274e6c172 x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
The function validate_xstate_header() validates an xstate header coming
from userspace (PTRACE or sigreturn). To make it clear, rename it to
validate_user_xstate_header().

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-2-yu-cheng.yu@intel.com
2020-05-12 20:20:32 +02:00
Willy Tarreau
38ede90831 floppy: use symbolic register names in the x86 port
Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-9-w@1wt.eu
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:53 +03:00
Willy Tarreau
e72e8bf1c9 floppy: split the base port from the register in I/O accesses
Currently we have architecture-specific fd_inb() and fd_outb() functions
or macros, taking just a port which is in fact made of a base address and
a register. The base address is FDC-specific and derived from the local or
global "fdc" variable through the FD_IOPORT macro used in the base address
calculation.

This change splits this by explicitly passing the FDC's base address and
the register separately to fd_outb() and fd_inb(). It affects the
following archs:
  - x86, alpha, mips, powerpc, parisc, arm, m68k:
    simple remap of port -> base+reg

  - sparc32: use of reg only, since the base address was already masked
    out and the FDC controller is known from a static struct.

  - sparc64: like x86 for PCI, like sparc32 for 82077

Some archs use inline functions and others macros. This was not
unified in order to minimize the number of changes to review. For the
same reason checkpatch still spews a few warnings about things that
were already there before.

The parisc still uses hard-coded register values and could be cleaned up
by taking the register definitions.

The sparc per-controller inb/outb functions could further be refined
to explicitly take an FDC register instead of a port in argument but it
was not needed yet and may be cleaned later.

Link: https://lore.kernel.org/r/20200331094054.24441-2-w@1wt.eu
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
2020-05-12 19:34:52 +03:00
Uros Bizjak
7e32a9dac9 x86/cpu: Use INVPCID mnemonic in invpcid.h
The current minimum required version of binutils is 2.23, which supports
the INVPCID instruction mnemonic. Replace the byte-wise specification of
INVPCID with the proper mnemonic.

 [ bp: Add symbolic operand names for increased readability and flip
   their order like the insn expects them for the AT&T syntax. ]

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200508092247.132147-1-ubizjak@gmail.com

Signed-off-by: Borislav Petkov <bp@suse.de>
2020-05-12 16:05:30 +02:00
Linus Torvalds
c14cab2688 A set of fixes for x86:
- Ensure that direct mapping alias is always flushed when changing page
    attributes. The optimization for small ranges failed to do so when
    the virtual address was in the vmalloc or module space.
 
  - Unbreak the trace event registration for syscalls without arguments
    caused by the refactoring of the SYSCALL_DEFINE0() macro.
 
  - Move the printk in the TSC deadline timer code to a place where it is
    guaranteed to only be called once during boot and cannot be rearmed by
    clearing warn_once after boot. If it's invoked post boot then lockdep
    rightfully complains about a potential deadlock as the calling context
    is different.
 
  - A series of fixes for objtool and the ORC unwinder addressing variety
    of small issues:
 
      Stack offset tracking for indirect CFAs in objtool ignored subsequent
      pushs and pops
 
      Repair the unwind hints in the register clearing entry ASM code
 
      Make the unwinding in the low level exit to usermode code stop after
      switching to the trampoline stack. The unwind hint is not longer valid
      and the ORC unwinder emits a warning as it can't find the registers
      anymore.
 
      Fix the unwind hints in switch_to_asm() and rewind_stack_do_exit()
      which caused objtool to generate bogus ORC data.
 
      Prevent unwinder warnings when dumping the stack of a non-current
      task as there is no way to be sure about the validity because the
      dumped stack can be a moving target.
 
      Make the ORC unwinder behave the same way as the frame pointer
      unwinder when dumping an inactive tasks stack and do not skip the
      first frame.
 
      Prevent ORC unwinding before ORC data has been initialized
 
      Immediately terminate unwinding when a unknown ORC entry type is
      found.
 
      Prevent premature stop of the unwinder caused by IRET frames.
 
      Fix another infinite loop in objtool caused by a negative offset which
      was not catched.
 
      Address a few build warnings in the ORC unwinder and add missing
      static/ro_after_init annotations
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6363QTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRJHD/4hWjzJLsUZ9xq2NrzhevoeJtxj+wVM
 66x9NM3mlFQ30BN4Aye4EnNEhR0iIvNPWWdfEmaJYfPHPwnUjjcOa426HYxP/WXA
 DWd5F20wGaaPOJ65LJpy/+pfcxAeQynt4I2cDEWHAplswfOWV/Hv8mSeKAKuq400
 lCWaTMkWcO/toexSNn8PVyWi9rHlm+76E1bHkVwuoekGBGt1VloKGlK6OPyElzL2
 w9VtrjSLlYQ0MdfCJKQeg44XQPMbf4hZRfc88x9SwDWB01q7aSvb0pWNl9AJKNXA
 7fFu5T4F4PABPgRM7eJ5yNk0De9jM1y+6eCp66f9UXoNOeSr7Boz9Xc4xWqAraIi
 9Dtx3WliO9CAxwUiD+Cj2iJO5o83AdRK/xhCth2VRnYMS6imfSidEqTC+LhEtkzw
 Yplu7sbrWQDa5JTh8vk60clDvbkU+pfdxJisY+KClRguWfQfR6MJNuQnE0NHr7cH
 H4VXFFHEE6tDdJneQ9RxA4iF20RTgSlJGK0YlsH6QsxPsRgoHVkGUao8fQhrNvRc
 MIdpm9YasWStjJ7ZXbDeStmnLFN3DCj1RC8wmvJ4i/R1sPnBvPvRUt4Lm988a951
 Vyr23VIcVrE7zykiqQZVH7bvIv6ULORqTJbIOF1rO/aIut4W8z0ojoVXC0Z7CiwF
 S5SGj+hlWciIew==
 =0rCi
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Ensure that direct mapping alias is always flushed when changing
     page attributes. The optimization for small ranges failed to do so
     when the virtual address was in the vmalloc or module space.

   - Unbreak the trace event registration for syscalls without arguments
     caused by the refactoring of the SYSCALL_DEFINE0() macro.

   - Move the printk in the TSC deadline timer code to a place where it
     is guaranteed to only be called once during boot and cannot be
     rearmed by clearing warn_once after boot. If it's invoked post boot
     then lockdep rightfully complains about a potential deadlock as the
     calling context is different.

   - A series of fixes for objtool and the ORC unwinder addressing
     variety of small issues:

       - Stack offset tracking for indirect CFAs in objtool ignored
         subsequent pushs and pops

       - Repair the unwind hints in the register clearing entry ASM code

       - Make the unwinding in the low level exit to usermode code stop
         after switching to the trampoline stack. The unwind hint is no
         longer valid and the ORC unwinder emits a warning as it can't
         find the registers anymore.

       - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
         which caused objtool to generate bogus ORC data.

       - Prevent unwinder warnings when dumping the stack of a
         non-current task as there is no way to be sure about the
         validity because the dumped stack can be a moving target.

       - Make the ORC unwinder behave the same way as the frame pointer
         unwinder when dumping an inactive tasks stack and do not skip
         the first frame.

       - Prevent ORC unwinding before ORC data has been initialized

       - Immediately terminate unwinding when a unknown ORC entry type
         is found.

       - Prevent premature stop of the unwinder caused by IRET frames.

       - Fix another infinite loop in objtool caused by a negative
         offset which was not catched.

       - Address a few build warnings in the ORC unwinder and add
         missing static/ro_after_init annotations"

* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
  x86/apic: Move TSC deadline timer debug printk
  ftrace/x86: Fix trace event registration for syscalls without arguments
  x86/mm/cpa: Flush direct map alias during cpa
  objtool: Fix infinite loop in for_offset_range()
  x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
  x86/unwind/orc: Fix error path for bad ORC entry type
  x86/unwind/orc: Prevent unwinding before ORC initialization
  x86/unwind/orc: Don't skip the first frame for inactive tasks
  x86/unwind: Prevent false warnings for non-current tasks
  x86/unwind/orc: Convert global variables to static
  x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
  x86/entry/64: Fix unwind hints in __switch_to_asm()
  x86/entry/64: Fix unwind hints in kernel exit path
  x86/entry/64: Fix unwind hints in register clearing code
  objtool: Fix stack offset tracking for indirect CFAs
2020-05-10 11:59:53 -07:00
Paolo Bonzini
d67668e9dd KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
different handling of DR6 on intercepted #DB exceptions on Intel and AMD.

On Intel, #DB exceptions transmit the DR6 value via the exit qualification
field of the VMCS, and the exit qualification only contains the description
of the precise event that caused a vmexit.

On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception
was to be injected into the guest.  This has two effects when guest debugging
is in use:

* the guest DR6 is clobbered

* the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
than just the last one that happened (the testcase in the next patch covers
this issue).

This patch fixes both issues by emulating, so to speak, the Intel behavior
on AMD processors.  The important observation is that (after the previous
patches) the VMCB value of DR6 is only ever observable from the guest is
KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually set vmcb->save.dr6
to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it
will be if guest debugging is enabled.

Therefore it is possible to enter the guest with an all-zero DR6,
reconstruct the #DB payload from the DR6 we get at exit time, and let
kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
is set, but this is harmless.

This may not be the most optimized way to deal with this, but it is
simple and, being confined within SVM code, it gets rid of the set_dr6
callback and kvm_update_dr6.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:31 -04:00
Paolo Bonzini
5679b803e4 KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
second argument.  Ensure that the VMCB value is synchronized to
vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
that the current value of DR6 is always available in vcpu->arch.dr6.
The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:43:47 -04:00
Linus Torvalds
8c16ec94dc Bugfixes, mostly for ARM and AMD, and more documentation.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6yqbIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroObBQf+NH9DCs6X92YggAoNpJl6uSIOX35X
 ErdWqYj80Xx95QU73aMukjs3Zqxe6WfYI9jPEOD8SDUZzZlVfIA35D8BYlqt1c5R
 A2K2ebTQbZ+j487QTUPbEvEivyxyVSozwvOdKBfL5kv0D9Cn2STyjVjmguUoCp9n
 VztmwbwpSZdOnexRSolwAWuyOriYbvpV12cIZpcMGrjL67yZPv8UyCxxJplDCLlB
 1c8tvGI2Md8apE/YZDqlCFh3H4YBQsact8uOoyY8cXKO/xIAsZOI+Dhm/cQAhGDk
 QIQqv/hkM4HPvOXQluwIau4Cx+Fl05xY/ggtQt4z/8yml2pOw8PKmwziZA==
 =60QX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, mostly for ARM and AMD, and more documentation.

  Slightly bigger than usual because I couldn't send out what was
  pending for rc4, but there is nothing worrisome going on. I have more
  fixes pending for guest debugging support (gdbstub) but I will send
  them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
  KVM: selftests: Fix build for evmcs.h
  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
  docs/virt/kvm: Document configuring and running nested guests
  KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
  KVM: x86: Fixes posted interrupt check for IRQs delivery modes
  KVM: SVM: fill in kvm_run->debug.arch.dr[67]
  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
  KVM: arm64: Fix 32bit PC wrap-around
  KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
  KVM: arm64: Save/restore sp_el0 as part of __guest_enter
  KVM: arm64: Delete duplicated label in invalid_vector
  KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
  KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
  KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
  KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
  KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
  KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
  ...
2020-05-07 09:50:59 -07:00
Kyung Min Park
cec5f268cd x86/delay: Introduce TPAUSE delay
TPAUSE instructs the processor to enter an implementation-dependent
optimized state. The instruction execution wakes up when the time-stamp
counter reaches or exceeds the implicit EDX:EAX 64-bit input value.
The instruction execution also wakes up due to the expiration of
the operating system time-limit or by an external interrupt
or exceptions such as a debug exception or a machine check exception.

TPAUSE offers a choice of two lower power states:
 1. Light-weight power/performance optimized state C0.1
 2. Improved power/performance optimized state C0.2

This way, it can save power with low wake-up latency in comparison to
spinloop based delay. The selection between the two is governed by the
input register.

TPAUSE is available on processors with X86_FEATURE_WAITPKG.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1587757076-30337-4-git-send-email-kyung.min.park@intel.com
2020-05-07 16:06:20 +02:00
Thomas Gleixner
e882489024 x86/delay: Preparatory code cleanup
The naming conventions in the delay code are confusing at best.

All delay variants use a loops argument and or variable which originates
from the original delay_loop() implementation. But all variants except
delay_loop() are based on TSC cycles.

Rename the argument to cycles and make it type u64 to avoid these weird
expansions to u64 in the functions.

Rename MWAITX_MAX_LOOPS to MWAITX_MAX_WAIT_CYCLES for the same reason
and fixup the comment of delay_mwaitx() as well.

Mark the delay_fn function pointer __ro_after_init and fixup the comment
for it.

No functional change and preparation for the upcoming TPAUSE based delay
variant.

[ Kyung Min Park: Added __init to use_tsc_delay() ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1587757076-30337-2-git-send-email-kyung.min.park@intel.com
2020-05-07 16:06:19 +02:00
Christoph Hellwig
2981cf8361 x86/platform/uv: Remove the unused _uv_cpu_blade_processor_id() macro
No users anywhere in the kernel tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-12-hch@lst.de
2020-05-07 15:32:23 +02:00
Christoph Hellwig
fbe1d37866 x86/platform/uv: Remove _uv_hub_info_check()
Neither this functions nor the helpers used to implement it are used
anywhere in the kernel tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-10-hch@lst.de
2020-05-07 15:32:23 +02:00
Christoph Hellwig
8e77554580 x86/platform/uv: Simplify uv_send_IPI_one()
Merge two helpers only used by uv_send_IPI_one() into the main function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-9-hch@lst.de
2020-05-07 15:32:22 +02:00
Christoph Hellwig
e4dd8b8351 x86/platform/uv: Mark is_uv_hubless() static
is_uv_hubless() is only used in x2apic_uv_x.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-7-hch@lst.de
2020-05-07 15:32:21 +02:00
Christoph Hellwig
cc19910587 x86/platform/uv: Remove the UV*_HUB_IS_SUPPORTED macros
All of the macros are always defined to one.  Remove them and the dead
code keyed off them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-6-hch@lst.de
2020-05-07 15:32:21 +02:00
Christoph Hellwig
32988cfd57 x86/platform/uv: Remove the uv_partition_coherence_id() macro
uv_partition_coherence_id() is only used once.  Just open code it in the
only user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-3-hch@lst.de
2020-05-07 15:32:19 +02:00
Christoph Hellwig
30ad8db3a2 x86/platform/uv: Mark uv_bios_call() and uv_bios_call_irqsave() static
Both functions are only used inside of bios_uv.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-2-hch@lst.de
2020-05-07 15:32:19 +02:00
Borislav Petkov
d8422f6bb0 x86/cpu: Add a X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro
... to match Intel family 6 CPUs with steppings.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Link: https://lkml.kernel.org/r/20200506071516.25445-3-bp@alien8.de
2020-05-07 13:48:05 +02:00
Borislav Petkov
51485635eb Merge 'x86/urgent' into x86/cpu
... to resolve conflicting changes to arch/x86/kernel/apic/apic.c

Signed-off-by: Borislav Petkov <bp@suse.de>
2020-05-07 12:27:43 +02:00
Paolo Bonzini
4d5523cfd5 KVM: x86: fix DR6 delivery for various cases of #DB injection
Go through kvm_queue_exception_p so that the payload is correctly delivered
through the exit qualification, and add a kvm_update_dr6 call to
kvm_deliver_exception_payload that is needed on AMD.

Reported-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07 06:13:41 -04:00
Reinette Chatre
f3d44f18b0 x86/resctrl: Support CPUID enumeration of MBM counter width
The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.

Expand the MBM CPUID enumeration properties to include the MBM
counter width. The previously undefined EAX output register contains,
in bits [7:0], the MBM counter width encoded as an offset from
24 bits. Enumerating this property is only specified for Intel
CPUs.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
2020-05-06 18:02:41 +02:00
Reinette Chatre
0118ad82c2 x86/cpu: Move resctrl CPUID code to resctrl/
The function determining a platform's support and properties of cache
occupancy and memory bandwidth monitoring (properties of
X86_FEATURE_CQM_LLC) can be found among the common CPU code. After
the feature's properties is populated in the per-CPU data the resctrl
subsystem is the only consumer (via boot_cpu_data).

Move the function that obtains the CPU information used by resctrl to
the resctrl subsystem and rename it from init_cqm() to
resctrl_cpu_detect(). The function continues to be called from the
common CPU code. This move is done in preparation of the addition of some
vendor specific code.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
2020-05-06 17:51:21 +02:00
Reinette Chatre
8dd97c6518 x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h
asm/resctrl_sched.h is dedicated to the code used for configuration
of the CPU resource control state when a task is scheduled.

Rename resctrl_sched.h to resctrl.h in preparation of additions that
will no longer make this file dedicated to work done during scheduling.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
2020-05-06 17:45:22 +02:00
Christoph Hellwig
c3b3f52476 signal: refactor copy_siginfo_to_user32
Factor out a copy_siginfo_to_external32 helper from
copy_siginfo_to_user32 that fills out the compat_siginfo, but does so
on a kernel space data structure.  With that we can let architectures
override copy_siginfo_to_user32 with their own implementations using
copy_siginfo_to_external32.  That allows moving the x32 SIGCHLD purely
to x86 architecture code.

As a nice side effect copy_siginfo_to_external32 also comes in handy
for avoiding a set_fs() call in the coredump code later on.

Contains improvements from Eric W. Biederman <ebiederm@xmission.com>
and Arnd Bergmann <arnd@arndb.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-05 16:46:09 -04:00
Arvind Sankar
de8c55208c efi/libstub: Fix mixed mode boot issue after macro refactor
Commit

  22090f84bc ("efi/libstub: unify EFI call wrappers for non-x86")

refactored the macros that are used to provide wrappers for mixed-mode
calls on x86, allowing us to boot a 64-bit kernel on 32-bit firmware.

Unfortunately, this broke mixed mode boot due to the fact that
efi_is_native() is not a macro on x86.

All of these macros should go together, so rather than testing each one
to see if it is defined, condition the generic macro definitions on a
new ARCH_HAS_EFISTUB_WRAPPERS, and remove the wrapper definitions on x86
as well if CONFIG_EFI_MIXED is not enabled.

Fixes: 22090f84bc ("efi/libstub: unify EFI call wrappers for non-x86")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200504150248.62482-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-05 09:25:39 +02:00
Suravee Suthikulpanit
637543a8d6 KVM: x86: Fixes posted interrupt check for IRQs delivery modes
Current logic incorrectly uses the enum ioapic_irq_destination_types
to check the posted interrupt destination types. However, the value was
set using APIC_DM_XXX macros, which are left-shifted by 8 bits.

Fixes by using the APIC_DM_FIXED and APIC_DM_LOWEST instead.

Fixes: (fdcf756213 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes')
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04 12:16:51 -04:00
Borislav Petkov
bd1de2a7aa x86/tlb/uv: Add a forward declaration for struct flush_tlb_info
... to fix these build warnings:

  In file included from ./arch/x86/include/asm/uv/uv_hub.h:22,
                   from drivers/misc/sgi-gru/grukdump.c:16:
  ./arch/x86/include/asm/uv/uv.h:39:21: warning: ‘struct flush_tlb_info’ declared \
     inside parameter list will not be visible outside of this definition or declaration
     39 |        const struct flush_tlb_info *info);
        |                     ^~~~~~~~~~~~~~
  In file included from ./arch/x86/include/asm/uv/uv_hub.h:22,
                   from drivers/misc/sgi-gru/grutlbpurge.c:28:
  ./arch/x86/include/asm/uv/uv.h:39:21: warning: ‘struct flush_tlb_info’ declared \
    inside parameter list will not be visible outside of this definition or declaration
     39 |        const struct flush_tlb_info *info);
        |                     ^~~~~~~~~~~~~~

  ...

after

  bfe3d8f631 ("x86/tlb: Restrict access to tlbstate")

restricted access to tlbstate.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200503103107.3419-1-bp@alien8.de
2020-05-03 21:43:47 +02:00
Konstantin Khlebnikov
fdc63ff0e4 ftrace/x86: Fix trace event registration for syscalls without arguments
The refactoring of SYSCALL_DEFINE0() macros removed the ABI stubs and
simply defines __abi_sys_$NAME as alias of __do_sys_$NAME.

As a result kallsyms_lookup() returns "__do_sys_$NAME" which does not match
with the declared trace event name.

See also commit 1c758a2202 ("tracing/x86: Update syscall trace events to
handle new prefixed syscall func names").

Add __do_sys_ to the valid prefixes which are checked in
arch_syscall_match_sym_name().

Fixes: d2b5de495e ("x86/entry: Refactor SYSCALL_DEFINE0 macros")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz
2020-05-01 19:15:40 +02:00
Peter Zijlstra
cc1ac9c792 x86/retpoline: Fix retpoline unwind
Currently objtool cannot understand retpolines, and thus cannot
generate ORC unwind information for them. This means that we cannot
unwind from the middle of a retpoline.

The recent ANNOTATE_INTRA_FUNCTION_CALL and UNWIND_HINT_RET_OFFSET
support in objtool enables it to understand the basic retpoline
construct. A further problem is that the ORC unwind information is
alternative invariant; IOW. every alternative should have the same
ORC, retpolines obviously violate this. This means we need to
out-of-line them.

Since all GCC generated code already uses out-of-line retpolines, this
should not affect performance much, if anything.

This will enable objtool to generate valid ORC data for the
out-of-line copies, which means we can correctly and reliably unwind
through a retpoline.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.210835357@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
34fdce6981 x86: Change {JMP,CALL}_NOSPEC argument
In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line
versions of the retpoline magic, we need to remove the '%' from the
argument, such that we can paste it onto symbol names.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
ca3f0d80dd x86: Simplify retpoline declaration
Because of how KSYM works, we need one declaration per line. Seeing
how we're going to be doubling the amount of retpoline symbols,
simplify the machinery in order to avoid having to copy/paste even
more.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.091696925@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
089dd8e531 x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
Change FILL_RETURN_BUFFER so that objtool groks it and can generate
correct ORC unwind information.

 - Since ORC is alternative invariant; that is, all alternatives
   should have the same ORC entries, the __FILL_RETURN_BUFFER body
   can not be part of an alternative.

   Therefore, move it out of the alternative and keep the alternative
   as a sort of jump_label around it.

 - Use the ANNOTATE_INTRA_FUNCTION_CALL annotation to white-list
   these 'funny' call instructions to nowhere.

 - Use UNWIND_HINT_EMPTY to 'fill' the speculation traps, otherwise
   objtool will consider them unreachable.

 - Move the RSP adjustment into the loop, such that the loop has a
   deterministic stack layout.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.032079304@infradead.org
2020-04-30 20:14:34 +02:00
Peter Zijlstra
1ff865e343 x86,smap: Fix smap_{save,restore}() alternatives
As reported by objtool:

  lib/ubsan.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack
  lib/ubsan.o: warning: objtool: .altinstr_replacement+0x7: alternative modifies stack

the smap_{save,restore}() alternatives violate (the newly enforced)
rule on stack invariance. That is, due to there only being a single
ORC table it must be valid to any alternative. These alternatives
violate this with the direct result that unwinds will not be correct
when it hits between the PUSH and POP instructions.

Rewrite the functions to only have a conditional jump.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200429101802.GI13592@hirez.programming.kicks-ass.net
2020-04-30 20:14:31 +02:00
Will Deacon
bf60333977 Merge branch 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/asm
As agreed with Boris, merge in the 'x86/asm' branch from -tip so that we
can select the new 'ARCH_USE_SYM_ANNOTATIONS' Kconfig symbol, which is
required by the BTI kernel patches.

* 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Provide a Kconfig symbol for disabling old assembly annotations
  x86/32: Remove CONFIG_DOUBLEFAULT
2020-04-30 17:39:42 +01:00
Linus Torvalds
869997be0e hyperv-fixes for 5.7-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl6mwOETHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrFLB/4yKsrl41WwYRbTKgiir576/LA0vGxQ
 cZjUQwkVv3S5/AfhvpwiGFV4dBV6j81KtNhRE6luaa3FBHObnjrx5tNqMw/P8a0j
 HZGZ68n4qE+OPVtTxj54s81iWIi9vgT/La92GPYhuXoiVPTd5zJ2lwY3so04BSFJ
 p30+RZFKNkTjNYZNZSHcoodr+js4Uws8JSn8OmpCJr8Gt+FJqkujQROG3HMKhJlk
 KlJlCJhV48tj/nlgcbGHBF0Yy5l8DVCaKIz+MiF5F/i+P8r0cErfyihc9Ene0/un
 LNFhIVGn8/MTi0CVrltcnur2qFH1qPCuLolKSpd/FKd6H2UDgK16XgAd
 =NJP/
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Wei Liu:

 - Two patches from Dexuan fixing suspension bugs

 - Three cleanup patches from Andy and Michael

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyper-v: Remove internal types from UAPI header
  hyper-v: Use UUID API for exporting the GUID
  x86/hyperv: Suspend/resume the VP assist page for hibernation
  Drivers: hv: Move AEOI determination to architecture dependent code
  Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM
2020-04-27 13:28:27 -07:00
Thomas Gleixner
bfe3d8f631 x86/tlb: Restrict access to tlbstate
Hide tlbstate, flush_tlb_info and related helpers when tlbflush.h is
included from a module. Modules have absolutely no business with these
internals.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.328438734@linutronix.de
2020-04-26 18:52:33 +02:00
Thomas Gleixner
6c9b7d79a8 x86/tlb: Move PCID helpers where they are used
Aside of the fact that they are used only in the TLB code, especially
having the comment close to the actual implementation makes a lot of
sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.145772183@linutronix.de
2020-04-26 18:49:44 +02:00
Thomas Gleixner
af5c40c6ee x86/tlb: Uninline nmi_uaccess_okay()
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

nmi_access_ok() is the last inline function which requires access to
cpu_tlbstate. Move it into the TLB code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.052543007@linutronix.de
2020-04-26 18:47:05 +02:00
Thomas Gleixner
96f59fe291 x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site
No point in having this exposed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.940978251@linutronix.de
2020-04-26 18:39:48 +02:00
Thomas Gleixner
69de6c1a7f x86/tlb: Move paravirt_tlb_remove_table() to the usage site
Move it where the only user is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.849801011@linutronix.de
2020-04-26 18:19:35 +02:00
Thomas Gleixner
4b04e6c236 x86/tlb: Move __flush_tlb_all() out of line
Reduce the number of required exports to one and make flush_tlb_global()
static to the TLB code.

flush_tlb_local() cannot be confined to the TLB code as the MTRR
handling requires a PGE-less flush.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200421092559.740388137@linutronix.de
2020-04-26 18:17:31 +02:00
Thomas Gleixner
29def599b3 x86/tlb: Move flush_tlb_others() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a last step, move __flush_tlb_others() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.641957686@linutronix.de
2020-04-26 11:10:25 +02:00
Thomas Gleixner
58430c5dba x86/tlb: Move __flush_tlb_one_kernel() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a fourth step, move __flush_tlb_one_kernel() out of line and hide
the native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.535159540@linutronix.de
2020-04-26 11:01:22 +02:00
Thomas Gleixner
127ac915c8 x86/tlb: Move __flush_tlb_one_user() out of line
cpu_tlbstate is exported because various TLB-related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well-contained kernel functions and not be directly exposed to
modules.

As a third step, move _flush_tlb_one_user() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.428213098@linutronix.de
2020-04-26 11:00:29 +02:00
Thomas Gleixner
cd30d26cf3 x86/tlb: Move __flush_tlb_global() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a second step, move __flush_tlb_global() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.

Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.336916818@linutronix.de
2020-04-26 11:00:27 +02:00
Thomas Gleixner
2faf153bb7 x86/tlb: Move __flush_tlb() out of line
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

As a first step, move __flush_tlb() out of line and hide the native
function. The latter can be static when CONFIG_PARAVIRT is disabled.

Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.246130908@linutronix.de
2020-04-26 11:00:05 +02:00
Josh Poimboeuf
81b67439d1 x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
The following execution path is possible:

  fsnotify()
    [ realign the stack and store previous SP in R10 ]
    <IRQ>
      [ only IRET regs saved ]
      common_interrupt()
        interrupt_entry()
	  <NMI>
	    [ full pt_regs saved ]
	    ...
	    [ unwind stack ]

When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers.  So the unwind stops
prematurely.

However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.

Handle this case properly.  When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.

Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers.  For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between.  So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:29 +02:00
Linus Torvalds
b9916af776 Kbuild fixes for v5.7
- fix scripts/config to properly handle ':' in string type CONFIG options
 
  - fix unneeded rebuilds of DT schema check rule
 
  - git rid of ordering dependency between <linux/vermagic.h> and
    <linux/module.h> to fix build errors in some network drivers
 
  - clean up generated headers of host arch with 'make ARCH=um mrproper'
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl6jDlAVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGcZEP/2ORmb2mfCCrPU9Cjks+97FsQd7P
 aqOCAX0884prFBMykSYuPJ9DSlMEcMwkA7zdd7dlaEbwI7WJhWzjoQRi+/pnsSMn
 7wG46T1Vj4CXIvHgP8PEeO0wFXXdNOQNxeJ/CDAD99ISmMTg4crJWyPOI5eFTKad
 GqWWY04smfE/2RhAbA3UJASsrO6Ev45Os4CLjXwqDM26oDozyyhrWdGFrE5/pqLp
 /feA5Jky8lUMIRoql3QMCKkjFpwl+A6IGnD2BCMdkkgRoTn+V/v28ENqYEjPEoYw
 fB2wVQ6+3E9DEVS1lBaYG+ZMhyZxJWABd2KCN6OuVZPVvuJEjnCWnT9wy71ZBO3W
 YDUOMsL5yqae4OhpCOHAujsBe23hEbjPXRHiAFXiaFayFb3gYgoiQhKV3HNSY00p
 JEFk9by3RggL/ZOmLhfzX9BLdfaK1q/k4kGuTU3eD4JryfI70tlO/t726LbFKqjP
 dBZjCGjGMZv2QAnLVa9mAnO3OH2jq8dyrlsjNPbTsQ4yBqk+YzXm16M7RCqOBKE1
 w4bI8oEBLUfzMrhQGYRVOMxS8DkxyNg6KUjbXkGBgueuiYT38pjRaj955hN4kCzT
 FxYXHwm6v+VwJp5wt/HN+Gd8A5rzQgGUFutqJ9vAgQUBJJz5y5RP+7AUVBKspIe/
 OF8x1J/O5REkOOCP
 =3L2c
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - fix scripts/config to properly handle ':' in string type CONFIG
   options

 - fix unneeded rebuilds of DT schema check rule

 - git rid of ordering dependency between <linux/vermagic.h> and
   <linux/module.h> to fix build errors in some network drivers

 - clean up generated headers of host arch with 'make ARCH=um mrproper'

* tag 'kbuild-fixes-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  h8300: ignore vmlinux.lds
  Documentation: kbuild: fix the section title format
  um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/
  arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h>
  kbuild: fix DT binding schema rule again to avoid needless rebuilds
  scripts/config: allow colons in option strings for sed
2020-04-24 10:39:32 -07:00
Thomas Gleixner
9020d39563 x86/alternatives: Move temporary_mm helpers into C
The only user of these inlines is the text poke code and this must not be
exposed to the world.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.139069561@linutronix.de
2020-04-24 19:12:56 +02:00
Thomas Gleixner
cb2a02355b x86/cr4: Sanitize CR4.PCE update
load_mm_cr4_irqsoff() is really a strange name for a function which has
only one purpose: Update the CR4.PCE bit depending on the perf state.

Rename it to update_cr4_pce_mm(), move it into the tlb code and provide a
function which can be invoked by the perf smp function calls.

Another step to remove exposure of cpu_tlbstate.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.049499158@linutronix.de
2020-04-24 19:01:17 +02:00
Thomas Gleixner
d8f0b35331 x86/cpu: Uninline CR4 accessors
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache
is located there.

In preparation for unexporting cpu_tlbstate, create a builtin function
for manipulating CR4 and rework the various helpers to use it.

No functional change.

 [ bp: push the export of native_write_cr4() only when CONFIG_LKTDM=m to
   the last patch in the series. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-24 18:46:42 +02:00
Thomas Gleixner
8c5cc19e94 x86/tlb: Uninline __get_current_cr3_fast()
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.

In preparation for unexporting cpu_tlbstate move __get_current_cr3_fast()
into the x86 TLB management code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200421092558.848064318@linutronix.de
2020-04-24 15:55:50 +02:00
Ard Biesheuvel
0a75561489 efi/libstub/x86: Avoid getter function for efi_is64
We no longer need to take special care when using global variables
in the EFI stub, so switch to a simple symbol reference for efi_is64.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-04-24 14:52:16 +02:00
Ard Biesheuvel
ccc27ae774 efi/libstub: Drop __pure getter for efi_system_table
The practice of using __pure getter functions to access global
variables in the EFI stub dates back to the time when we had to
carefully prevent GOT entries from being emitted, because we
could not rely on the toolchain to do this for us.

Today, we use the hidden visibility pragma for all EFI stub source
files, which now all live in the same subdirectory, and we apply a
sanity check on the objects, so we can get rid of these getter
functions and simply refer to global data objects directly.

Start with efi_system_table(), and convert it into a global variable.
While at it, make it a pointer-to-const, because we can.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-04-24 14:52:16 +02:00
Andy Shevchenko
4a65ed6562 Immutable branch between MFD, X86, USB and Watchdog due for the v5.7 merge window
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAl6iviIACgkQUa+KL4f8
 d2EdiA/+OSpr4fiy9RTIxXnyROoFmI9ypKyDbXdkW0c2E2ZlH6d3LCFRCiF39DeQ
 7U6uNXyNetUjzH6fXm1BM/nltNbAaAUmNlL1THGQ3wamMayL4S0hSHgy8ZUTcFj4
 yC4TLwHAsXmNjHC0FbUul9LILooBs6MfEHeN+zK7tCI4ZLaygx5d/+ApFlbuU45c
 J4u7a32PEmM9i6BvSAQK2OGWChIRGvCnxekz9C8ebWULzzsOKPLFJccKEdORc8dK
 5Mg3J8zC1IZ+ct6y+DvvbaYROP3vbNabJPHIjV/hLmP0g3qHghHUW265qLVrHu/z
 uCek8lgLTYetJ+Nn4/k7eoUDKkZgbgfukRLyHw/gu4YQaiwhjI2KVP+ytBWkCzA4
 AOJVnMN756C1Rx3XrF/E63RdwuIfJA+VGGW7YzbPuXa0SvECPE//wVpZv6FSkJmA
 N4s1Z7yhuZEjGvuu4l5/ErZSbSN/2OMM4ahdQ4xoCAxYKkKMacS5Gds83VlnM6LQ
 Mwl77aTc84JqI1y4HROd2qj4J8YfE1F8lTRt8lepgeOL/kntM9U1lvBT3TCV3T0e
 xLC6GDCHo6N9vlyn1KU4PCyCHFNUJLD6Wq+H/jLZkcrx4k59caR0+I1+iyaQyo/i
 o+DgYCp6GnPFd3XuVLppg45n6qZ7wB67WOJ6iOpcO7+kMldg4Ck=
 =4M2c
 -----END PGP SIGNATURE-----

Merge branch 'ib-mfd-x86-usb-watchdog-v5.7'

Merge branch 'ib-mfd-x86-usb-watchdog-v5.7' of
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
to avoid conflicts in PDx86.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-04-24 13:56:46 +03:00
Mika Westerberg
25f1ca31e2 platform/x86: intel_pmc_ipc: Convert to MFD
This driver only creates a bunch of platform devices sharing resources
belonging to the PMC device. This is pretty much what MFD subsystem is
for so move the driver there, renaming it to intel_pmc_bxt.c which
should be more clear what it is.

MFD subsystem provides nice helper APIs for subdevice creation so
convert the driver to use those. Unfortunately the ACPI device includes
separate resources for most of the subdevices so we cannot simply call
mfd_add_devices() to create all of them but instead we need to call it
separately for each device.

The new MFD driver continues to expose two sysfs attributes that allow
userspace to send IPC commands to the PMC/SCU to avoid breaking any
existing applications that may use these. Generally this is bad idea so
document this in the ABI documentation.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:44 +01:00
Mika Westerberg
0759a8730c platform/x86: intel_telemetry: Add telemetry_get_pltdata()
Add new function that allows telemetry modules to get pointer to the
platform specific configuration. This is needed to allow the telemetry
debugfs module to fetch PMC IPC instance in the subsequent patch.

This also allows us to replace telemetry_pltconfig_valid() with
telemetry_get_pltdata() as well.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:30 +01:00
Mika Westerberg
781adff21c x86/platform/intel-mid: Add empty stubs for intel_scu_devices_[create|destroy]()
This allows to call the functions even when CONFIG_X86_INTEL_MID is not
enabled.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:21 +01:00
Mika Westerberg
7713f9180c platform/x86: intel_pmc_ipc: Drop intel_pmc_ipc_command()
Now that all callers have been converted over to the SCU IPC API we can
drop intel_pmc_ipc_command().

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:16 +01:00
Mika Westerberg
68c73fb224 platform/x86: intel_telemetry: Convert to use new SCU IPC API
Convert the Intel Apollo Lake telemetry driver to use the new SCU IPC
API. This allows us to get rid of the duplicate PMC IPC implementation
which is now covered in SCU IPC driver.

Also move telemetry specific IPC message constant to the telemetry
driver where it belongs.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:18:08 +01:00
Mika Westerberg
4181bc8f6f mfd: intel_soc_pmic_bxtwc: Convert to use new SCU IPC API
Convert the Intel Broxton Whiskey Cover PMIC driver to use the new SCU
IPC API. This allows us to get rid of the PMC IPC implementation which
is now covered in SCU IPC driver. We drop the error log if the IPC
command fails because intel_scu_ipc_dev_command() does that already.

Also move PMIC specific IPC message constants to the PMIC driver from
the intel_pmc_ipc.h header.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:58 +01:00
Mika Westerberg
7e18c89d6e platform/x86: intel_scu_ipc: Add managed function to register SCU IPC
Drivers such as intel_pmc_ipc.c can be unloaded as well so in order to
support those in this driver add a new function that can be called to
unregister the SCU IPC when it is not needed anymore.

We also add a managed version of the intel_scu_ipc_register() that takes
care of calling intel_scu_ipc_unregister() automatically when the driver
is unbound.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:44 +01:00
Mika Westerberg
f57fa18583 platform/x86: intel_scu_ipc: Introduce new SCU IPC API
The current SCU IPC API has been operating on a single instance and
there has been no way to pin the providing module in place when the SCU
IPC is in use.

This implements a new API that takes the SCU IPC instance as first
parameter (NULL means the single instance is being used). The SCU IPC
instance can be retrieved by calling new function intel_scu_ipc_dev_get()
that take care of pinning the providing module in place as long as
intel_scu_ipc_dev_put() is not called.

The old API is updated to call the new API and is is left there in the
legacy API header to support the existing users that cannot be converted
easily.

Subsequent patches will convert most of the users over to the new API.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:28 +01:00
Mika Westerberg
dd88564937 platform/x86: intel_scu_ipc: Move legacy SCU IPC API to a separate header
In preparation for introducing a new API for SCU IPC, move the legacy
API and constants to a separate header that is is subject to be removed
eventually.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:24 +01:00
Mika Westerberg
54b34aa0a7 platform/x86: intel_scu_ipc: Split out SCU IPC functionality from the SCU driver
The SCU IPC functionality is usable outside of Intel MID devices. For
example modern Intel CPUs include the same thing but now it is called
PMC (Power Management Controller) instead of SCU. To make the IPC
available for those split the driver into core part (intel_scu_ipc.c)
and the SCU PCI driver part (intel_scu_pcidrv.c) which then calls the
former before it goes and creates rest of the SCU devices. The SCU IPC
will also register a new class that gets assigned to the device that is
created under the parent PCI device.

We also split the Kconfig symbols so that INTEL_SCU_IPC enables the SCU
IPC library and INTEL_SCU_PCI the SCU driver and convert the users
accordingly. While there remove default y from the INTEL_SCU_PCI symbol
as it is already selected by X86_INTEL_MID.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-04-24 11:17:05 +01:00
Arvind Sankar
b4b89a0272 efi/gop: Add prototypes for query_mode and set_mode
Add prototypes and argmap for the Graphics Output Protocol's QueryMode
and SetMode functions.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200320020028.1936003-11-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-04-23 20:15:06 +02:00
Paolo Bonzini
33b2217245 KVM: x86: move nested-related kvm_x86_ops to a separate struct
Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to
nested virtualization into a separate struct.

As a result, these ops will always be non-NULL on VMX.  This is not a problem:

* check_nested_events is only called if is_guest_mode(vcpu) returns true

* get_nested_state treats VMXOFF state the same as nested being disabled

* set_nested_state fails if you attempt to set nested state while
  nesting is disabled

* nested_enable_evmcs could already be called on a CPU without VMX enabled
  in CPUID.

* nested_get_evmcs_version was fixed in the previous patch

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-23 09:04:57 -04:00
Christoph Hellwig
325518e9b7 x86/mm: Use pgprotval_t in protval_4k_2_large() and protval_large_2_4k()
Use the proper type for "raw" page table values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200422170116.GA28345@lst.de
2020-04-23 11:38:42 +02:00
Christoph Hellwig
de17a37896 x86/mm: Unexport __cachemode2pte_tbl
Exporting the raw data for a table is generally a bad idea. Move
cachemode2protval() out of line given that it isn't really used in the
fast path, and then mark __cachemode2pte_tbl static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-5-hch@lst.de
2020-04-23 11:34:31 +02:00
Christoph Hellwig
d073569363 x86/mm: Cleanup pgprot_4k_2_large() and pgprot_large_2_4k()
Make use of lower level helpers that operate on the raw protection
values to make the code a little easier to understand, and to also
avoid extra conversions in a few callers.

[ Qian: Fix a wrongly placed bracket in the original submission.
  Reported and fixed by Qian Cai <cai@lca.pw>. Details in second
  Link: below. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-4-hch@lst.de
Link: https://lkml.kernel.org/r/1ED37D02-125F-4919-861A-371981581D9E@lca.pw
2020-04-23 11:31:52 +02:00
Masahiro Yamada
62d0fd591d arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h>
As the bug report [1] pointed out, <linux/vermagic.h> must be included
after <linux/module.h>.

I believe we should not impose any include order restriction. We often
sort include directives alphabetically, but it is just coding style
convention. Technically, we can include header files in any order by
making every header self-contained.

Currently, arch-specific MODULE_ARCH_VERMAGIC is defined in
<asm/module.h>, which is not included from <linux/vermagic.h>.

Hence, the straight-forward fix-up would be as follows:

|--- a/include/linux/vermagic.h
|+++ b/include/linux/vermagic.h
|@@ -1,5 +1,6 @@
| /* SPDX-License-Identifier: GPL-2.0 */
| #include <generated/utsrelease.h>
|+#include <linux/module.h>
|
| /* Simply sanity version stamp for modules. */
| #ifdef CONFIG_SMP

This works enough, but for further cleanups, I split MODULE_ARCH_VERMAGIC
definitions into <asm/vermagic.h>.

With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal,
and the location of MODULE_ARCH_VERMAGIC definitions will be consistent.

For arc and ia64, MODULE_PROC_FAMILY is only used for defining
MODULE_ARCH_VERMAGIC. I squashed it.

For hexagon, nds32, and xtensa, I removed <asm/modules.h> entirely
because they contained nothing but MODULE_ARCH_VERMAGIC definition.
Kbuild will automatically generate <asm/modules.h> at build-time,
wrapping <asm-generic/module.h>.

[1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
2020-04-23 10:50:26 +09:00
Al Viro
2a89b674fd get rid of csum_partial_copy_to_user()
For historical reasons some architectures call their csum_and_copy_to_user()
csum_partial_copy_to_user() instead (and supply a macro defining the
former as the latter).  That's the last remnants of old experiment that
went nowhere; time to bury them.  Rename those to csum_and_copy_to_user()
and get rid of the macros.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-22 14:37:50 -04:00
Peter Zijlstra
c536ed2fff objtool: Remove SAVE/RESTORE hints
The SAVE/RESTORE hints are now unused; remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.926738768@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Peter Zijlstra
e25eea89bb objtool: Introduce HINT_RET_OFFSET
Normally objtool ensures a function keeps the stack layout invariant.
But there is a useful exception, it is possible to stuff the return
stack in order to 'inject' a 'call':

	push $fun
	ret

In this case the invariant mentioned above is violated.

Add an objtool HINT to annotate this and allow a function exit with a
modified stack frame.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.690601403@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Peter Zijlstra
b746046238 objtool: Better handle IRET
Teach objtool a little more about IRET so that we can avoid using the
SAVE/RESTORE annotation. In particular, make the weird corner case in
insn->restore go away.

The purpose of that corner case is to deal with the fact that
UNWIND_HINT_RESTORE lands on the instruction after IRET, but that
instruction can end up being outside the basic block, consider:

	if (cond)
		sync_core()
	foo();

Then the hint will land on foo(), and we'll encounter the restore
hint without ever having seen the save hint.

By teaching objtool about the arch specific exception frame size, and
assuming that any IRET in an STT_FUNC symbol is an exception frame
sized POP, we can remove the use of save/restore hints for this code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.631224674@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Paolo Bonzini
3bda03865f KVM: s390: Fix for 5.7 and maintainer update
- Silence false positive lockdep warning
 - add Claudio as reviewer
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJenY6AAAoJEBF7vIC1phx8bykQAK+QZyD+H/zGNuqeUVn0sh8e
 yKUVMR+kuE+l57q77nt2AYVxqpCD9xSKRR+SOSLzhVH/HJf625nm+Ny/WOWMebwJ
 EA/KK+v15T5rga8gFza+4cPg4v/pHwjHhSbjTb1JWg+8cJR1BTj6OxRuTtWr5+25
 GF4RhkJOit/VhNbCo1aIgs7/7F1pPALstdPAUsHYe1PeULdRMVqSVluXT2KTPhpi
 /kzDw8sKKcYgv/eaVdcNoHv+VX1AWIRDAKEttCywyocfbu0ESwadmR7C0qlm1446
 HqowP6F0xCF0Whi/65aN4ZOv7wjO/qrV08DZ7JLA3/oKlXtZ1ieyiE2q/P1frSo1
 gvmuHiH5/UI6t6a/BSCpJwqcilxKYArqAAYBKoGiJhTbsJStqw0wl41klWTKXlTq
 VrCvjoUxQ9JMjFCQ1GXOU+ODNyX2IwZYptJ5vF24HYzBJwUBe3HPG9/BA8YcodzG
 qGQ5IKv0Q1IFTwOqnt557H0MjcBtNIEx54aLJrPy3wldsiNSj39Ft0cuvnbR+Q4F
 QhKk88dHtd7NW1IirfgYmLGe0rB1ANKM7wUGEdM5w2y5Eg8wCs8/P4KeGh0YyFI9
 xPqZDfwof6KkDjOGFXr/CeD/thi+km0/FpePb7cL5Ow4a+JmrCvqQiXrf0TbnFpv
 t5ZlHnGzoSHsEaRgmJ+X
 =d46L
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: Fix for 5.7 and maintainer update

- Silence false positive lockdep warning
- add Claudio as reviewer
2020-04-21 09:37:13 -04:00
Wanpeng Li
a9ab13ff6e KVM: X86: Improve latency for single target IPI fastpath
IPI and Timer cause the main MSRs write vmexits in cloud environment
observation, let's optimize virtual IPI latency more aggressively to
inject target IPI as soon as possible.

Running kvm-unit-tests/vmexit.flat IPI testing on SKX server, disable
adaptive advance lapic timer and adaptive halt-polling to avoid the
interference, this patch can give another 7% improvement.

w/o fastpath   -> x86.c fastpath      4238 -> 3543  16.4%
x86.c fastpath -> vmx.c fastpath      3543 -> 3293     7%
w/o fastpath   -> vmx.c fastpath      4238 -> 3293  22.3%

Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:13:10 -04:00
Sean Christopherson
8791585837 KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags
Introduce a new "extended register" type, EXIT_INFO_2 (to pair with the
nomenclature in .get_exit_info()), and use it to cache VMX's
vmcs.EXIT_INTR_INFO.  Drop a comment in vmx_recover_nmi_blocking() that
is obsoleted by the generic caching mechanism.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:13:07 -04:00
Sean Christopherson
5addc23519 KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags
Introduce a new "extended register" type, EXIT_INFO_1 (to pair with the
nomenclature in .get_exit_info()), and use it to cache VMX's
vmcs.EXIT_QUALIFICATION.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:13:07 -04:00
Sean Christopherson
be01e8e2c6 KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code
Rename functions and variables in kvm_mmu_new_cr3() and related code to
replace "cr3" with "pgd", i.e. continue the work started by commit
727a7e27cf ("KVM: x86: rename set_cr3 callback and related flags to
load_mmu_pgd").  kvm_mmu_new_cr3() and company are not always loading a
new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-37-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:59 -04:00
Sean Christopherson
4a632ac6ca KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch
Add a separate "skip" override for MMU sync, a future change to avoid
TLB flushes on nested VMX transitions may need to sync the MMU even if
the TLB flush is unnecessary.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-32-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:56 -04:00
Sean Christopherson
a4148b7ca2 KVM: VMX: Retrieve APIC access page HPA only when necessary
Move the retrieval of the HPA associated with L1's APIC access page into
VMX code to avoid unnecessarily calling gfn_to_page(), e.g. when the
vCPU is in guest mode (L2).  Alternatively, the optimization logic in
VMX could be mirrored into the common x86 code, but that will get ugly
fast when further optimizations are introduced.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-29-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:55 -04:00
Sean Christopherson
eeeb4f67a6 KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID
Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's
EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g.
to flush L2's context during nested VM-Enter.

Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where
the flush is directly associated with vCPU-scoped instruction emulation,
i.e. MOV CR3 and INVPCID.

Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to
make it clear that it deliberately requests a flush of all contexts.

Service any pending flush request on nested VM-Exit as it's possible a
nested VM-Exit could occur after requesting a flush for L2.  Add the
same logic for nested VM-Enter even though it's _extremely_ unlikely
for flush to be pending on nested VM-Enter, but theoretically possible
(in the future) due to RSM (SMM) emulation.

[*] Intel also has an Address Space Identifier (ASID) concept, e.g.
    EPTP+VPID+PCID == ASID, it's just not documented in the SDM because
    the rules of invalidation are different based on which piece of the
    ASID is being changed, i.e. whether the EPTP, VPID, or PCID context
    must be invalidated.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:53 -04:00
Sean Christopherson
7780938cc7 KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()
Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a
new hook to flush only the current ASID/context.

Opportunstically replace the comment in vmx_flush_tlb() that explains
why it flushes all EPTP/VPID contexts with a comment explaining why it
unconditionally uses INVEPT when EPT is enabled.  I.e. rely on the "all"
part of the name to clarify why it does global INVEPT/INVVPID.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-23-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:52 -04:00
Sean Christopherson
f55ac304ca KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()
Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now
that all callers pass %true for said param, or ignore the param (SVM has
an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat
arbitrarily passes %false).

Remove __vmx_flush_tlb() as it is no longer used.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-17-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:49 -04:00
Vitaly Kuznetsov
0baedd7927 KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()
Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest
so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest()
(just like KVM PV TLB flush mechanism) instead. Introduce
KVM_REQ_HV_TLB_FLUSH to support the change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21 09:12:48 -04:00
Michael Kelley
2ddddd0b4e Drivers: hv: Move AEOI determination to architecture dependent code
Hyper-V on ARM64 doesn't provide a flag for the AEOI recommendation
in ms_hyperv.hints, so having the test in architecture independent
code doesn't work. Resolve this by moving the check of the flag
to an architecture dependent helper function. No functionality is
changed.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200420164926.24471-1-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-04-21 10:02:38 +01:00
Sean Christopherson
e64419d991 KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
Add a dedicated hook to handle flushing TLB entries on behalf of the
guest, i.e. for a paravirtualized TLB flush, and use it directly instead
of bouncing through kvm_vcpu_flush_tlb().

For VMX, change the effective implementation implementation to never do
INVEPT and flush only the current context, i.e. to always flush via
INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
@invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
flush guest-physical mappings; linear and combined mappings are flushed
by VM-Enter when VPID is disabled, and changes in the guest pages tables
do not affect guest-physical mappings.

When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
architecture) to invalidate guest-physical mappings, i.e. TLB entries
that cache guest-physical mappings can live across INVVPID as the
mappings are associated with an EPTP, not a VPID.  The intent of
@invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
VPID handling, which now calls vpid_sync_context() directly, the only
scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
enabled) is if KVM is flushing TLB entries from the guest's perspective,
i.e. is only required to invalidate linear mappings.

For SVM, flushing TLB entries from the guest's perspective can be done
by flushing the current ASID, as changes to the guest's page tables are
associated only with the current ASID.

Adding a dedicated ->tlb_flush_guest() paves the way toward removing
@invalidate_gpa, which is a potentially dangerous control flag as its
meaning is not exactly crystal clear, even for those who are familiar
with the subtleties of what mappings Intel CPUs are/aren't allowed to
keep across various invalidation scenarios.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:26:10 -04:00
Paolo Bonzini
5efac0741c KVM: x86: introduce kvm_mmu_invalidate_gva
Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva
into a new function.  This function also lets us specify the host PGD to
invalidate and also the MMU, both of which will be useful in fixing and
simplifying kvm_inject_emulated_page_fault.

A nested guest's MMU however has g_context->invlpg == NULL.  Instead of
setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only
entry point to mmu->invlpg and make a NULL invlpg pointer equivalent
to nonpaging_invlpg, saving a retpoline.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-20 17:25:55 -04:00
Kees Cook
9fccc5c0c9 x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit
With modern x86 64-bit environments, there should never be a need for
automatic READ_IMPLIES_EXEC, as the architecture is intended to always
be execute-bit aware (as in, the default memory protection should be NX
unless a region explicitly requests to be executable).

There were very old x86_64 systems that lacked the NX bit, but for those,
the NX bit is, obviously, unenforceable, so these changes should have
no impact on them.

Suggested-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20200327064820.12602-4-keescook@chromium.org
2020-04-20 19:24:33 +02:00
Kees Cook
122306117a x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
The READ_IMPLIES_EXEC workaround was designed for old toolchains that
lacked the ELF PT_GNU_STACK marking under the assumption that toolchains
that couldn't specify executable permission flags for the stack may not
know how to do it correctly for any memory region.

This logic is sensible for having ancient binaries coexist in a system
with possibly NX memory, but was implemented in a way that equated having
a PT_GNU_STACK marked executable as being as "broken" as lacking the
PT_GNU_STACK marking entirely. Things like unmarked assembly and stack
trampolines may cause PT_GNU_STACK to need an executable bit, but they
do not imply all mappings must be executable.

This confusion has led to situations where modern programs with explicitly
marked executable stacks are forced into the READ_IMPLIES_EXEC state when
no such thing is needed. (And leads to unexpected failures when mmap()ing
regions of device driver memory that wish to disallow VM_EXEC[1].)

In looking for other reasons for the READ_IMPLIES_EXEC behavior, Jann
Horn noted that glibc thread stacks have always been marked RWX (until
2003 when they started tracking the PT_GNU_STACK flag instead[2]). And
musl doesn't support executable stacks at all[3]. As such, no breakage
for multithreaded applications is expected from this change.

[1] https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
[2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=54ee14b3882
[3] https://lkml.kernel.org/r/20190423192534.GN23599@brightrain.aerifal.cx

Suggested-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20200327064820.12602-3-keescook@chromium.org
2020-04-20 19:09:38 +02:00
Kees Cook
9d9e435f3f x86/elf: Add table to document READ_IMPLIES_EXEC
Add a table to document the current behavior of READ_IMPLIES_EXEC in
preparation for changing the behavior.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20200327064820.12602-2-keescook@chromium.org
2020-04-20 15:28:40 +02:00
Christoph Hellwig
7fa3e10f0f x86/mm: Move pgprot2cachemode out of line
This helper is only used by x86 low-level MM code.  Also remove the
entirely pointless __pte2cachemode_tbl export as that symbol can be
marked static now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-3-hch@lst.de
2020-04-20 12:39:17 +02:00
Christoph Hellwig
1f6f655e01 x86/mm: Add a x86_has_pat_wp() helper
Abstract the ioremap code away from the caching mode internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200408152745.1565832-2-hch@lst.de
2020-04-20 12:39:11 +02:00
Mark Gross
7e5b3c267d x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
SRBDS is an MDS-like speculative side channel that can leak bits from the
random number generator (RNG) across cores and threads. New microcode
serializes the processor access during the execution of RDRAND and
RDSEED. This ensures that the shared buffer is overwritten before it is
released for reuse.

While it is present on all affected CPU models, the microcode mitigation
is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
cases where TSX is not supported or has been disabled with TSX_CTRL.

The mitigation is activated by default on affected processors and it
increases latency for RDRAND and RDSEED instructions. Among other
effects this will reduce throughput from /dev/urandom.

* Enable administrator to configure the mitigation off when desired using
  either mitigations=off or srbds=off.

* Export vulnerability status via sysfs

* Rename file-scoped macros to apply for non-whitelist table initializations.

 [ bp: Massage,
   - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
   - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
   - flip check in cpu_set_bug_bits() to save an indentation level,
   - reflow comments.
   jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
   tglx: Dropped the fused off magic for now
 ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-04-20 12:19:22 +02:00
Mark Gross
e9d7144597 x86/cpu: Add a steppings field to struct x86_cpu_id
Intel uses the same family/model for several CPUs. Sometimes the
stepping must be checked to tell them apart.

On x86 there can be at most 16 steppings. Add a steppings bitmask to
x86_cpu_id and a X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro
and support for matching against family/model/stepping.

 [ bp: Massage. ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-20 12:19:21 +02:00
Linus Torvalds
0fe5f9ca22 A set of fixes for x86 and objtool:
objtool:
 
   - Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP
     is enabled.
 
   - Support clang non-section symbols in objtool ORC dump
 
   - Fix switch table detection in .text.unlikely
 
   - Make the BP scratch register warning more robust.
 
  x86:
 
   - Increase microcode maximum patch size for AMD to cope with new CPUs
     which have a larger patch size.
 
   - Fix a crash in the resource control filesystem when the removal of the
     default resource group is attempted.
 
   - Preserve Code and Data Prioritization enabled state accross CPU
     hotplug.
 
   - Update split lock cpu matching to use the new X86_MATCH macros.
 
   - Change the split lock enumeration as Intel finaly decided that the
     IA32_CORE_CAPABILITIES bits are not architectural contrary to what
     the SDM claims. !@#%$^!
 
   - Add Tremont CPU models to the split lock detection cpu match.
 
   - Add a missing static attribute to make sparse happy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cWGsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYod2jD/4kZqz+nEzAvx8RC/7zfLr1S6mDYcLb
 kqWEblLRfPofFNO3W/1Ri7xUs2VCyBcOJeG9JIugI8YV/b/5LY9j2nW30unXi84y
 8DHLWgM7OG+EiNDMvdQwgnjNb9Pdl4F1e9yTTD6IRg0bHOjvtHVyq9bNg7f3iaED
 ZE4X5Hh5u4qFK/jmcsTF5HA/wIjELdmT32F4RxceAlmvpa5SUGlOfVVo1cSZpCbx
 XkrvUvEzyZhbzY+Gy1q3SHTt+fvzx1++LsnJD0Dyfe5Q47PA1Iy6Zo2+Epn3FnCu
 XuQKLaiDhidpkPzTGULZUsubavXbrSEu5/yhFJHyUqMy5WNOmvXBN8eVC4j1I9Ga
 tnt43s3AS8noz4qIb7bpoVgETFtoCfWfqwhtZmALPzrfutwxe2Ujtsi9FUca6HtA
 T5dKuNwc8G+Q5ZiNi+rPjcV/QGGncZFwtwwRwUl/YKgQ2VgrTgfsPc431tfSl3Q8
 hVQIOhQNHCKqe3uGhiCsI29pNMDXVijZcI8w2SSmxnPyrMRXD7bTfLWnPav7SGFO
 aSSi9HWtghkU/MsmRgRcZc9PI5bNs6w5IkfQqfXjd/lJwea2yQg1cn1KdmGi3Q33
 BNj9FudNMe4K8ITaNWiLdt5rYCDIvWEzmbwawAhevstbKrjVtrAYgNAjvgJEnXAt
 mZwTu+Hpd6d+JA==
 =raUm
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 and objtool fixes from Thomas Gleixner:
 "A set of fixes for x86 and objtool:

  objtool:

   - Ignore the double UD2 which is emitted in BUG() when
     CONFIG_UBSAN_TRAP is enabled.

   - Support clang non-section symbols in objtool ORC dump

   - Fix switch table detection in .text.unlikely

   - Make the BP scratch register warning more robust.

  x86:

   - Increase microcode maximum patch size for AMD to cope with new CPUs
     which have a larger patch size.

   - Fix a crash in the resource control filesystem when the removal of
     the default resource group is attempted.

   - Preserve Code and Data Prioritization enabled state accross CPU
     hotplug.

   - Update split lock cpu matching to use the new X86_MATCH macros.

   - Change the split lock enumeration as Intel finaly decided that the
     IA32_CORE_CAPABILITIES bits are not architectural contrary to what
     the SDM claims. !@#%$^!

   - Add Tremont CPU models to the split lock detection cpu match.

   - Add a missing static attribute to make sparse happy"

* tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Add Tremont family CPU models
  x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
  x86/resctrl: Preserve CDP enable over CPU hotplug
  x86/resctrl: Fix invalid attempt at removing the default resource group
  x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
  x86/umip: Make umip_insns static
  x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
  objtool: Make BP scratch register warning more robust
  objtool: Fix switch table detection in .text.unlikely
  objtool: Support Clang non-section symbols in ORC generation
  objtool: Support Clang non-section symbols in ORC dump
  objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
2020-04-19 11:58:32 -07:00
Tony Luck
f82cdff1aa x86/mce: Drop bogus comment about mce.kflags
The bit definitions for kflags are for internal use only. A
late edit moved them from uapi/asm/mce.h to the internal
x86 <asm/mce.h>, but the comment saying "See below" was
accidentally left here.

Delete "See below". Just labelling this field as internal
kernel use is sufficient.

Fixes: 1de08dccd3 ("x86/mce: Add a struct mce.kflags field")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200415195826.GA13681@agluck-desk2.amr.corp.intel.com
2020-04-17 11:12:21 +02:00
Sean Christopherson
53b3d8e9d5 KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)
Export the page fault propagation helper so that VMX can use it to
correctly emulate TLB invalidation on page faults in an upcoming patch.

In the (hopefully) not-too-distant future, SGX virtualization will also
want access to the helper for injecting page faults to the correct level
(L1 vs. L2) when emulating ENCLS instructions.

Rename the function to kvm_inject_emulated_page_fault() to clarify that
it is (a) injecting a fault and (b) only for page faults.  WARN if it's
invoked with an exception other than PF_VECTOR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15 12:08:50 -04:00
John Allen
bdf89df3c5 x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
Future AMD CPUs will have microcode patches that exceed the default 4K
patch size. Raise our limit.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v4.14..
Link: https://lkml.kernel.org/r/20200409152931.GA685273@mojo.amd.com
2020-04-14 17:34:46 +02:00
Borislav Petkov
1df73b2131 x86/mce: Fixup exception only for the correct MCEs
The severity grading code returns IN_KERNEL_RECOV error context for
errors which have happened in kernel space but from which the kernel can
recover. Whether the recovery can happen is determined by the exception
table entry having as handler ex_handler_fault() and which has been
declared at build time using _ASM_EXTABLE_FAULT().

IN_KERNEL_RECOV is used in mce_severity_intel() to lookup the
corresponding error severity in the severities table.

However, the mapping back from error severity to whether the error is
IN_KERNEL_RECOV is ambiguous and in the very paranoid case - which
might not be possible right now - but be better safe than sorry later,
an exception fixup could be attempted for another MCE whose address
is in the exception table and has the proper severity. Which would be
unfortunate, to say the least.

Therefore, mark such MCEs explicitly as MCE_IN_KERNEL_RECOV so that the
recovery attempt is done only for them.

Document the whole handling, while at it, as it is not trivial.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200407163414.18058-10-bp@alien8.de
2020-04-14 16:01:49 +02:00
Tony Luck
1de08dccd3 x86/mce: Add a struct mce.kflags field
There can be many different subsystems register on the mce handler
chain. Add a new bitmask field and define values so that handlers can
indicate whether they took any action to log or otherwise handle an
error.

The default handler at the end of the chain can use this information to
decide whether to print to the console log.

Boris suggested a generic name and leaving plenty of spare bits for
possible future use.

 [ bp: Move flag bits to the internal mce.h header and use BIT_ULL(). ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-4-tony.luck@intel.com
2020-04-14 15:58:43 +02:00
Tony Luck
c9c6d216ed x86/mce: Rename "first" function as "early"
It isn't going to be first on the notifier chain when the CEC is moved
to be a normal user of the notifier chain.

Fix the enum for the MCE_PRIO symbols to list them in reverse order so
that the compiler can give them numbers from low to high priority. Add
an entry for MCE_PRIO_CEC as the highest priority.

 [ bp: Use passive voice, add comments. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-2-tony.luck@intel.com
2020-04-14 15:55:01 +02:00
Borislav Petkov
3e0fdec858 x86/mce/amd, edac: Remove report_gart_errors
... because no one should be interested in spurious MCEs anyway. Make
the filtering unconditional and move it to amd_filter_mce().

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de
2020-04-14 15:53:46 +02:00
Thomas Gleixner
f26d2580a7 x86/mce/amd: Cleanup threshold device remove path
Pass in the bank pointer directly to the cleaning up functions,
obviating the need for per-CPU accesses. Make the clean up path
interrupt-safe by cleaning the bank pointer first so that the rest of
the teardown happens safe from the thresholding interrupt.

No functional changes.

 [ bp: Write commit message and reverse bank->shared test to save an
   indentation level in threshold_remove_bank(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-7-bp@alien8.de
2020-04-14 15:49:51 +02:00
Borislav Petkov
593309423c x86/32: Remove CONFIG_DOUBLEFAULT
Make the doublefault exception handler unconditional on 32-bit. Yes,
it is important to be able to catch #DF exceptions instead of silent
reboots. Yes, the code size increase is worth every byte. And one less
CONFIG symbol is just the cherry on top.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200404083646.8897-1-bp@alien8.de
2020-04-14 14:24:05 +02:00
Paolo Bonzini
f14eec0a32 KVM: SVM: move more vmentry code to assembly
Manipulate IF around vmload/vmsave to remove the confusing usage of
local_irq_enable where interrupts are actually disabled via GIF.
And stuff the RSB immediately without waiting for a RET to avoid
Spectre-v2 attacks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14 04:21:21 -04:00
Ard Biesheuvel
a088b858f1 efi/x86: Revert struct layout change to fix kexec boot regression
Commit

  0a67361dcd ("efi/x86: Remove runtime table address from kexec EFI setup data")

removed the code that retrieves the non-remapped UEFI runtime services
pointer from the data structure provided by kexec, as it was never really
needed on the kexec boot path: mapping the runtime services table at its
non-remapped address is only needed when calling SetVirtualAddressMap(),
which never happens during a kexec boot in the first place.

However, dropping the 'runtime' member from struct efi_setup_data was a
mistake. That struct is shared ABI between the kernel and the kexec tooling
for x86, and so we cannot simply change its layout. So let's put back the
removed field, but call it 'unused' to reflect the fact that we never look
at its contents. While at it, add a comment to remind our future selves
that the layout is external ABI.

Fixes: 0a67361dcd ("efi/x86: Remove runtime table address from kexec EFI setup data")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-14 08:32:17 +02:00
Ingo Molnar
3b02a051d2 Linux 5.7-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6TbaUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgkH/iWpiKvosA20HJjC
 rBqYeJPxQsgZTuBieWJ+MeVxbpcF7RlM4c+glyvg3QJhHwIEG58dl6LBrQbAyBAR
 aFHNojr1iAYOruVCGnU3pA008YZiwUIDv/ZQ4DF8fmIU2vI2mJ6qHBv3XDl4G2uR
 Nwz8Eu9AgIwZM5coomVOSmoWyFy7Vxmb7W+3t5VmKsvOWx4ib9kyQtOIkvQDEl7j
 XCbWfI0xDQr6LFOm4jnCi5R/LhJ2LIqqIvHHrunbpszM8IwK797jCXz4im+dmd5Y
 +km46N7a8pDqri36xXz1gdBAU3eG7Pt1NyvfjwRVTdX4GquQ2MT0GoojxbLxUP3y
 3pEsQuE=
 =whbL
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh

Resolve these conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/Makefile

Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines
in the Kconfig to reduce the probability of future conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-13 09:44:39 +02:00
Linus Torvalds
4f8a3cc118 A set of three patches to fix the fallout of the newly added split lock
detection feature.
 
 It addressed the case where a KVM guest triggers a split lock #AC and KVM
 reinjects it into the guest which is not prepared to handle it.
 
 Adds proper sanity checks which prevent the unconditional injection into
 the guest and handles the #AC on the host side in the same way as user
 space detections are handled. Depending on the detection mode it either
 warns and disables detection for the task or kills the task if the mode is
 set to fatal.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TFtMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoSHuD/4gUKV4BefhiUulmk++jAyq1Hq91IOg
 wIZEALyP53AcxXDoASAzkbNfyDuBufszzM6lSgd0lFMN4uaDvW/7jTR9fxyG7xMT
 uTS86WZRKpMjb+NWnU5OuO7jmYSKnV/yjXqyg+z9G7nO/JdQ7b4GekdPQobGOZ0f
 NGGttSjCHvMUt5JF6tUggpl8IgomEP0HNes80IHIoJbE1KOm9qiP0NYZtlrlwYWT
 J0Xuz4b/I1IeT2Dr4X7B4LYmzYqyXpG/8khTapFyBdLqdNLBhoEUPotnEjrL3u/S
 I4h+U5N1hCu3AjC0iatjedD2etB8GWrOWhBYPmdh9LTdhHIVUIDKWDoUYl3YeESK
 Kvu5b3tyCbT86YKu2WxDtwi67yN6MM390M2JU3TLzzbbmjxMTG2dzWQaWzKeDFcx
 NwoxQU08c1/dVheODe2lCsI+RaMY3uWMpHoRrJkm105CaOGrBMpTFfHyMJsY8zgZ
 vgpUZeXylx0IrgteWyD6UrkA6LqtBukc/zOb9YL8vQmSh2I3URhUQ8O+TQG9VtbR
 e/KekJ2Ij7gXSOSu65bcpNY3q7BtVi+7ev5KFYhVMT3QuMMdc7l+TkMX5lDhAG+a
 lYdn9mxuNahyTxylGt7Sy0U1bRyn1n7fIG4azwuCiNhXtimV0urTG6PtfQTf+j4L
 bXJfa4C4BykF5A==
 =8FbL
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of three patches to fix the fallout of the newly added split
  lock detection feature.

  It addressed the case where a KVM guest triggers a split lock #AC and
  KVM reinjects it into the guest which is not prepared to handle it.

  Add proper sanity checks which prevent the unconditional injection
  into the guest and handles the #AC on the host side in the same way as
  user space detections are handled. Depending on the detection mode it
  either warns and disables detection for the task or kills the task if
  the mode is set to fatal"

* tag 'x86-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest
  KVM: x86: Emulate split-lock access as a write in emulator
  x86/split_lock: Provide handle_guest_split_lock()
2020-04-12 10:17:16 -07:00
Linus Torvalds
b753101a4a Kbuild updates for v5.7 (2nd)
- raise minimum supported binutils version to 2.23
 
  - remove old CONFIG_AS_* macros that we know binutils >= 2.23 supports
 
  - move remaining CONFIG_AS_* tests to Kconfig from Makefile
 
  - enable -Wtautological-compare warnings to catch more issues
 
  - do not support GCC plugins for GCC <= 4.7
 
  - fix various breakages of 'make xconfig'
 
  - include the linker version used for linking the kernel into
    LINUX_COMPILER, which is used for the banner, and also exposed to
    /proc/version
 
  - link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y,
    which allows us to remove the lib-ksyms.o workaround, and to
    solve the last known issue of the LLVM linker
 
  - add dummy tools in scripts/dummy-tools/ to enable all compiler
    tests in Kconfig, which will be useful for distro maintainers
 
  - support the single switch, LLVM=1 to use Clang and all LLVM utilities
    instead of GCC and Binutils.
 
  - support LLVM_IAS=1 to enable the integrated assembler, which is still
    experimental
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl6RNqEVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGZPEP/3affmzIWJuKGF1RErOHK3KCe/uX
 PmLjoRZ7im7V+J4b3W+p+re6BOXIXhW+rtKoP/Ijuys9g80WeeAb2nB4h0ESOtff
 3NgN97v28mh4tVtbluJambFDXItei+UwDp1sgg2sZ7ehaSBVny9hgNmPRn5YcyoS
 O3Juy85q70l8awBWThjEHgSxEw2Rzh9PLE6YmMt40rHTxVEDjMOPSuBlp/+TWj3X
 ugF/wInp+J5mCAKCwJI4L6PavdwIwf9hg3Cv/DpoOw60TxwH+7Rq6RueDKBgHhe3
 UEPHrXyPCsF/JQwwSFxN7k481RV2PjkXFwA3U5vH+3WIRb4ETX0+fmBIrLPSAX4z
 6rZiEvdrGS4TVvW2i8mrkJUrLPHNyQ90q/FU0V18A1k77Cv7mWJjSebTAVYNvz/v
 f/DxApaepwprdtHcNYJMN/TVnwxNexJK+U+bkuXsmDggvZYCxwLQUjtI3Sab1Rv9
 C6Y8WgqKx8yP6NbqVtUMkwXdEhBiHgybVxkl9hseUEbhUElIViuq5rlrHa0FVt2Q
 w4orgFXOd7k5iuDr7ka+wa3p20KLQQuB+vwLaCpi35+4vepQ7P0i2tFNwSclo7lO
 +iNy4Bq20W0/cmQeUJIzctJGibwro1I3HPN1UJ7gp0fZ2WVGzV0SKpwQ0tLOVuuU
 y9yPsL1ciDpKQKMh
 =jpyF
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - raise minimum supported binutils version to 2.23

 - remove old CONFIG_AS_* macros that we know binutils >= 2.23 supports

 - move remaining CONFIG_AS_* tests to Kconfig from Makefile

 - enable -Wtautological-compare warnings to catch more issues

 - do not support GCC plugins for GCC <= 4.7

 - fix various breakages of 'make xconfig'

 - include the linker version used for linking the kernel into
   LINUX_COMPILER, which is used for the banner, and also exposed to
   /proc/version

 - link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y, which
   allows us to remove the lib-ksyms.o workaround, and to solve the last
   known issue of the LLVM linker

 - add dummy tools in scripts/dummy-tools/ to enable all compiler tests
   in Kconfig, which will be useful for distro maintainers

 - support the single switch, LLVM=1 to use Clang and all LLVM utilities
   instead of GCC and Binutils.

 - support LLVM_IAS=1 to enable the integrated assembler, which is still
   experimental

* tag 'kbuild-v5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (36 commits)
  kbuild: fix comment about missing include guard detection
  kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
  kbuild: replace AS=clang with LLVM_IAS=1
  kbuild: add dummy toolchains to enable all cc-option etc. in Kconfig
  kbuild: link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y
  MIPS: fw: arc: add __weak to prom_meminit and prom_free_prom_memory
  kbuild: remove -I$(srctree)/tools/include from scripts/Makefile
  kbuild: do not pass $(KBUILD_CFLAGS) to scripts/mkcompile_h
  Documentation/llvm: fix the name of llvm-size
  kbuild: mkcompile_h: Include $LD version in /proc/version
  kconfig: qconf: Fix a few alignment issues
  kconfig: qconf: remove some old bogus TODOs
  kconfig: qconf: fix support for the split view mode
  kconfig: qconf: fix the content of the main widget
  kconfig: qconf: Change title for the item window
  kconfig: qconf: clean deprecated warnings
  gcc-plugins: drop support for GCC <= 4.7
  kbuild: Enable -Wtautological-compare
  x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2
  crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'
  ...
2020-04-11 09:46:12 -07:00
Thomas Gleixner
d7e94dbdac x86/split_lock: Provide handle_guest_split_lock()
Without at least minimal handling for split lock detection induced #AC,
VMX will just run into the same problem as the VMWare hypervisor, which
was reported by Kenneth.

It will inject the #AC blindly into the guest whether the guest is
prepared or not.

Provide a function for guest mode which acts depending on the host
SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a
warning, disable SLD and mark the task accordingly. Otherwise force
SIGBUS.

 [ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de
Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de
2020-04-11 16:39:30 +02:00
Logan Gunthorpe
30796e18c2 x86/mm: introduce __set_memory_prot()
For use in the 32bit arch_add_memory() to set the pgprot type of the
memory to add.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-5-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Logan Gunthorpe
c164fbb40c x86/mm: thread pgprot_t through init_memory_mapping()
In preparation to support a pgprot_t argument for arch_add_memory().

It's required to move the prototype of init_memory_mapping() seeing the
original location came before the definition of pgprot_t.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Anshuman Khandual
c62da0c35d mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS.  While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms.  Apart from simplification, this
reduces code duplication as well.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Arjun Roy
c97078bd21 mm: define pte_index as macro for x86
pte_index() is either defined as a macro (e.g.  sparc64) or as an
inlined function (e.g.  x86).  vm_insert_pages() depends on pte_index
but it is not defined on all platforms (e.g.  m68k).

To fix compilation of vm_insert_pages() on architectures not providing
pte_index(), we perform the following fix:

0. For platforms where it is meaningful, and defined as a macro, no
    change is needed.
1. For platforms where it is meaningful and defined as an inlined
    function, and we want to use it with vm_insert_pages(), we define
    a degenerate macro of the form:  #define pte_index pte_index
2. vm_insert_pages() checks for the existence of a pte_index macro
   definition. If found, it implements a batched insert. If not found,
   it devolves to calling vm_insert_page() in a loop.

This patch implements step 1 for x86.

v3 of this patch fixes a compilation warning for an unused method.
v2 of this patch moved a macro definition to a more readable location.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200228054714.204424-1-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Masahiro Yamada
42251572c4 x86: remove always-defined CONFIG_AS_AVX
CONFIG_AS_AVX was introduced by commit ea4d26ae24 ("raid5: add AVX
optimized RAID5 checksumming").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_AVX, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
48e24723d0 x86: remove always-defined CONFIG_AS_CFI_SECTIONS
CONFIG_AS_CFI_SECTIONS was introduced by commit 9e56529227 ("x86:
Use .cfi_sections for assembly code").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_CFI_SECTIONS, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
46427f658e x86: remove unneeded (CONFIG_AS_)CFI_SIGNAL_FRAME
Commit 131484c8da ("x86/debug: Remove perpetually broken,
unmaintainable dwarf annotations") removes all the users of
CFI_SIGNAL_FRAME.

Remove the CFI_SIGNAL_FRAME and CONFIG_AS_CFI_SIGNAL_FRAME.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
0f2661c4b9 x86: remove always-defined CONFIG_AS_CFI
CONFIG_AS_CFI was introduced by commit e2414910f2 ("[PATCH] x86:
Detect CFI support in the assembler at runtime"), and extended by
commit f0f12d85af ("x86_64: Check for .cfi_rel_offset in CFI probe").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_CFI, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada
418d6e295e x86: remove unneeded defined(__ASSEMBLY__) check from asm/dwarf2.h
This header file has the following check at the top:

  #ifndef __ASSEMBLY__
  #warning "asm/dwarf2.h should be only included in pure assembly files"
  #endif

So, we expect defined(__ASSEMBLY__) is always true.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:58 +09:00
Peter Xu
2e3d5dc508 userfaultfd: wp: add pmd_swp_*uffd_wp() helpers
Adding these missing helpers for uffd-wp operations with pmd
swap/migration entries.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-10-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Andrea Arcangeli
5a281062af userfaultfd: wp: add WP pagetable tracking to x86
Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland.  We can't relay
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap.  If we were to relay on that, we'd
need to stay on the safe side and generate false positive wp faults for
every swapped out page.

[peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Linus Torvalds
86f26a77cb pci-v5.7-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
 z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
 thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
 x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
 JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
 0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
 sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
 3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
 m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
 LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
 F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
 dNggvW052d7BHes=
 =deWu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)

   - Add more 32 GT/s link speed decoding and improve the implementation
     (Yicong Yang)

  Resource management:

   - Add support for sizing programmable host bridge apertures and fix a
     related alpha Nautilus regression (Ivan Kokshaysky)

  Interrupts:

   - Add boot interrupt quirk mechanism for Xeon chipsets and document
     boot interrupts (Sean V Kelley)

  PCIe native device hotplug:

   - When possible, disable in-band presence detect and use PDS
     (Alexandru Gagniuc)

   - Add DMI table for devices that don't use in-band presence detection
     but don't advertise that correctly (Stuart Hayes)

   - Fix hang when powering slots up/down via sysfs (Lukas Wunner)

   - Fix an MSI interrupt race (Stuart Hayes)

  Virtualization:

   - Add ACS quirks for Zhaoxin devices (Raymond Pang)

  Error handling:

   - Add Error Disconnect Recover (EDR) support so firmware can report
     devices disconnected via DPC and we can try to recover (Kuppuswamy
     Sathyanarayanan)

  Peer-to-peer DMA:

   - Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
     Maier)

  ASPM:

   - Reduce severity of common clock config message (Chris Packham)

   - Clear the correct bits when enabling L1 substates, so we don't go
     to the wrong state (Yicong Yang)

  Endpoint framework:

   - Replace EPF linkup ops with notifier call chain and improve locking
     (Kishon Vijay Abraham I)

   - Fix concurrent memory allocation in OB address region (Kishon Vijay
     Abraham I)

   - Move PF function number assignment to EPC core to support multiple
     function creation methods (Kishon Vijay Abraham I)

   - Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)

   - Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
     Offset (Kishon Vijay Abraham I)

   - Add support for testing DMA transfers (Kishon Vijay Abraham I)

   - Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)

   - Add support for tests to clear IRQ (Kishon Vijay Abraham I)

   - Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)

  Amlogic Meson PCIe controller driver:

   - Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
     Pommarel)

   - Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
     Pommarel)

  Cadence PCIe controller driver:

   - Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
     Abraham I)

  Intel VMD host bridge driver:

   - Add two VMD Device IDs that require bus restriction mode (Sushma
     Kalakota)

  Mobiveil PCIe controller driver:

   - Refactor and modularize mobiveil driver (Hou Zhiqiang)

   - Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)

  Microsoft Hyper-V host bridge driver:

   - Add support for Hyper-V PCI protocol version 1.3 and
     PCI_BUS_RELATIONS2 (Long Li)

   - Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
     Feng)

   - Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)

  NVIDIA Tegra PCIe controller driver:

   - Use pci_parse_request_of_pci_ranges() (Rob Herring)

   - Add support for endpoint mode and related DT updates (Vidya Sagar)

   - Reduce -EPROBE_DEFER error message log level (Thierry Reding)

  Qualcomm PCIe controller driver:

   - Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)

  Synopsys DesignWare PCIe controller driver:

   - Refactor core initialization code for endpoint mode (Vidya Sagar)

   - Fix endpoint MSI-X to use correct table address (Kishon Vijay
     Abraham I)

  TI DRA7xx PCIe controller driver:

   - Fix MSI IRQ handling (Vignesh Raghavendra)

  TI Keystone PCIe controller driver:

   - Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)

  Miscellaneous:

   - Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
     Feng)

   - Use ioremap(), not phys_to_virt(), for platform ROM to fix video
     ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"

* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
  misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
  PCI: tegra: Print -EPROBE_DEFER error message at debug level
  misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
  misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
  tools: PCI: Add 'e' to clear IRQ
  misc: pci_endpoint_test: Add ioctl to clear IRQ
  misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
  PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
  PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
  PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
  misc: pci_endpoint_test: Add support to get DMA option from userspace
  tools: PCI: Add 'd' command line option to support DMA
  misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
  PCI: endpoint: functions/pci-epf-test: Print throughput information
  PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
  PCI: pciehp: Fix MSI interrupt race
  PCI: pciehp: Fix indefinite wait on sysfs requests
  PCI: endpoint: Fix clearing start entry in configfs
  PCI: tegra: Add support for PCIe endpoint mode in Tegra194
  PCI: sysfs: Revert "rescan" file renames
  ...
2020-04-03 14:25:02 -07:00
Linus Torvalds
8c1b724ddb ARM:
* GICv4.1 support
 * 32bit host removal
 
 PPC:
 * secure (encrypted) using under the Protected Execution Framework
 ultravisor
 
 s390:
 * allow disabling GISA (hardware interrupt injection) and protected
 VMs/ultravisor support.
 
 x86:
 * New dirty bitmap flag that sets all bits in the bitmap when dirty
 page logging is enabled; this is faster because it doesn't require bulk
 modification of the page tables.
 * Initial work on making nested SVM event injection more similar to VMX,
 and less buggy.
 * Various cleanups to MMU code (though the big ones and related
 optimizations were delayed to 5.8).  Instead of using cr3 in function
 names which occasionally means eptp, KVM too has standardized on "pgd".
 * A large refactoring of CPUID features, which now use an array that
 parallels the core x86_features.
 * Some removal of pointer chasing from kvm_x86_ops, which will also be
 switched to static calls as soon as they are available.
 * New Tigerlake CPUID features.
 * More bugfixes, optimizations and cleanups.
 
 Generic:
 * selftests: cleanups, new MMU notifier stress test, steal-time test
 * CSV output for kvm_stat.
 
 KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
 by MIPS maintainers.  I had already prepared a fix, but the MIPS maintainers
 prefer to fix it in generic code rather than KVM so they are taking care of it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
 3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
 oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
 SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
 Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
 nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
 =6fGt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) using under the Protected Execution Framework
     ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
2020-04-02 15:13:15 -07:00
Linus Torvalds
f14a9532ee A single fix addressing Sparse warnings. <asm/bitops.h> is changed non-trivially
to avoid the warnings, but generated code is not supposed to be affected.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl6Fs/QRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iOLQ//R3i1QAIvDasdG9EMbSkJAZP2cliK8r3B
 Vc+3quYv8wjJcgj5LGPrlVeoV2X96jopZiN3YeWWJg0rA30ZyZMPiVkQWtYVUEtU
 VjGvX5RNw0ShcWjzbetcPXhyczCpJKFwFVv2fEVPwAvI3OyqGuL044aFQhgksra+
 RE4n8eYWB9pastFeJGn1WPWdJOw40fOcC7YbAF3USo7e8aO/Wv3KJiZxahhGFnPt
 5spBnZHSPbvZp9O8pgdYVJ09mExK2wBxk/GClQw/E4i7d/TLcHEzBIOAekS98H0F
 9lNgCnFLVmEK5DA4TXMPhz+aYfEb5VFoBgz4wA4VOiwcPrTJKa0IukcG+oWXWPrB
 PRb8StNB3IHU0pqKPHRemyPNzl9d4DMm22NMfRBCVUrPrDlYkOb1tCANgcyHOyMf
 G/w2nbcNDgzi9m2L38gWCFIY5AP1AKW+0X8MdsvyESlTXIC6lsBFsjsLE69nbv7c
 dBYYxwEKb41bjXpWIxbdCEyW9kNZTSt5RZP+Md/2DGoeWLHba4iHmmXjhJKGF1F3
 pf1yJZDVoaQkwX+mLgDyC9681UzDA0lRMrSBIhQOpw2OuCkvBRTifwvH0efbLjtN
 cXxeCZvK8O1Zmc/BTtdRRPWybItjtZmkfm2iVviFUxY566i/vAQdmQBST4Uqq4qv
 2V9nnZVEJas=
 =Vx32
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "A single fix addressing Sparse warnings. <asm/bitops.h> is changed
  non-trivially to avoid the warnings, but generated code is not
  supposed to be affected"

* tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix bitops.h warning with a moved cast
2020-04-02 14:52:12 -07:00
Linus Torvalds
6cad420cc6 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "A large amount of MM, plenty more to come.

  Subsystems affected by this patch series:
   - tools
   - kthread
   - kbuild
   - scripts
   - ocfs2
   - vfs
   - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
         sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
         hugetlbfs, hugetlb"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
  include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
  mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
  selftests/vm: fix map_hugetlb length used for testing read and write
  mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
  mm/hugetlb.c: clean code by removing unnecessary initialization
  hugetlb_cgroup: add hugetlb_cgroup reservation docs
  hugetlb_cgroup: add hugetlb_cgroup reservation tests
  hugetlb: support file_region coalescing again
  hugetlb_cgroup: support noreserve mappings
  hugetlb_cgroup: add accounting for shared mappings
  hugetlb: disable region_add file_region coalescing
  hugetlb_cgroup: add reservation accounting for private mappings
  mm/hugetlb_cgroup: fix hugetlb_cgroup migration
  hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
  hugetlb_cgroup: add hugetlb_cgroup reservation counter
  hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  mm/memblock.c: remove redundant assignment to variable max_addr
  mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
  mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
  ...
2020-04-02 13:55:34 -07:00
Anshuman Khandual
7969f2264f mm/vma: make vma_is_foreign() available for general use
Idea of a foreign VMA with respect to the present context is very generic.
But currently there are two identical definitions for this in powerpc and
x86 platforms.  Lets consolidate those redundant definitions while making
vma_is_foreign() available for general use later.  This should not cause
any functional change.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/1582782965-3274-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Masahiro Yamada
630f289b71 asm-generic: make more kernel-space headers mandatory
Change a header to mandatory-y if both of the following are met:

[1] At least one architecture (except um) specifies it as generic-y in
    arch/*/include/asm/Kbuild

[2] Every architecture (except um) either has its own implementation
    (arch/*/include/asm/*.h) or specifies it as generic-y in
    arch/*/include/asm/Kbuild

This commit was generated by the following shell script.

----------------------------------->8-----------------------------------

arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')

tmpfile=$(mktemp)

grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile

find arch -path 'arch/*/include/asm/Kbuild' |
	xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
	mandatory=yes

	for arch in $arches
	do
		if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
			! [ -f arch/$arch/include/asm/$header ]; then
			mandatory=no
			break
		fi
	done

	if [ "$mandatory" = yes ]; then
		echo "mandatory-y += $header" >> $tmpfile

		for arch in $arches
		do
			sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
		done
	fi

done

sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild

LANG=C sort $tmpfile >> include/asm-generic/Kbuild

----------------------------------->8-----------------------------------

One obvious benefit is the diff stat:

 25 files changed, 52 insertions(+), 557 deletions(-)

It is tedious to list generic-y for each arch that needs it.

So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.

See the following commits:

def3f7cefe
a1b39bae16

It is tedious to convert headers one by one, so I processed by a shell
script.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:25 -07:00
Linus Torvalds
890f0b0d27 x86: start using named parameters for low-level uaccess asms
This is partly for readability - using named arguments instead of
numbered ones makes it muchmore obvious just what is going on.  Using
"%[efault]" instead of "%4" for the special -EFAULT constant just means
that you don't have to count the arguments to see what's up.

But the motivation for all this cleanup is that when we'll start to
conditionally use "asm goto" even for the __get_user_asm() case, the
argument numbers will depend on whether we have an error output, or an
error label we can just directly jump to.

So this moves us towards named arguments for the same reason that we
have to use named arguments for the asms that use SET_CC(): numbering
will eventually become similarly unreliable and depends on whether we
can use particular compiler features or not.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01 13:23:14 -07:00
Linus Torvalds
7da63b3d54 x86: get rid of 'rtype' argument to __get_user_asm() macro
This is the exact same thing as 3680785692 ("x86: get rid of 'rtype'
argument to __put_user_goto() macro") except it's about __get_user_asm()
rather than __put_user_goto().

The reasons are the same: having the low-level asm access the argument
with a different size than the compiler thinks it does is fundamentally
wrong.

But unlike the __put_user_goto() case, we actually did tell the compiler
that we used a bigger variable (either long or long long), and then only
filled in the low bits, and ended up "fixing" this by casting the result
to the proper pointer type.

That's because we needed to use a non-qualified type (the user pointer
might be a const pointer!), and that makes this a bit more painful.  Our
'__inttype()' macro used to be lazy and only differentiate between "fits
in a register" or "needs two registers".

So this fix had to also make that '__inttype()' macro more precise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01 12:41:50 -07:00
Linus Torvalds
3680785692 x86: get rid of 'rtype' argument to __put_user_goto() macro
The 'rtype' argument goes back to pre-git (and pre-BK) times, and comes
from the fact that we used to not necessarily have the same type sizes
for the arguments of the inline asm as we did for the actual accesses we
did.

So 'rtype' is the 'register type' - the override of the register size in
the inline asm when it doesn't match the actual size of the variable we
use as the output argument (for when you used "put_user()" on an "int"
value that was assigned to a byte-sized user space access etc).

That mismatch doesn't actually exist any more, and should probably never
have existed in the first place.  It's a horrid bug just waiting to
happen (using more - or less - of the variable that the compiler
expected us to use).

I think we had some odd casting going on to hide the effects of that
oddity after-the-fact, but those are long gone, and these days we should
always have the right size value in the first place, using things like

        __typeof__(*(ptr)) __pu_val = (x);

and gcc should thus have the right register size without any manual
'rtype' games.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01 10:52:01 -07:00
Linus Torvalds
1a323ea535 x86: get rid of 'errret' argument to __get_user_xyz() macross
Every remaining user just has the error case returning -EFAULT.

In fact, the exception was __get_user_asm_nozero(), which was removed in
commit 4b842e4e25 ("x86: get rid of small constant size cases in
raw_copy_{to,from}_user()"), and the other __get_user_xyz() macros just
followed suit for consistency.

Fix up some macro whitespace while at it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31 18:23:47 -07:00
Linus Torvalds
ab33eb494c x86: remove __put_user_asm() infrastructure
The last user was removed by commit 4b842e4e25 ("x86: get rid of small
constant size cases in raw_copy_{to,from}_user()").  Get rid of the
left-overs before somebody tries to use it again.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31 18:11:18 -07:00
Linus Torvalds
d9d7677892 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "A handful of changes:

   - two memory encryption related fixes

   - don't display the kernel's virtual memory layout plaintext on
     32-bit kernels either

   - two simplifications"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove the now redundant N_MEMORY check
  dma-mapping: Fix dma_pgprot() for unencrypted coherent pages
  x86: Don't let pgprot_modify() change the page encryption bit
  x86/mm/kmmio: Use this_cpu_ptr() instead get_cpu_var() for kmmio_ctx
  x86/mm/init/32: Stop printing the virtual memory layout
2020-03-31 11:51:05 -07:00
Linus Torvalds
d0be2d53c7 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "Misc changes:

   - add a pkey sanity check

   - three commits to improve and future-proof xstate/xfeature handling
     some more"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkeys: Add check for pkey "overflow"
  x86/fpu/xstate: Warn when checking alignment of disabled xfeatures
  x86/fpu/xstate: Fix XSAVES offsets in setup_xstate_comp()
  x86/fpu/xstate: Fix last_good_offset in setup_xstate_features()
2020-03-31 11:26:22 -07:00
Linus Torvalds
fdf5563a72 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "This topic tree contains more commits than usual:

   - most of it are uaccess cleanups/reorganization by Al

   - there's a bunch of prototype declaration (--Wmissing-prototypes)
     cleanups

   - misc other cleanups all around the map"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/mm/set_memory: Fix -Wmissing-prototypes warnings
  x86/efi: Add a prototype for efi_arch_mem_reserve()
  x86/mm: Mark setup_emu2phys_nid() static
  x86/jump_label: Move 'inline' keyword placement
  x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
  kill uaccess_try()
  x86: unsafe_put-style macro for sigmask
  x86: x32_setup_rt_frame(): consolidate uaccess areas
  x86: __setup_rt_frame(): consolidate uaccess areas
  x86: __setup_frame(): consolidate uaccess areas
  x86: setup_sigcontext(): list user_access_{begin,end}() into callers
  x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
  x86: ia32_setup_rt_frame(): consolidate uaccess areas
  x86: ia32_setup_frame(): consolidate uaccess areas
  x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
  x86/alternatives: Mark text_poke_loc_init() static
  x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()
  x86/mm: Drop pud_mknotpresent()
  x86: Replace setup_irq() by request_irq()
  x86/configs: Slightly reduce defconfigs
  ...
2020-03-31 11:04:05 -07:00
Linus Torvalds
9589351ccf Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Misc cleanups and small enhancements all around the map"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed: Fix debug_puthex() parameter type
  x86/setup: Fix static memory detection
  x86/vmlinux: Drop unneeded linker script discard of .eh_frame
  x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
  x86/boot/compressed: Remove .eh_frame section from bzImage
  x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
  x86/boot/compressed/64: Use 32-bit (zero-extended) MOV for z_output_len
  x86/boot/compressed/64: Use LEA to initialize boot stack pointer
2020-03-31 10:28:35 -07:00
Sean Christopherson
6e4fd06f3e KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
Remove the __exit annotation from VMX hardware_unsetup(), the hook
can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
if failure occurs at various points during initialization.

Removing the annotation also lets us annotate vmx_x86_ops and svm_x86_ops
with __initdata; otherwise, objtool complains because it doesn't
understand that the vendor specific __initdata is being copied by value
to a non-__initdata instance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-8-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:09 -04:00
Sean Christopherson
afaf0b2f9b KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
Replace the kvm_x86_ops pointer in common x86 with an instance of the
struct to save one pointer dereference when invoking functions.  Copy the
struct by value to set the ops during kvm_init().

Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the
ops have been initialized, i.e. a vendor KVM module has been loaded.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:08 -04:00
Sean Christopherson
d008dfdb0e KVM: x86: Move init-only kvm_x86_ops to separate struct
Move the kvm_x86_ops functions that are used only within the scope of
kvm_init() into a separate struct, kvm_x86_init_ops.  In addition to
identifying the init-only functions without restorting to code comments,
this also sets the stage for waiting until after ->hardware_setup() to
set kvm_x86_ops.  Setting kvm_x86_ops after ->hardware_setup() is
desirable as many of the hooks are not usable until ->hardware_setup()
completes.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:04 -04:00
Paolo Bonzini
cf39d37539 KVM/arm updates for Linux 5.7
- GICv4.1 support
 - 32bit host removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl6DKKIPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDDe0P/30Oda6HJdcUY+g0dnHkH8N7t+VKjPPnihlX
 WBaT0Y4SzMsfAtG5lQqS48A50dXKWW70QvwkZjxu7abQhYFWGd2SGtTQxwqJXT8J
 I6MBh4r9xrIfiqzVT2BXslA6id5H6wCyyFI6vKm/IFkIu1J6JtwnKakQ0CIddS1d
 Blbgj5jcxGw+2xOppHCQXbWwwDdmYWkMZEBZjmhkezddqLDK+oaAUiUhHHHizTsB
 kLjgqYBVENpR1zDIsGpQAJloKXAiHfBQshQAmnhnBNzXE60LZ0n0/iODU9U5FDEO
 5j0DRWccKvsIMsUh7JpPr5xerGJ0rqk1IwPC2JcyzfRbvRLMpK1IOWfhI5Tg5lbP
 4Ev96QLEMBnKOWMSE0MqnMdq6JPzDLA6WZ28HZe2nc3/oWNgsSDtlXigx4xFFxTX
 zfc2YpAgFu3xJkPf8PtWTFvItm0AvFNFynPg0Rr/NsGf/FGeszYR4cLcHmv5NlWS
 IiV4+lgnlmr2LZr3VjUaumbtWIpuVF4Db5Al2K2E/PCN7ObfEkyCweDic8ophkH8
 sMS9TI38aH1Efy+I2Nfxxqpy8BcElZAMrAWt9R27A4JRLHdr7j5DsGnyRigXHgRe
 pFgbqtk/EjWkHwjaJVg8kPxf2+2P05VZsQeGG721nbKAIKDetM3RA2BflexdsptY
 kXplNsVr
 =eILh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.7

- GICv4.1 support
- 32bit host removal
2020-03-31 10:44:53 -04:00
Linus Torvalds
2853d5fafb Support for "split lock" detection:
- Atomic operations (lock prefixed instructions) which span two cache
     lines have to acquire the global bus lock. This is at least 1k cycles
     slower than an atomic operation within a cache line and disrupts
     performance on other cores. Aside of performance disruption this is
     a unpriviledged form of DoS.
 
     Some newer CPUs have the capability to raise an #AC trap when such an
     operation is attempted. The detection is by default enabled in warning
     mode which will warn once when a user space application is caught. A
     command line option allows to disable the detection or to select fatal
     mode which will terminate offending applications with SIGBUS.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/uMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocsAD/9yqpw+XlPKNPsfbm9sbirBDfTrENcL
 F44iwn4WnrjoW/gnnZCYmPxJFsTtGVPqxHdUf4eyGemg9r9ZEO0DQftmUHC5Z6KX
 aa/b5JoeM61wp9HlpVlD4D1jVt4pWyQODQeZnUXE4DEzmRc3cD/5lSU+/VeaIwwz
 lxwUemqmXK7ucH2KA7smOGsl2nU6ED84q3mdOB1b4Cw+gWYMUnPJnuS/ipriBRx4
 BYbMItcxsFvtdO9Hx8PvGd5LUK0wW8JOWrYQICD2kLpZtHtGeaHpBzFzL0+nMU7d
 1epyDqJQDmX+PAzvj+EYyn3HTfobZlckn+tbxMQkkS+oDk1ywOZd+BancClvn5/5
 jMfPIQJF5bGASVnzGMWhzVdwthTZiMG4d1iKsUWOA/hN0ch0+rm1BqraToabsEFg
 Sv7/rvl9KtSOtMJTeAmMhlZUMBj9m8BtPFjniDwp6nw/upGgJdST5mrKFNYZvqOj
 JnXsEMr/nJVW6bnUvT6LF66xbHlzHdxtodkQWqF+IEsyRaOz1zAGpQamP98KxNLc
 dq/XYoEe1KqIFbg4BkNP+GeDL3FQDxjFNwPQnnjQEzWRbjkHlfmq1uKCsR2r8mBO
 fYNJ1X8lTyGV0kx/ERpWGazzabpzh+8Lr1yMhnoA3EWvlzUjmpN2PFI4oTpTrtzT
 c/q16SCxim3NWA==
 =D9x8
 -----END PGP SIGNATURE-----

Merge tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 splitlock updates from Thomas Gleixner:
 "Support for 'split lock' detection:

  Atomic operations (lock prefixed instructions) which span two cache
  lines have to acquire the global bus lock. This is at least 1k cycles
  slower than an atomic operation within a cache line and disrupts
  performance on other cores. Aside of performance disruption this is a
  unpriviledged form of DoS.

  Some newer CPUs have the capability to raise an #AC trap when such an
  operation is attempted. The detection is by default enabled in warning
  mode which will warn once when a user space application is caught. A
  command line option allows to disable the detection or to select fatal
  mode which will terminate offending applications with SIGBUS"

* tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  x86/split_lock: Rework the initialization flow of split lock detection
  x86/split_lock: Enable split lock detection by kernel
2020-03-30 19:35:52 -07:00
Linus Torvalds
d5f744f9a2 x86 entry code updates:
- Convert the 32bit syscalls to be pt_regs based which removes the
       requirement to push all 6 potential arguments onto the stack and
       consolidates the interface with the 64bit variant
 
     - The first small portion of the exception and syscall related entry
       code consolidation which aims to address the recently discovered
       issues vs. RCU, int3, NMI and some other exceptions which can
       interrupt any context. The bulk of the changes is still work in
       progress and aimed for 5.8.
 
     - A few lockdep namespace cleanups which have been applied into this
       branch to keep the prerequisites for the ongoing work confined.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/TMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYA6EAC7r/bCMxBelljT3b7LkBbiJcocJ+zK
 OSzWU9miJGTAvYqn4/ciLKg4dA424b/1rBFlF1hBTCQ0HL5Cv4lajxdKEZCO5WCC
 WWTCz+MC60aWFaH3VNoywiLGb39H2IbqWbS9yNPd/wBkLHiMAD6NPQntOvcPaD4j
 1lyrMtLzfrWlrHxvxdI3kt5ZpFLYNXr2xk61xQjTz0ROFQBhf2sDsuhHhiYVLPj7
 JwYktpbBiPeaw2+I18NPymNPY+VfY8LCTgLl5M+rbKyCqebKaedZQJ7QXFhAEqKC
 Y2f+gJsKWtTDzGP2mk/5kF0uP7cd0vJK35ZCXtLZ9BbcNtFZU6w+ADqRo4pJBHRY
 QRzo/AWrdkuTJF0CrP6mcneNC7NwWLSdKrE1z77RQCHUPVvhHhRDZsgdLcZ/KKwx
 y1ji22trwNB+7LmI2fUOU5RRHZBIuNvQT+mPt24febJuHpZKul62dd3cqTGeSTC+
 MYVknYDSg/+jk+83DhuZnTyb9lWTbq/0Q1HRDu6l2LrMIH7YMPpY5Ea64ZFYzWXy
 s0+iHEM4mUzltwNauHIntjbwXi3C0l2k1WQyG0gun2eS6SXfu0lb93V4msFj/N1+
 oHavH2n2A4XrRr+Ob87fsl7nfXJibWP7R9xPblrWP2sNdqfjSyGd49rnsvpWqWMK
 Fj0d7tQ78+/SwA==
 =tWXS
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry code updates from Thomas Gleixner:

 - Convert the 32bit syscalls to be pt_regs based which removes the
   requirement to push all 6 potential arguments onto the stack and
   consolidates the interface with the 64bit variant

 - The first small portion of the exception and syscall related entry
   code consolidation which aims to address the recently discovered
   issues vs. RCU, int3, NMI and some other exceptions which can
   interrupt any context. The bulk of the changes is still work in
   progress and aimed for 5.8.

 - A few lockdep namespace cleanups which have been applied into this
   branch to keep the prerequisites for the ongoing work confined.

* tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
  lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}()
  lockdep: Rename trace_softirqs_{on,off}()
  lockdep: Rename trace_hardirq_{enter,exit}()
  x86/entry: Rename ___preempt_schedule
  x86: Remove unneeded includes
  x86/entry: Drop asmlinkage from syscalls
  x86/entry/32: Enable pt_regs based syscalls
  x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments
  x86/entry/32: Rename 32-bit specific syscalls
  x86/entry/32: Clean up syscall_32.tbl
  x86/entry: Remove ABI prefixes from functions in syscall tables
  x86/entry/64: Add __SYSCALL_COMMON()
  x86/entry: Remove syscall qualifier support
  x86/entry/64: Remove ptregs qualifier from syscall table
  x86/entry: Move max syscall number calculation to syscallhdr.sh
  x86/entry/64: Split X32 syscall table into its own file
  x86/entry/64: Move sys_ni_syscall stub to common.c
  x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
  x86/entry: Refactor SYS_NI macros
  ...
2020-03-30 19:14:28 -07:00
Linus Torvalds
dbb381b619 timekeeping and timer updates:
Core:
 
   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.
 
     This is achieved by splitting out header content into separate
     headers. which contain only the minimaly required information which is
     necessary to build the vDSO. These new headers are included from the
     kernel headers and the vDSO specific files.
 
   - Enhancements to the generic vDSO library allowing more fine grained
     control over the compiled in code, further reducing architecture
     specific storage and preparing for adopting the generic library by PPC.
 
   - Cleanup and consolidation of the exit related code in posix CPU timers.
 
   - Small cleanups and enhancements here and there
 
  Drivers:
 
   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support
 
   - Correct the clock rate of PIT64b global clock
 
   - setup_irq() cleanup
 
   - Preparation for PWM and suspend support for the TI DM timer
 
   - Expand the fttmr010 driver to support ast2600 systems
 
   - The usual small fixes, enhancements and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+QETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofJ5D/94s5fpaqiuNcaAsLq2D3DRIrTnqxx7
 yEeAOPcbYV1bM1SgY/M83L5yGc2S8ny787e26abwRTCZhZV3eAmRTphIFFIZR0Xk
 xS+i67odscbdJTRtztKj3uQ9rFxefszRuphyaa89pwSY9nnyMWLcahGSQOGs0LJK
 hvmgwPjyM1drNfPxgPiaFg7vDr2XxNATpQr/FBt+BhelvVan8TlAfrkcNPiLr++Y
 Axz925FP7jMaRRbZ1acji34gLiIAZk0jLCUdbix7YkPrqDB4GfO+v8Vez+fGClbJ
 uDOYeR4r1+Be/BtSJtJ2tHqtsKCcAL6agtaE2+epZq5HbzaZFRvBFaxgFNF8WVcn
 3FFibdEMdsRNfZTUVp5wwgOLN0UIqE/7LifE12oLEL2oFB5H2PiNEUw3E02XHO11
 rL3zgHhB6Ke1sXKPCjSGdmIQLbxZmV5kOlQFy7XuSeo5fmRapVzKNffnKcftIliF
 1HNtZbgdA+3tdxMFCqoo1QX+kotl9kgpslmdZ0qHAbaRb3xqLoSskbqEjFRMuSCC
 8bjJrwboD9T5GPfwodSCgqs/58CaSDuqPFbIjCay+p90Fcg6wWAkZtyG04ZLdPRc
 GgNNdN4gjTD9bnrRi8cH47z1g8OO4vt4K4SEbmjo8IlDW+9jYMxuwgR88CMeDXd7
 hu7aKsr2I2q/WQ==
 =5o9G
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timekeeping and timer updates from Thomas Gleixner:
 "Core:

   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.

     This is achieved by splitting out header content into separate
     headers. which contain only the minimaly required information which
     is necessary to build the vDSO. These new headers are included from
     the kernel headers and the vDSO specific files.

   - Enhancements to the generic vDSO library allowing more fine grained
     control over the compiled in code, further reducing architecture
     specific storage and preparing for adopting the generic library by
     PPC.

   - Cleanup and consolidation of the exit related code in posix CPU
     timers.

   - Small cleanups and enhancements here and there

  Drivers:

   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support

   - Correct the clock rate of PIT64b global clock

   - setup_irq() cleanup

   - Preparation for PWM and suspend support for the TI DM timer

   - Expand the fttmr010 driver to support ast2600 systems

   - The usual small fixes, enhancements and cleanups all over the
     place"

* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
  vdso: Fix clocksource.h macro detection
  um: Fix header inclusion
  arm64: vdso32: Enable Clang Compilation
  lib/vdso: Enable common headers
  arm: vdso: Enable arm to use common headers
  x86/vdso: Enable x86 to use common headers
  mips: vdso: Enable mips to use common headers
  arm64: vdso32: Include common headers in the vdso library
  arm64: vdso: Include common headers in the vdso library
  arm64: Introduce asm/vdso/processor.h
  arm64: vdso32: Code clean up
  linux/elfnote.h: Replace elf.h with UAPI equivalent
  scripts: Fix the inclusion order in modpost
  common: Introduce processor.h
  linux/ktime.h: Extract common header for vDSO
  linux/jiffies.h: Extract common header for vDSO
  linux/time64.h: Extract common header for vDSO
  linux/time32.h: Extract common header for vDSO
  linux/time.h: Extract common header for vDSO
  ...
2020-03-30 18:51:47 -07:00
Linus Torvalds
336622e9fc NOHZ full updates:
- Remove TIF_NOHZ from 3 architectures
 
     These architectures use a static key to decide whether context tracking
     needs to be invoked and the TIF_NOHZ flag just causes a pointless
     slowpath execution for nothing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+bITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZjpD/9PkXE/zQoVmPLhOOcEBXB4i0rQQV41
 mR8F83aswch+qtT1g7A00G5j49CWkLh/hj5PX7ajS9nSTQCHOQ9jdZuxPrjW8CGZ
 gMHCyd5o9C98sKOORylR2nuCKhVdOq0/HleRjBBDsqcO0T5KlhVPUrtuJ878kX8d
 1SnoZnZMx+Ro0+4+Ehp39CmZJ0pV6o5ypT469esa2MB1xw389AQCmLt4rk99FNMo
 LDbKAB+7XBwNAu/rqD0hIv7YyvaSlcdlWBAXBLeCrwVIKQG3VfT9CpgwTtGoNFhY
 9KBkzr0z+lvHS9eKWyWzpXYrgVU1u28gUVvpaavv+Ma5V8STqNunoMBs7hKanJqV
 mPh+4ABACtFieKlwkj2PwUrGEgH+y/SAfStliOFsimVz/w2udC0S777/EjjzfKaN
 NS13mP19s5/P1q3y/6BSrOxYD0inicROO+UfetHNPOgMePY+Gp/xzluefPnhTagX
 CnJxndA3Fbjh9rXFbSZ5TMlf97kTxVVJE+qtrh5Upw1AWpo/qvkLsIFsamgyW2jR
 7t3MbHzKYnLkUJlwOLPJimvZeN4hZOx05ra/RZOkVaxri7xtVsDCkaEvhgEqLWYj
 Gbt2mGnNccawwN0bVPd2hgkKmUBqO8u5llhQcM2BBG4CJgZMaB8LjIkS+F6FsSME
 xMnY+tS3c7Q8TQ==
 =0lHP
 -----END PGP SIGNATURE-----

Merge tag 'timers-nohz-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull NOHZ update from Thomas Gleixner:
 "Remove TIF_NOHZ from three architectures

  These architectures use a static key to decide whether context
  tracking needs to be invoked and the TIF_NOHZ flag just causes a
  pointless slowpath execution for nothing"

* tag 'timers-nohz-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64: Remove TIF_NOHZ
  arm: Remove TIF_NOHZ
  x86: Remove TIF_NOHZ
  context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ
  x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
2020-03-30 18:29:05 -07:00
Linus Torvalds
642e53ead6 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Various NUMA scheduling updates: harmonize the load-balancer and
     NUMA placement logic to not work against each other. The intended
     result is better locality, better utilization and fewer migrations.

   - Introduce Thermal Pressure tracking and optimizations, to improve
     task placement on thermally overloaded systems.

   - Implement frequency invariant scheduler accounting on (some) x86
     CPUs. This is done by observing and sampling the 'recent' CPU
     frequency average at ~tick boundaries. The CPU provides this data
     via the APERF/MPERF MSRs. This hopefully makes our capacity
     estimates more precise and keeps tasks on the same CPU better even
     if it might seem overloaded at a lower momentary frequency. (As
     usual, turbo mode is a complication that we resolve by observing
     the maximum frequency and renormalizing to it.)

   - Add asymmetric CPU capacity wakeup scan to improve capacity
     utilization on asymmetric topologies. (big.LITTLE systems)

   - PSI fixes and optimizations.

   - RT scheduling capacity awareness fixes & improvements.

   - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

   - Misc fixes, cleanups and optimizations - see the changelog for
     details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  threads: Update PID limit comment according to futex UAPI change
  sched/fair: Fix condition of avg_load calculation
  sched/rt: cpupri_find: Trigger a full search as fallback
  kthread: Do not preempt current task if it is going to call schedule()
  sched/fair: Improve spreading of utilization
  sched: Avoid scale real weight down to zero
  psi: Move PF_MEMSTALL out of task->flags
  MAINTAINERS: Add maintenance information for psi
  psi: Optimize switching tasks inside shared cgroups
  psi: Fix cpu.pressure for cpu.max and competing cgroups
  sched/core: Distribute tasks within affinity masks
  sched/fair: Fix enqueue_task_fair warning
  thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
  sched/rt: Remove unnecessary push for unfit tasks
  sched/rt: Allow pulling unfitting task
  sched/rt: Optimize cpupri_find() on non-heterogenous systems
  sched/rt: Re-instate old behavior in select_task_rq_rt()
  sched/rt: cpupri_find: Implement fallback mechanism for !fit case
  sched/fair: Fix reordering of enqueue/dequeue_task_fair()
  sched/fair: Fix runnable_avg for throttled cfs
  ...
2020-03-30 17:01:51 -07:00
Linus Torvalds
9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Linus Torvalds
4b9fd8a829 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued user-access cleanups in the futex code.

   - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
     instead of an embedded rwsem. This addresses a couple of
     weaknesses, but the primary motivation was complications on the -rt
     kernel.

   - Introduce raw lock nesting detection on lockdep
     (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
     lock differences. This too originates from -rt.

   - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
     footprint on distro-ish kernels running into the "BUG:
     MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
     chain-entries pool.

   - Misc cleanups, smaller fixes and enhancements - see the changelog
     for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
  thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
  Documentation/locking/locktypes: Minor copy editor fixes
  Documentation/locking/locktypes: Further clarifications and wordsmithing
  m68knommu: Remove mm.h include from uaccess_no.h
  x86: get rid of user_atomic_cmpxchg_inatomic()
  generic arch_futex_atomic_op_inuser() doesn't need access_ok()
  x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
  x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
  objtool: whitelist __sanitizer_cov_trace_switch()
  [parisc, s390, sparc64] no need for access_ok() in futex handling
  sh: no need of access_ok() in arch_futex_atomic_op_inuser()
  futex: arch_futex_atomic_op_inuser() calling conventions change
  completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  lockdep: Add posixtimer context tracing bits
  lockdep: Annotate irq_work
  lockdep: Add hrtimer context tracing bits
  lockdep: Introduce wait-type checks
  completion: Use simple wait queues
  sched/swait: Prepare usage in completions
  ...
2020-03-30 16:17:15 -07:00
Linus Torvalds
a776c270a0 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The EFI changes in this cycle are much larger than usual, for two
  (positive) reasons:

   - The GRUB project is showing signs of life again, resulting in the
     introduction of the generic Linux/UEFI boot protocol, instead of
     x86 specific hacks which are increasingly difficult to maintain.
     There's hope that all future extensions will now go through that
     boot protocol.

   - Preparatory work for RISC-V EFI support.

  The main changes are:

   - Boot time GDT handling changes

   - Simplify handling of EFI properties table on arm64

   - Generic EFI stub cleanups, to improve command line handling, file
     I/O, memory allocation, etc.

   - Introduce a generic initrd loading method based on calling back
     into the firmware, instead of relying on the x86 EFI handover
     protocol or device tree.

   - Introduce a mixed mode boot method that does not rely on the x86
     EFI handover protocol either, and could potentially be adopted by
     other architectures (if another one ever surfaces where one
     execution mode is a superset of another)

   - Clean up the contents of 'struct efi', and move out everything that
     doesn't need to be stored there.

   - Incorporate support for UEFI spec v2.8A changes that permit
     firmware implementations to return EFI_UNSUPPORTED from UEFI
     runtime services at OS runtime, and expose a mask of which ones are
     supported or unsupported via a configuration table.

   - Partial fix for the lack of by-VA cache maintenance in the
     decompressor on 32-bit ARM.

   - Changes to load device firmware from EFI boot service memory
     regions

   - Various documentation updates and minor code cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  efi/libstub/arm: Fix spurious message that an initrd was loaded
  efi/libstub/arm64: Avoid image_base value from efi_loaded_image
  partitions/efi: Fix partition name parsing in GUID partition entry
  efi/x86: Fix cast of image argument
  efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
  efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
  efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
  efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
  efi/x86: Ignore the memory attributes table on i386
  efi/x86: Don't relocate the kernel unless necessary
  efi/x86: Remove extra headroom for setup block
  efi/x86: Add kernel preferred address to PE header
  efi/x86: Decompress at start of PE image load address
  x86/boot/compressed/32: Save the output address instead of recalculating it
  efi/libstub/x86: Deal with exit() boot service returning
  x86/boot: Use unsigned comparison for addresses
  efi/x86: Avoid using code32_start
  efi/x86: Make efi32_pe_entry() more readable
  efi/x86: Respect 32-bit ABI in efi32_pe_entry()
  efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
  ...
2020-03-30 16:13:08 -07:00
Linus Torvalds
ff7b862a4c * Do not report spurious MCEs on some Intel platforms caused by errata;
by Prarit Bhargava.
 
 * Change dev-mcelog's hardcoded limit of 32 error records to a dynamic
 one, controlled by the number of logical CPUs, by Tony Luck.
 
 * Add support for the processor identification number (PPIN) on AMD, by
 Wei Huang.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl6BseQACgkQEsHwGGHe
 VUqIMg/+KtZsFOHRKZD1dc0Jyo8O0BTzqMIif5J7AzRWv6DPLzfEFBjGFmVY10gN
 aovhRIF1TrUI8Em5as4FlczH8l328n1ZQhhy6YoCHcrT03LsKHXE46bcvm5msj9n
 0s0uZyDei6ly4k6hnNn5NPMjlkpNKS4/A1dkT3Ir25zlS+3Agds4nj5iNzFfOE19
 67bFSVw+KuEt4iihfX/uT0HtmcW5T5byDlwrxgMUC3s0EzMLIx4y+hqROzrJfIau
 NI3edpD0olhfkT9vz5NyZI7hNVAUOoWfYhoxZEJlAxjC+0MRKwR2A539YGsqzgJ9
 kFN5h6400xDmG5C5FUVULAEHG8O/AV+0AzMoH0c4xamalB64CJe6BehYJggFbyXB
 bH9bSZKasesZUSTP+v92dOrMK2ZtJnvhU5hhEDYbtRL4ERyIb/q9/AsJfpb299HJ
 JD1t4lMhURYr5qu/nck48yVnsHw0yqPju1qRDxqkbmRCkKNDi2t1ph7XUb7okSba
 AekWUomTliTm83rsX/lH6OJQ1uCtM7QOp6YULr8Zjb4TJcSAfuEsbAcnulUSrxan
 hreIKqC2A2RMpRVnX9IflKDHAGNWmT5Ag6tLpQ0/TfeaazxT2gdEw8YS4EU18cq6
 mMiJyIKmH2nGT7Mf65A0Lg0uJXFPFrtnKfFoSlb0kDsGlx3PEic=
 =3/4h
 -----END PGP SIGNATURE-----

Merge tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Do not report spurious MCEs on some Intel platforms caused by errata;
   by Prarit Bhargava.

 - Change dev-mcelog's hardcoded limit of 32 error records to a dynamic
   one, controlled by the number of logical CPUs, by Tony Luck.

 - Add support for the processor identification number (PPIN) on AMD, by
   Wei Huang.

* tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/amd: Add PPIN support for AMD MCE
  x86/mce/dev-mcelog: Dynamically allocate space for machine check records
  x86/mce: Do not log spurious corrected mce errors
2020-03-30 13:17:50 -07:00
Thomas Gleixner
cf226c42b2 Merge branch 'uaccess.futex' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into locking/core
Pull uaccess futex cleanups for Al Viro:

     Consolidate access_ok() usage and the futex uaccess function zoo.
2020-03-28 11:59:24 +01:00
Thomas Gleixner
a215032725 Merge branch 'next.uaccess-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/cleanups
Pull uaccess cleanups from Al Viro:

  Consolidate the user access areas and get rid of uaccess_try(), user_ex()
  and other warts.
2020-03-28 11:57:02 +01:00
Al Viro
f5544ba712 x86: get rid of user_atomic_cmpxchg_inatomic()
Only one user left; the thing had been made polymorphic back in 2013
for the sake of MPX.  No point keeping it now that MPX is gone.
Convert futex_atomic_cmpxchg_inatomic() to user_access_{begin,end}()
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:55 -04:00
Al Viro
8aef36dacb x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
lock cmpxchg leaves the current value in eax; no need to reload it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:54 -04:00
Al Viro
0ec33c0171 x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
Lift stac/clac pairs from __futex_atomic_op{1,2} into arch_futex_atomic_op_inuser(),
fold them with access_ok() in there.  The switch in arch_futex_atomic_op_inuser()
is what has required the previous (objtool) commit...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:53 -04:00
Al Viro
a08971e948 futex: arch_futex_atomic_op_inuser() calling conventions change
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:51 -04:00
Benjamin Thiel
5bacdc0982 x86/mm/set_memory: Fix -Wmissing-prototypes warnings
Add missing includes and move prototypes into the header set_memory.h in
order to fix -Wmissing-prototypes warnings.

 [ bp: Add ifdeffery around arch_invalidate_pmem() ]

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200320145028.6013-1-b.thiel@posteo.de
2020-03-27 11:26:06 +01:00
Benjamin Thiel
01bd18624d x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
... in order to fix a -Wmissing-prototypes warning:

  arch/x86/platform/uv/tlb_uv.c:1275:6: warning:
  no previous prototype for ‘uv_bau_message_interrupt’ [-Wmissing-prototypes] \
	  void uv_bau_message_interrupt(struct pt_regs *regs)

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200327072621.2255-1-b.thiel@posteo.de
2020-03-27 10:54:52 +01:00
Al Viro
cf122cfba5 kill uaccess_try()
finally

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-26 15:02:14 -04:00
Ingo Molnar
629b3df7ec Merge branch 'x86/cpu' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/uncore.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25 15:20:44 +01:00
Brian Gerst
290a4474d0 x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
Add missing semicolon.

Fixes: a74d187c2d ("x86/entry: Refactor SYS_NI macros")
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200324143520.898733-1-brgerst@gmail.com
2020-03-25 10:06:20 +01:00
Thomas Gleixner
1826d56bce x86/cpu: Cleanup the now unused CPU match macros
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131510.900226233@linutronix.de
2020-03-24 21:37:23 +01:00
Thomas Gleixner
20d437447c x86/cpu: Add consistent CPU match macros
Finding all places which build x86_cpu_id match tables is tedious and the
logic is hidden in lots of differently named macro wrappers.

Most of these initializer macros use plain C89 initializers which rely on
the ordering of the struct members. So new members could only be added at
the end of the struct, but that's ugly as hell and C99 initializers are
really the right thing to use.

Provide a set of macros which:

  - Have a proper naming scheme, starting with X86_MATCH_

  - Use C99 initializers

The set of provided macros are all subsets of the base macro

    X86_MATCH_VENDOR_FAM_MODEL_FEATURE()

which allows to supply all possible selection criteria:

      vendor, family, model, feature

The other macros shorten this to avoid typing all arguments when they are
not needed and would require one of the _ANY constants. They have been
created due to the requirements of the existing usage sites.

Also add a few model constants for Centaur CPUs and QUARK.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.826011988@linutronix.de
2020-03-24 21:17:50 +01:00
Thomas Gleixner
ba5bade4cc x86/devicetable: Move x86 specific macro out of generic code
There is no reason that this gunk is in a generic header file. The wildcard
defines need to stay as they are required by file2alias.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131508.736205164@linutronix.de
2020-03-24 21:02:47 +01:00
Vincenzo Frascino
1c1a18b00d um: Fix header inclusion
User Mode Linux is a flavor of x86 that from the vDSO prospective always
falls back on system calls. This implies that it does not require any
of the unified vDSO definitions and their inclusion causes side effects
like this:

  In file included from include/vdso/processor.h:10:0,
                      from include/vdso/datapage.h:17,
                      from arch/x86/include/asm/vgtod.h:7,
                      from arch/x86/um/../kernel/sys_ia32.c:49:
  >> arch/x86/include/asm/vdso/processor.h:11:29: error: redefinition of 'rep_nop'
      static __always_inline void rep_nop(void)
                                  ^~~~~~~
     In file included from include/linux/rcupdate.h:30:0,
                      from include/linux/rculist.h:11,
                      from include/linux/pid.h:5,
                      from include/linux/sched.h:14,
                      from arch/x86/um/../kernel/sys_ia32.c:25:
     arch/x86/um/asm/processor.h:24:20: note: previous definition of 'rep_nop' was here
      static inline void rep_nop(void)

Make sure that the unnecessary headers are not included when um is built
to address the problem.

Fixes: abc22418db ("x86/vdso: Enable x86 to use common headers")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200323124109.7104-1-vincenzo.frascino@arm.com
2020-03-23 18:45:14 +01:00
Anshuman Khandual
31a9122058 x86/mm: Drop pud_mknotpresent()
There is an inconsistency between PMD and PUD-based THP page table helpers
like the following, as pud_present() does not test for _PAGE_PSE.

pmd_present(pmd_mknotpresent(pmd)) : True
pud_present(pud_mknotpresent(pud)) : False

Drop pud_mknotpresent() as there are no current users. If/when needed
back later, pud_present() will also have to be fixed to accommodate
_PAGE_PSE.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/1584925542-13034-1-git-send-email-anshuman.khandual@arm.com
2020-03-23 09:11:48 +01:00
Wei Huang
077168e241 x86/mce/amd: Add PPIN support for AMD MCE
Newer AMD CPUs support a feature called protected processor
identification number (PPIN). This feature can be detected via
CPUID_Fn80000008_EBX[23].

However, CPUID alone is not enough to read the processor identification
number - MSR_AMD_PPIN_CTL also needs to be configured properly. If, for
any reason, MSR_AMD_PPIN_CTL[PPIN_EN] can not be turned on, such as
disabled in BIOS, the CPU capability bit X86_FEATURE_AMD_PPIN needs to
be cleared.

When the X86_FEATURE_AMD_PPIN capability is available, the
identification number is issued together with the MCE error info in
order to keep track of the source of MCE errors.

 [ bp: Massage. ]

Co-developed-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Signed-off-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200321193800.3666964-1-wei.huang2@amd.com
2020-03-22 11:03:47 +01:00
Peter Zijlstra
46db36abc3 x86/entry: Rename ___preempt_schedule
Because moar '_' isn't always moar readable.

git grep -l "___preempt_schedule\(_notrace\)*" | while read file;
do
	sed -ie 's/___preempt_schedule\(_notrace\)*/preempt_schedule\1_thunk/g' $file;
done

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115858.995685950@infradead.org
2020-03-21 16:03:53 +01:00
Brian Gerst
ffd75b373f x86: Remove unneeded includes
Clean up includes of and in <asm/syscalls.h>

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-19-brgerst@gmail.com
2020-03-21 16:03:25 +01:00
Brian Gerst
0f78ff1711 x86/entry: Drop asmlinkage from syscalls
asmlinkage is no longer required since the syscall ABI is now fully under
x86 architecture control.  This makes the 32-bit native syscalls a bit more
effecient by passing in regs via EAX instead of on the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-18-brgerst@gmail.com
2020-03-21 16:03:25 +01:00
Brian Gerst
25c619e59b x86/entry/32: Enable pt_regs based syscalls
Enable pt_regs based syscalls for 32-bit.  This makes the 32-bit native
kernel consistent with the 64-bit kernel, and improves the syscall
interface by not needing to push all 6 potential arguments onto the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-17-brgerst@gmail.com
2020-03-21 16:03:24 +01:00
Brian Gerst
0872098804 x86/entry: Move max syscall number calculation to syscallhdr.sh
Instead of using an array in asm-offsets to calculate the max syscall
number, calculate it when writing out the syscall headers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-9-brgerst@gmail.com
2020-03-21 16:03:21 +01:00
Brian Gerst
cc42c045af x86/entry/64: Move sys_ni_syscall stub to common.c
so it can be available to multiple syscall tables.  Also directly return
-ENOSYS instead of bouncing to the generic sys_ni_syscall().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-7-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
27dd84fafc x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
Add missing syscall wrapper for x32_rt_sigreturn().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-6-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
a74d187c2d x86/entry: Refactor SYS_NI macros
Pull the common code out from the SYS_NI macros into a new __SYS_NI macro.
Also conditionalize the X64 version in preparation for enabling syscall
wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-5-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
6cc8d2b286 x86/entry: Refactor COND_SYSCALL macros
Pull the common code out from the COND_SYSCALL macros into a new
__COND_SYSCALL macro.  Also conditionalize the X64 version in preparation
for enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-4-brgerst@gmail.com
2020-03-21 16:03:19 +01:00
Brian Gerst
d2b5de495e x86/entry: Refactor SYSCALL_DEFINE0 macros
Pull the common code out from the SYSCALL_DEFINE0 macros into a new
__SYS_STUB0 macro.  Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-3-brgerst@gmail.com
2020-03-21 16:03:19 +01:00
Brian Gerst
4399e0cf49 x86/entry: Refactor SYSCALL_DEFINEx macros
Pull the common code out from the SYSCALL_DEFINEx macros into a new
__SYS_STUBx macro.  Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-2-brgerst@gmail.com
2020-03-21 16:03:18 +01:00
Vincenzo Frascino
abc22418db x86/vdso: Enable x86 to use common headers
Enable x86 to use only the common headers in the implementation
of the vDSO library.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-24-vincenzo.frascino@arm.com
2020-03-21 15:24:02 +01:00
Vincenzo Frascino
659a9faa3f x86: Introduce asm/vdso/clocksource.h
The vDSO library should only include the necessary headers required for
a userspace library (UAPI and a minimal set of kernel headers). To make
this possible it is necessary to isolate from the kernel headers the
common parts that are strictly necessary to build the library.

Introduce asm/vdso/clocksource.h to contain all the arm64 specific
functions that are suitable for vDSO inclusion.

This header will be required by a future patch that will generalize
vdso/clocksource.h.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-5-vincenzo.frascino@arm.com
2020-03-21 15:23:54 +01:00
Ingo Molnar
df10846ff2 Merge branch 'linus' into locking/kcsan, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-21 09:35:44 +01:00
Ingo Molnar
a4654e9bde Merge branch 'x86/kdump' into locking/kcsan, to resolve conflicts
Conflicts:
	arch/x86/purgatory/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-21 09:24:41 +01:00
Peter Zijlstra
d8a7386897 x86/optprobe: Fix OPTPROBE vs UACCESS
While looking at an objtool UACCESS warning, it suddenly occurred to me
that it is entirely possible to have an OPTPROBE right in the middle of
an UACCESS region.

In this case we must of course clear FLAGS.AC while running the KPROBE.
Luckily the trampoline already saves/restores [ER]FLAGS, so all we need
to do is inject a CLAC. Unfortunately we cannot use ALTERNATIVE() in the
trampoline text, so we have to frob that manually.

Fixes: ca0bbc70f147 ("sched/x86_64: Don't save flags on context switch")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200305092130.GU2596@hirez.programming.kicks-ass.net
2020-03-20 13:06:22 +01:00
Ingo Molnar
409e1a3140 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-19 15:01:45 +01:00
Guenter Roeck
bac59d18c7 x86/setup: Fix static memory detection
When booting x86 images in qemu, the following warning is seen randomly
if DEBUG_LOCKDEP is enabled.

  WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:1119
	  lockdep_register_key+0xc0/0x100

static_obj() returns true if an address is between _stext and _end.

On x86, this includes the brk memory space. Problem is that this memory
block is not static on x86; its unused portions are released after init
and can be allocated. This results in the observed warning if a lockdep
object is allocated from this memory.

Solve the problem by implementing arch_is_kernel_initmem_freed() for
x86 and have it return true if an address is within the released memory
range.

The same problem was solved for s390 with commit

  7a5da02de8 ("locking/lockdep: check for freed initmem in static_obj()"),

which introduced arch_is_kernel_initmem_freed().

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200131021159.9178-1-linux@roeck-us.net
2020-03-19 11:58:13 +01:00
Al Viro
9f855c085f x86: switch setup_sigcontext() to unsafe_put_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:39:02 -04:00
Al Viro
77f3c6166d x86: kill get_user_{try,catch,ex}
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 20:35:35 -04:00
Al Viro
4b842e4e25 x86: get rid of small constant size cases in raw_copy_{to,from}_user()
Very few call sites where that would be triggered remain, and none
of those is anywhere near hot enough to bother.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 15:53:25 -04:00
Al Viro
71c3313a38 x86: switch sigframe sigset handling to explict __get_user()/__put_user()
... and consolidate the definition of sigframe_ia32->extramask - it's
always a 1-element array of 32bit unsigned.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18 15:29:54 -04:00
Jesse Brandeburg
1651e70066 x86: Fix bitops.h warning with a moved cast
Fix many sparse warnings when building with C=1. These are useless noise
from the bitops.h file and getting rid of them helps developers make
more use of the tools and possibly find real bugs.

When the kernel is compiled with C=1, there are lots of messages like:

  arch/x86/include/asm/bitops.h:77:37: warning: cast truncates bits from constant value (ffffff7f becomes 7f)

CONST_MASK() is using a signed integer "1" to create the mask which is
later cast to (u8), in order to yield an 8-bit value for the assembly
instructions to use. Simplify the expressions used to clearly indicate
they are working on 8-bit values only, which still keeps sparse happy
without an accidental promotion to a 32 bit integer.

The warning was occurring because certain bitmasks that end with a bit
set next to a natural boundary like 7, 15, 23, 31, end up with a mask
like 0x7f, which then results in sign extension due to the integer type
promotion rules[1]. It was really only clear_bit() that was having
problems, and it was only on some bit checks that resulted in a mask
like 0xffffff7f being generated after the inversion.

Verify with a test module (see next patch) and assembly inspection that
the fix doesn't introduce any change in generated code.

 [ bp: Massage. ]

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules [1]
Link: https://lkml.kernel.org/r/20200310221747.2848474-1-jesse.brandeburg@intel.com
2020-03-18 12:30:19 +01:00
Kim Phillips
e48667b865 perf/amd/uncore: Add support for Family 19h L3 PMU
Family 19h introduces change in slice, core and thread specification in
its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is
incompatible with Family 17h's version of the register.

Introduce a new path in l3_thread_slice_mask() to do things differently
for Family 19h vs. Family 17h, otherwise the new hardware doesn't get
programmed correctly.

Instead of a linear core--thread bitmask, Family 19h takes an encoded
core number, and a separate thread mask. There are new bits that are set
for all cores and all slices, of which only the latter is used, since
the driver counts events for all slices on behalf of the specified CPU.

Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode
decision based on Family 17h or above, not just 17h and 18h: the Family
19h Data Fabric PMC is compatible with the Family 17h DF PMC.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
2020-03-17 13:01:03 +01:00
Thomas Hellstrom
6db73f17c5 x86: Don't let pgprot_modify() change the page encryption bit
When SEV or SME is enabled and active, vm_get_page_prot() typically
returns with the encryption bit set. This means that users of
pgprot_modify(, vm_get_page_prot()) (mprotect_fixup(), do_mmap()) end up
with a value of vma->vm_pg_prot that is not consistent with the intended
protection of the PTEs.

This is also important for fault handlers that rely on the VMA
vm_page_prot to set the page protection. Fix this by not allowing
pgprot_modify() to change the encryption bit, similar to how it's done
for PAT bits.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20200304114527.3636-2-thomas_os@shipmail.org
2020-03-17 11:48:31 +01:00
Borislav Petkov
19d33357ec x86/amd_nb, char/amd64-agp: Use amd_nb_num() accessor
... to find whether there are northbridges present on the
system. Convert the last forgotten user and therefore, unexport
amd_nb_misc_ids[] too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20200316150725.925-1-bp@alien8.de
2020-03-17 10:25:58 +01:00
Paolo Bonzini
727a7e27cf KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd
The set_cr3 callback is not setting the guest CR3, it is setting the
root of the guest page tables, either shadow or two-dimensional.
To make this clearer as well as to indicate that the MMU calls it
via kvm_mmu_load_cr3, rename it to load_mmu_pgd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:52 +01:00
Paolo Bonzini
689f3bf216 KVM: x86: unify callbacks to load paging root
Similar to what kvm-intel.ko is doing, provide a single callback that
merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3.

This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops.
I'm doing that in this same patch because splitting it adds quite a bit
of churn due to the need for forward declarations.  For the same reason
the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu
from init_kvm_softmmu and nested_svm_init_mmu_context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:51 +01:00
Sean Christopherson
257038745c KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop
->set_supported_cpuid() now that both VMX and SVM implementations are
empty.  Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf
0x8000000A is is (obviously) vendor specific but can be queried in
common code while respecting SVM's wishes by querying kvm_cpu_cap_has().

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
91661989d1 KVM: x86: Move VMX's host_efer to common x86 code
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to
avoid constantly re-reading the MSR.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:42 +01:00
Sean Christopherson
703c335d06 KVM: x86/mmu: Configure max page level during hardware setup
Configure the max page level during hardware setup to avoid a retpoline
in the page fault handler.  Drop ->get_lpage_level() as the page fault
handler was the last user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:40 +01:00
Sean Christopherson
bde7723559 KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common function
Combine kvm_enable_tdp() and kvm_disable_tdp() into a single function,
kvm_configure_mmu(), in preparation for doing additional configuration
during hardware setup.  And because having separate helpers is silly.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:39 +01:00
Sean Christopherson
a1bead2aba KVM: VMX: Directly query Intel PT mode when refreshing PMUs
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of
bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported()
as the PMU code was the last remaining user.

Opportunistically clean up the wording of a comment that referenced
kvm_x86_ops->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:38 +01:00
Sean Christopherson
139085101f KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt support
Check for MSR_TSC_AUX virtualization via kvm_cpu_cap_has() and drop
->rdtscp_supported().

Note, vmx_rdtscp_supported() needs to hang around a tiny bit longer due
other usage in VMX code.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:35 +01:00
Sean Christopherson
90d2f60f41 KVM: x86: Use KVM cpu caps to track UMIP emulation
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the
bit will effectively be dropped by do_host_cpuid().  This allows
checking for UMIP emulation via kvm_cpu_caps instead of a dedicated
kvm_x86_ops callback.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:28 +01:00
Sean Christopherson
b3d895d5c4 KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update
Move the clearing of the XSAVES CPUID bit into VMX, which has a separate
VMCS control to enable XSAVES in non-root, to eliminate the last ugly
renmant of the undesirable "unsigned f_* = *_supported ? F(*) : 0"
pattern in the common CPUID handling code.

Drop ->xsaves_supported(), CPUID adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:25 +01:00
Sean Christopherson
d64d83d1e0 KVM: x86: Handle PKU CPUID adjustment in VMX code
Move the setting of the PKU CPUID bit into VMX to eliminate an instance
of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in
the common CPUID handling code.  Drop ->pku_supported(), CPUID
adjustment was the only user.

Note, some AMD CPUs now support PKU, but SVM doesn't yet support
exposing it to a guest.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:19 +01:00
Sean Christopherson
5ffec6f910 KVM: x86: Handle INVPCID CPUID adjustment in VMX code
Move the INVPCID CPUID adjustments into VMX to eliminate an instance of
the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the
common CPUID handling code.  Drop ->invpcid_supported(), CPUID
adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:17 +01:00
Sean Christopherson
160b486f65 KVM: x86: Drop explicit @func param from ->set_supported_cpuid()
Drop the explicit @func param from ->set_supported_cpuid() and instead
pull the CPUID function from the relevant entry.  This sets the stage
for hardening guest CPUID updates in future patches, e.g. allows adding
run-time assertions that the CPUID feature being changed is actually
a bit in the referenced CPUID entry.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:12 +01:00
Sean Christopherson
7f5581f592 KVM: x86: Use supported_xcr0 to detect MPX support
Query supported_xcr0 when checking for MPX support instead of invoking
->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was
its last user.  Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to
better align with VMX/VMCS nomenclature.

Modify VMX's adjustment of xcr0 to call cpus_has_vmx_mpx() (renamed from
vmx_mpx_supported()) directly to avoid reading supported_xcr0 before
it's fully configured.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Test that *all* bits are set. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:10 +01:00
Sean Christopherson
2f728d66e8 KVM: x86: Move kvm_emulate.h into KVM's private directory
Now that the emulation context is dynamically allocated and not embedded
in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public
asm directory and into KVM's private x86 directory.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:52 +01:00
Sean Christopherson
c9b8b07cde KVM: x86: Dynamically allocate per-vCPU emulation context
Allocate the emulation context instead of embedding it in struct
kvm_vcpu_arch.

Dynamic allocation provides several benefits:

  - Shrinks the size x86 vcpus by ~2.5k bytes, dropping them back below
    the PAGE_ALLOC_COSTLY_ORDER threshold.
  - Allows for dropping the include of kvm_emulate.h from asm/kvm_host.h
    and moving kvm_emulate.h into KVM's private directory.
  - Allows a reducing KVM's attack surface by shrinking the amount of
    vCPU data that is exposed to usercopy.
  - Allows a future patch to disable the emulator entirely, which may or
    may not be a realistic endeavor.

Mark the entire struct as valid for usercopy to maintain existing
behavior with respect to hardened usercopy.  Future patches can shrink
the usercopy range to cover only what is necessary.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:52 +01:00
Sean Christopherson
21f1b8f29e KVM: x86: Explicitly pass an exception struct to check_intercept
Explicitly pass an exception struct when checking for intercept from
the emulator, which eliminates the last reference to arch.emulate_ctxt
in vendor specific code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:50 +01:00
Sean Christopherson
d8dd54e063 KVM: x86/mmu: Rename kvm_mmu->get_cr3() to ->get_guest_pgd()
Rename kvm_mmu->get_cr3() to call out that it is retrieving a guest
value, as opposed to kvm_mmu->set_cr3(), which sets a host value, and to
note that it will return something other than CR3 when nested EPT is in
use.  Hopefully the new name will also make it more obvious that L1's
nested_cr3 is returned in SVM's nested NPT case.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:46 +01:00
Sean Christopherson
bb1fcc70d9 KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT
Add support for 5-level nested EPT, and advertise said support in the
EPT capabilities MSR.  KVM's MMU can already handle 5-level legacy page
tables, there's no reason to force an L1 VMM to use shadow paging if it
wants to employ 5-level page tables.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:44 +01:00
Sean Christopherson
8053f924ca KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack
Drop kvm_mmu_extended_role.cr4_la57 now that mmu_role doesn't mask off
level, which already incorporates the guest's CR4.LA57 for a shadow MMU
by querying is_la57_mode().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:43 +01:00
Sean Christopherson
a1c77abb8d KVM: nVMX: Properly handle userspace interrupt window request
Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has
external interrupt exiting enabled.  IRQs are never blocked in hardware
if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger
VM-Exit.

The new check percolates up to kvm_vcpu_ready_for_interrupt_injection()
and thus vcpu_run(), and so KVM will exit to userspace if userspace has
requested an interrupt window (to inject an IRQ into L1).

Remove the @external_intr param from vmx_check_nested_events(), which is
actually an indicator that userspace wants an interrupt window, e.g.
it's named @req_int_win further up the stack.  Injecting a VM-Exit into
L1 to try and bounce out to L0 userspace is all kinds of broken and is
no longer necessary.

Remove the hack in nested_vmx_vmexit() that attempted to workaround the
breakage in vmx_check_nested_events() by only filling interrupt info if
there's an actual interrupt pending.  The hack actually made things
worse because it caused KVM to _never_ fill interrupt info when the
LAPIC resides in userspace (kvm_cpu_has_interrupt() queries
interrupt.injected, which is always cleared by prepare_vmcs12() before
reaching the hack in nested_vmx_vmexit()).

Fixes: 6550c4df7e ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:40 +01:00
Wanpeng Li
4abaffce4d KVM: LAPIC: Recalculate apic map in batch
In the vCPU reset and set APIC_BASE MSR path, the apic map will be recalculated
several times, each time it will consume 10+ us observed by ftrace in my
non-overcommit environment since the expensive memory allocate/mutex/rcu etc
operations. This patch optimizes it by recaluating apic map in batch, I hope
this can benefit the serverless scenario which can frequently create/destroy
VMs.

Before patch:

kvm_lapic_reset  ~27us

After patch:

kvm_lapic_reset  ~14us

Observed by ftrace, improve ~48%.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:39 +01:00
Jay Zhou
3c9bd4006b KVM: x86: enable dirty log gradually in small chunks
It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:

1. Initialize all the bits of the dirty bitmap to 1 when enabling
   dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
   SPTEs gradually in small chunks

Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G windows VM and counted the time taken
of memory_global_dirty_log_start, here is the numbers:

VM Size        Before    After optimization
128G           460ms     10ms

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:37 +01:00
Oliver Upton
cc7f5577ad KVM: SVM: Inhibit APIC virtualization for X2APIC guest
The AVIC does not support guest use of the x2APIC interface. Currently,
KVM simply chooses to squash the x2APIC feature in the guest's CPUID
If the AVIC is enabled. Doing so prevents KVM from running a guest
with greater than 255 vCPUs, as such a guest necessitates the use
of the x2APIC interface.

Instead, inhibit AVIC enablement on a per-VM basis whenever the x2APIC
feature is set in the guest's CPUID.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:35 +01:00
Sean Christopherson
562b6b089d KVM: x86: Consolidate VM allocation and free for VMX and SVM
Move the VM allocation and free code to common x86 as the logic is
more or less identical across SVM and VMX.

Note, although hyperv.hv_pa_pg is part of the common kvm->arch, it's
(currently) only allocated by VMX VMs.  But, since kfree() plays nice
when passed a NULL pointer, the superfluous call for SVM is harmless
and avoids future churn if SVM gains support for HyperV's direct TLB
flush.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Make vm_size a field instead of a function. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:33 +01:00
Sean Christopherson
e96c81ee89 KVM: Simplify kvm_free_memslot() and all its descendents
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove
the param from the top-level routine and all arch's implementations.

No functional change intended.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:22 +01:00
Sean Christopherson
744e699c7e KVM: x86: Move gpa_val and gpa_available into the emulator context
Move the GPA tracking into the emulator context now that the context is
guaranteed to be initialized via __init_emulate_ctxt() prior to
dereferencing gpa_{available,val}, i.e. now that seeing a stale
gpa_available will also trigger a WARN due to an invalid context.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:12 +01:00
Sean Christopherson
92daa48b34 KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page fault
Add a new emulation type flag to explicitly mark emulation related to a
page fault.  Move the propation of the GPA into the emulator from the
page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an
indicator that cr2 is valid.  Similarly, don't propagate cr2 into the
exception.address when it's *not* valid.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:12 +01:00
Linus Torvalds
6693075e0f Bugfixes, x86+s390.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJebMXnAAoJEL/70l94x66D3fYIAJ1r+o2qgzadwEqoXTvlihjB
 ujX1jOs20EJJ56VhTtXF/wZQc+7VeKCjpIqNv4WaeSYPUhzFGyL9t5tw1YdRDCwY
 u6gklxruIzZodgp+vCoTkPyyUylVmY50sY/yBIJ4F8qOaMxhTEE1aXzGuaOrYqVO
 MmIlAltEKQzdXPO1SVPD7triGPgUTj+DRxrlyRrGt2ItiMUincCz9K6TDyXFib0r
 SSCVFNYtYmzu/bV/E4/Sphi2BxCQEem5DIFWLcngzN8Wy5oCoRVzPGugT4Q9eXWt
 ZtWIDh473JGiXBLYmDq4REJsRSca+7s/YiiLSiQwYfByhIPJpVEoy54fcdaZflo=
 =T4AD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes for x86 and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
  KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
  KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
  KVM: s390: Also reset registers in sync regs for initial cpu reset
  KVM: fix Kconfig menu text for -Werror
  KVM: x86: remove stale comment from struct x86_emulate_ctxt
  KVM: x86: clear stale x86_emulate_ctxt->intercept value
  KVM: SVM: Fix the svm vmexit code for WRMSR
  KVM: X86: Fix dereference null cpufreq policy
2020-03-14 15:45:26 -07:00
Kim Phillips
753039ef8b x86/cpu/amd: Call init_amd_zn() om Family 19h processors too
Family 19h CPUs are Zen-based and still share most architectural
features with Family 17h CPUs, and therefore still need to call
init_amd_zn() e.g., to set the RECLAIM_DISTANCE override.

init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used
in amd_set_core_ssb_state(), which isn't called on some late
model Family 17h CPUs, nor on any Family 19h CPUs:
X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those
later model CPUs, where the SSBD mitigation is done via the
SPEC_CTRL MSR instead of the LS_CFG MSR.

Family 19h CPUs also don't have the erratum where the CPB feature
bit isn't set, but that code can stay unchanged and run safely
on Family 19h.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
2020-03-12 12:13:44 +01:00
Tony Luck
d8ecca4043 x86/mce/dev-mcelog: Dynamically allocate space for machine check records
We have had a hard coded limit of 32 machine check records since the
dawn of time.  But as numbers of cores increase, it is possible for
more than 32 errors to be reported before a user process reads from
/dev/mcelog. In this case the additional errors are lost.

Keep 32 as the minimum. But tune the maximum value up based on the
number of processors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
2020-03-10 10:25:14 +01:00
Boqun Feng
1cf106d932 PCI: hv: Introduce hv_msi_entry
Add a new structure (hv_msi_entry), which is also defined in the TLFS,
to describe the msi entry for HVCALL_RETARGET_INTERRUPT. The structure
is needed because its layout may be different from architecture to
architecture.

Also add a new generic interface hv_set_msi_entry_from_desc() to allow
different archs to set the msi entry from msi_desc.

No functional change, only preparation for the future support of virtual
PCI on non-x86 architectures.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09 14:51:31 +00:00
Boqun Feng
61bfd920ab PCI: hv: Move retarget related structures into tlfs header
Currently, retarget_msi_interrupt and other structures it relys on are
defined in pci-hyperv.c. However, those structures are actually defined
in Hypervisor Top-Level Functional Specification [1] and may be
different in sizes of fields or layout from architecture to
architecture. Let's move those definitions into x86's tlfs header file
to support virtual PCI on non-x86 architectures in the future. Note that
"__packed" attribute is added to these structures during the movement
for the same reason as we use the attribute for other TLFS structures in
the header file: make sure the structures meet the specification and
avoid anything unexpected from the compilers.

Additionally, rename struct retarget_msi_interrupt to
hv_retarget_msi_interrupt for the consistent naming convention, also
mirroring the name in TLFS.

[1]: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09 14:50:53 +00:00
Boqun Feng
b00f80fcfa PCI: hv: Move hypercall related definitions into tlfs header
Currently HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF are defined
in pci-hyperv.c. However, similar to other hypercall related
definitions, it makes more sense to put them in the tlfs header file.

Besides, these definitions are arch-dependent, so for the support of
virtual PCI on non-x86 archs in the future, move them into arch-specific
tlfs header file.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09 14:50:39 +00:00
Ingo Molnar
6120681bdf Merge branch 'efi/urgent' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-08 09:57:58 +01:00
Ingo Molnar
1b10d388d0 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-06 12:49:56 +01:00
Vitaly Kuznetsov
d718fdc3e7 KVM: x86: remove stale comment from struct x86_emulate_ctxt
Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") did some field shuffling and instead of
[opcode_len, _regs) started clearing [has_seg_override, modrm).
The comment about clearing fields altogether is not true anymore.

Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03 17:38:22 +01:00
Linus Torvalds
2873dc2547 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: a pkeys fix for a bug that triggers with weird BIOS
  settings, and two Xen PV fixes: a paravirt interface fix, and
  pagetable dumping fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix dump_pagetables with Xen PV
  x86/ioperm: Add new paravirt function update_io_bitmap()
  x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
2020-03-02 06:54:54 -06:00
Juergen Gross
99bcd4a6e5 x86/ioperm: Add new paravirt function update_io_bitmap()
Commit 111e7b15cf ("x86/ioperm: Extend IOPL config to control ioperm()
as well") reworked the iopl syscall to use I/O bitmaps.

Unfortunately this broke Xen PV domains using that syscall as there is
currently no I/O bitmap support in PV domains.

Add I/O bitmap support via a new paravirt function update_io_bitmap which
Xen PV domains can use to update their I/O bitmaps via a hypercall.

Fixes: 111e7b15cf ("x86/ioperm: Extend IOPL config to control ioperm() as well")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@vger.kernel.org> # 5.5
Link: https://lkml.kernel.org/r/20200218154712.25490-1-jgross@suse.com
2020-02-29 12:43:09 +01:00
Thomas Gleixner
17dbedb5da x86/irq: Remove useless return value from do_IRQ()
Nothing is using it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.826870369@linutronix.de
2020-02-27 14:48:40 +01:00
Thomas Gleixner
3ba4f0a633 x86/traps: Remove redundant declaration of do_double_fault()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.720335354@linutronix.de
2020-02-27 14:48:40 +01:00
Thomas Gleixner
840371bea1 x86/entry/32: Force MCE through do_mce()
Remove the pointless difference between 32 and 64 bit to make further
unifications simpler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.428188397@linutronix.de
2020-02-27 14:48:39 +01:00
Andy Lutomirski
55ba18d6ed x86/mce: Disable tracing and kprobes on do_machine_check()
do_machine_check() can be raised in almost any context including the most
fragile ones. Prevent kprobes and tracing.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.315548935@linutronix.de
2020-02-27 14:48:39 +01:00
Ingo Molnar
e9765680a3 EFI updates for v5.7:
This time, the set of changes for the EFI subsystem is much larger than
 usual. The main reasons are:
 - Get things cleaned up before EFI support for RISC-V arrives, which will
   increase the size of the validation matrix, and therefore the threshold to
   making drastic changes,
 - After years of defunct maintainership, the GRUB project has finally started
   to consider changes from the distros regarding UEFI boot, some of which are
   highly specific to the way x86 does UEFI secure boot and measured boot,
   based on knowledge of both shim internals and the layout of bootparams and
   the x86 setup header. Having this maintenance burden on other architectures
   (which don't need shim in the first place) is hard to justify, so instead,
   we are introducing a generic Linux/UEFI boot protocol.
 
 Summary of changes:
 - Boot time GDT handling changes (Arvind)
 - Simplify handling of EFI properties table on arm64
 - Generic EFI stub cleanups, to improve command line handling, file I/O,
   memory allocation, etc.
 - Introduce a generic initrd loading method based on calling back into
   the firmware, instead of relying on the x86 EFI handover protocol or
   device tree.
 - Introduce a mixed mode boot method that does not rely on the x86 EFI
   handover protocol either, and could potentially be adopted by other
   architectures (if another one ever surfaces where one execution mode
   is a superset of another)
 - Clean up the contents of struct efi, and move out everything that
   doesn't need to be stored there.
 - Incorporate support for UEFI spec v2.8A changes that permit firmware
   implementations to return EFI_UNSUPPORTED from UEFI runtime services at
   OS runtime, and expose a mask of which ones are supported or unsupported
   via a configuration table.
 - Various documentation updates and minor code cleanups (Heinrich)
 - Partial fix for the lack of by-VA cache maintenance in the decompressor
   on 32-bit ARM. Note that these patches were deliberately put at the
   beginning so they can be used as a stable branch that will be shared with
   a PR containing the complete fix, which I will send to the ARM tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl5S7WYACgkQwjcgfpV0
 +n1jmQgAmwV3V8pbgB4mi4P2Mv8w5Zj5feUe6uXnTR2AFv5nygLcTzdxN+TU/6lc
 OmZv2zzdsAscYlhuUdI/4t4cXIjHAZI39+UBoNRuMqKbkbvXCFscZANLxvJjHjZv
 FFbgUk0DXkF0BCEDuSLNavidAv4b3gZsOmHbPfwuB8xdP05LbvbS2mf+2tWVAi2z
 ULfua/0o9yiwl19jSS6iQEPCvvLBeBzTLW7x5Rcm/S0JnotzB59yMaeqD7jO8JYP
 5PvI4WM/l5UfVHnZp2k1R76AOjReALw8dQgqAsT79Q7+EH3sNNuIjU6omdy+DFf4
 tnpwYfeLOaZ1SztNNfU87Hsgnn2CGw==
 =pDE3
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core

Pull EFI updates for v5.7 from Ard Biesheuvel:

This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:

 - Get things cleaned up before EFI support for RISC-V arrives, which will
   increase the size of the validation matrix, and therefore the threshold to
   making drastic changes,

 - After years of defunct maintainership, the GRUB project has finally started
   to consider changes from the distros regarding UEFI boot, some of which are
   highly specific to the way x86 does UEFI secure boot and measured boot,
   based on knowledge of both shim internals and the layout of bootparams and
   the x86 setup header. Having this maintenance burden on other architectures
   (which don't need shim in the first place) is hard to justify, so instead,
   we are introducing a generic Linux/UEFI boot protocol.

Summary of changes:

 - Boot time GDT handling changes (Arvind)

 - Simplify handling of EFI properties table on arm64

 - Generic EFI stub cleanups, to improve command line handling, file I/O,
   memory allocation, etc.

 - Introduce a generic initrd loading method based on calling back into
   the firmware, instead of relying on the x86 EFI handover protocol or
   device tree.

 - Introduce a mixed mode boot method that does not rely on the x86 EFI
   handover protocol either, and could potentially be adopted by other
   architectures (if another one ever surfaces where one execution mode
   is a superset of another)

 - Clean up the contents of struct efi, and move out everything that
   doesn't need to be stored there.

 - Incorporate support for UEFI spec v2.8A changes that permit firmware
   implementations to return EFI_UNSUPPORTED from UEFI runtime services at
   OS runtime, and expose a mask of which ones are supported or unsupported
   via a configuration table.

 - Various documentation updates and minor code cleanups (Heinrich)

 - Partial fix for the lack of by-VA cache maintenance in the decompressor
   on 32-bit ARM. Note that these patches were deliberately put at the
   beginning so they can be used as a stable branch that will be shared with
   a PR containing the complete fix, which I will send to the ARM tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-26 15:21:22 +01:00
Arvind Sankar
6f8f0dc980 x86/vmlinux: Drop unneeded linker script discard of .eh_frame
Now that .eh_frame sections for the files in setup.elf and realmode.elf
are not generated anymore, the linker scripts don't need the special
output section name /DISCARD/ any more.

Remove the one in the main kernel linker script as well, since there are
no .eh_frame sections already, and fix up a comment referencing .eh_frame.

Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
to make sense, as well as being more specific.

 [ bp: Touch up commit message. ]

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200224232129.597160-3-nivedita@alum.mit.edu
2020-02-25 14:51:29 +01:00
Linus Torvalds
63623fd449 Bugfixes, including the fix for CVE-2020-2732 and a few
issues found by "make W=1".
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeVBwcAAoJEL/70l94x66DB9AH/AxWhmtf6YVXMNyZjXydxa1f
 hYVm9wg9GCsZS+7cktMhq0/uDEu5IjaCv7d+bzIcYZdFAOcs5nBUUjn1LtVl9w1y
 48vobyOa8pXpORerBtZtaO1kt4sfFR63zm7uau32DzXrz3qpHlMUjPdL08A1e35V
 cSSPAHHsl9S1TbDryc/VUNCOgauJes6LHbd3CdeAXU6lzMBW8JWbF2b/MAkvHG6n
 Hw5LpicWSeTxoPjR4Oi0Yx3VKvWfS9608netSJmuCNsv36wrhzKR1iuyb3kNCkAy
 AIlALn4PZq1Y5i1INi/XIkpC8d9yTqt5heRxYwp+yHadWO6E7ZMlITfxLZii+mM=
 =7EpO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, including the fix for CVE-2020-2732 and a few issues found
  by 'make W=1'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: rstify new ioctls in api.rst
  KVM: nVMX: Check IO instruction VM-exit conditions
  KVM: nVMX: Refactor IO bitmap checks into helper function
  KVM: nVMX: Don't emulate instructions in guest mode
  KVM: nVMX: Emulate MTF when performing instruction emulation
  KVM: fix error handling in svm_hardware_setup
  KVM: SVM: Fix potential memory leak in svm_cpu_init()
  KVM: apic: avoid calculating pending eoi from an uninitialized val
  KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled
  KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
  kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled
  KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE
  KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow
  KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
  kvm/emulate: fix a -Werror=cast-function-type
  KVM: x86: fix incorrect comparison in trace event
  KVM: nVMX: Fix some obsolete comments and grammar error
  KVM: x86: fix missing prototypes
  KVM: x86: enable -Werror
2020-02-24 11:48:17 -08:00
Dave Hansen
16171bffc8 x86/pkeys: Add check for pkey "overflow"
Alex Shi reported the pkey macros above arch_set_user_pkey_access()
to be unused.  They are unused, and even refer to a nonexistent
CONFIG option.

But, they might have served a good use, which was to ensure that
the code does not try to set values that would not fit in the
PKRU register.  As it stands, a too-large 'pkey' value would
be likely to silently overflow the u32 new_pkru_bits.

Add a check to look for overflows.  Also add a comment to remind
any future developer to closely examine the types used to store
pkey values if arch_max_pkey() ever changes.

This boots and passes the x86 pkey selftests.

Reported-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200122165346.AD4DA150@viggo.jf.intel.com
2020-02-24 20:25:21 +01:00
Ingo Molnar
546121b65f Linux 5.6-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5TFjYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGikYIAIhI4C8R87wyj/0m
 b2NWk6TZ5AFmiZLYSbsPYxdSC9OLdUmlGFKgL2SyLTwZCiHChm+cNBrngp3hJ6gz
 x1YH99HdjzkiaLa0hCc2+a/aOt8azGU2RiWEP8rbo0gFSk28wE6FjtzSxR95jyPz
 FRKo/sM+dHBMFXrthJbr+xHZ1De28MITzS2ddstr/10ojoRgm43I3qo1JKhjoDN5
 9GGb6v0Md5eo+XZjjB50CvgF5GhpiqW7+HBB7npMsgTk37GdsR5RlosJ/TScLVC9
 dNeanuqk8bqMGM0u2DFYdDqjcqAlYbt8aobuWWCB5xgPBXr5G2nox+IgF/f9G6UH
 EShA/xs=
 =OFPc
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc3' into sched/core, to pick up fixes and dependent patches

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-24 11:36:09 +01:00
Ard Biesheuvel
3b8f44fc08 efi/libstub/x86: Use Exit() boot service to exit the stub on errors
Currently, we either return with an error [from efi_pe_entry()] or
enter a deadloop [in efi_main()] if any fatal errors occur during
execution of the EFI stub. Let's switch to calling the Exit() EFI boot
service instead in both cases, so that we
a) can get rid of the deadloop, and simply return to the boot manager
   if any errors occur during execution of the stub, including during
   the call to ExitBootServices(),
b) can also return cleanly from efi_pe_entry() or efi_main() in mixed
   mode, once we introduce support for LoadImage/StartImage based mixed
   mode in the next patch.

Note that on systems running downstream GRUBs [which do not use LoadImage
or StartImage to boot the kernel, and instead, pass their own image
handle as the loaded image handle], calling Exit() will exit from GRUB
rather than from the kernel, but this is a tolerable side effect.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:59:42 +01:00
Ard Biesheuvel
59f2a619a2 efi: Add 'runtime' pointer to struct efi
Instead of going through the EFI system table each time, just copy the
runtime services table pointer into struct efi directly. This is the
last use of the system table pointer in struct efi, allowing us to
drop it in a future patch, along with a fair amount of quirky handling
of the translated address.

Note that usually, the runtime services pointer changes value during
the call to SetVirtualAddressMap(), so grab the updated value as soon
as that call returns. (Mixed mode uses a 1:1 mapping, and kexec boot
enters with the updated address in the system table, so in those cases,
we don't need to do anything here)

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:59:42 +01:00
Ard Biesheuvel
9cd437ac0e efi/x86: Make fw_vendor, config_table and runtime sysfs nodes x86 specific
There is some code that exposes physical addresses of certain parts of
the EFI firmware implementation via sysfs nodes. These nodes are only
used on x86, and are of dubious value to begin with, so let's move
their handling into the x86 arch code.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:59:42 +01:00
Ard Biesheuvel
0a67361dcd efi/x86: Remove runtime table address from kexec EFI setup data
Since commit 33b85447fa ("efi/x86: Drop two near identical versions
of efi_runtime_init()"), we no longer map the EFI runtime services table
before calling SetVirtualAddressMap(), which means we don't need the 1:1
mapped physical address of this table, and so there is no point in passing
the address via EFI setup data on kexec boot.

Note that the kexec tools will still look for this address in sysfs, so
we still need to provide it.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:59:42 +01:00
Ard Biesheuvel
2931d526d5 efi/libstub: Make the LoadFile EFI protocol accessible
Add the protocol definitions, GUIDs and mixed mode glue so that
the EFI loadfile protocol can be used from the stub. This will
be used in a future patch to load the initrd.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:57:15 +01:00
Ard Biesheuvel
abd268685a efi/libstub: Expose LocateDevicePath boot service
We will be adding support for loading the initrd from a GUIDed
device path in a subsequent patch, so update the prototype of
the LocateDevicePath() boot service to make it callable from
our code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:57:15 +01:00
Ard Biesheuvel
1e45bf7372 efi/libstub/x86: Permit cmdline data to be allocated above 4 GB
We now support cmdline data that is located in memory that is not
32-bit addressable, so relax the allocation limit on systems where
this feature is enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:57:15 +01:00
Oliver Upton
5ef8acbdd6 KVM: nVMX: Emulate MTF when performing instruction emulation
Since commit 5f3d45e7f2 ("kvm/x86: add support for
MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap
flag processor-based execution control for its L2 guest. KVM simply
forwards any MTF VM-exits to the L1 guest, which works for normal
instruction execution.

However, when KVM needs to emulate an instruction on the behalf of an L2
guest, the monitor trap flag is not emulated. Add the necessary logic to
kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
instruction emulation for L2.

Fixes: 5f3d45e7f2 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 09:36:23 +01:00
Vitaly Kuznetsov
91a5f413af KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
Even when APICv is disabled for L1 it can (and, actually, is) still
available for L2, this means we need to always call
vmx_deliver_nested_posted_interrupt() when attempting an interrupt
delivery.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:05:21 +01:00
Xiaoyao Li
624e18f92f KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE
Commit 159348784f ("x86/vmx: Introduce VMX_FEATURES_*") missed
bit 26 (enable user wait and pause) of Secondary Processor-based
VM-Execution Controls.

Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo,
and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them
uniform.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:05:19 +01:00
Peter Zijlstra (Intel)
6650cdd9a8 x86/split_lock: Enable split lock detection by kernel
A split-lock occurs when an atomic instruction operates on data that spans
two cache lines. In order to maintain atomicity the core takes a global bus
lock.

This is typically >1000 cycles slower than an atomic operation within a
cache line. It also disrupts performance on other cores (which must wait
for the bus lock to be released before their memory operations can
complete). For real-time systems this may mean missing deadlines. For other
systems it may just be very annoying.

Some CPUs have the capability to raise an #AC trap when a split lock is
attempted.

Provide a command line option to give the user choices on how to handle
this:

split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it will
OOPs.

One implementation wrinkle is that the MSR to control the split lock
detection is per-core, not per thread. This might result in some short
lived races on HT systems in "warn" mode if Linux tries to enable on one
thread while disabling on the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between off/warn/fatal modes at runtime isn't supported
    and disabling is tracked per task, so hardware will always reach a steady
    state that matches the configured mode.  I.e. split-lock is guaranteed to
    be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
2020-02-20 21:17:53 +01:00
Qian Cai
b78a8552d7 kvm/emulate: fix a -Werror=cast-function-type
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn':
arch/x86/kvm/emulate.c:5686:22: error: cast between incompatible
function types from 'int (*)(struct x86_emulate_ctxt *)' to 'void
(*)(struct fastop *)' [-Werror=cast-function-type]
    rc = fastop(ctxt, (fastop_t)ctxt->execute);

Fix it by using an unnamed union of a (*execute) function pointer and a
(*fastop) function pointer.

Fixes: 3009afc6e3 ("KVM: x86: Use a typedef for fastop functions")
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20 18:13:45 +01:00
Kim Phillips
21b5ee59ef x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
Commit

  aaf248848d ("perf/x86/msr: Add AMD IRPERF (Instructions Retired)
		  performance counter")

added support for access to the free-running counter via 'perf -e
msr/irperf/', but when exercised, it always returns a 0 count:

BEFORE:

  $ perf stat -e instructions,msr/irperf/ true

   Performance counter stats for 'true':

             624,833      instructions
                   0      msr/irperf/

Simply set its enable bit - HWCR bit 30 - to make it start counting.

Enablement is restricted to all machines advertising IRPERF capability,
except those susceptible to an erratum that makes the IRPERF return
bad values.

That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
models 20h and above [2].

AFTER (on a family 17h model 31h machine):

  $ perf stat -e instructions,msr/irperf/ true

   Performance counter stats for 'true':

             621,690      instructions
             622,490      msr/irperf/

[1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors
[2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors

The revision guides are available from the bugzilla Link below.

 [ bp: Massage commit message. ]

Fixes: aaf248848d ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.com
2020-02-19 20:01:54 +01:00
Benjamin Thiel
b10c307f6f x86/cpu: Move prototype for get_umwait_control_msr() to a global location
.. in order to fix a -Wmissing-prototypes warning.

No functional change.

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20200123172945.7235-1-b.thiel@posteo.de
2020-02-17 19:32:45 +01:00
Benjamin Thiel
99ce3255fd x86/syscalls: Add prototypes for C syscall callbacks
.. in order to fix a couple of -Wmissing-prototypes warnings.

No functional change.

 [ bp: Massage commit message and drop newlines. ]

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200123152754.20149-1-b.thiel@posteo.de
2020-02-17 18:22:25 +01:00
Thomas Gleixner
b95a8a27c3 x86/vdso: Use generic VDSO clock mode storage
Switch to the generic VDSO clock mode storage.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts)
Acked-by: Juergen Gross <jgross@suse.com> (Xen parts)
Acked-by: Paolo Bonzini <pbonzini@redhat.com> (KVM parts)
Link: https://lkml.kernel.org/r/20200207124403.152039903@linutronix.de
2020-02-17 14:40:23 +01:00
Thomas Gleixner
eec399dd86 x86/vdso: Move VDSO clocksource state tracking to callback
All architectures which use the generic VDSO code have their own storage
for the VDSO clock mode. That's pointless and just requires duplicate code.

X86 abuses the function which retrieves the architecture specific clock
mode storage to mark the clocksource as used in the VDSO. That's silly
because this is invoked on every tick when the VDSO data is updated.

Move this functionality to the clocksource::enable() callback so it gets
invoked once when the clocksource is installed. This allows to make the
clock mode storage generic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>  (Hyper-V parts)
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts)
Acked-by: Juergen Gross <jgross@suse.com> (Xen parts)
Link: https://lkml.kernel.org/r/20200207124402.934519777@linutronix.de
2020-02-17 14:40:22 +01:00
Thomas Gleixner
50e8187158 x86/vdso: Mark the TSC clocksource path likely
Jumping out of line for the TSC clcoksource read is creating awful
code. TSC is likely to be the clocksource at least on bare metal and the PV
interfaces are sufficiently more work that the jump over the TSC read is
just in the noise.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124402.328922847@linutronix.de
2020-02-17 14:40:19 +01:00
Al Viro
c8e3dd8660 x86 user stack frame reads: switch to explicit __get_user()
rather than relying upon the magic in raw_copy_from_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-15 17:26:26 -05:00
Frederic Weisbecker
68d875131e x86: Remove TIF_NOHZ
Static keys have replaced TIF_NOHZ to optimize the calls to context
tracking. We can now safely remove that thread flag.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
2020-02-14 16:05:19 +01:00
Thomas Gleixner
7c80579530 x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path
is not only pointless, it's actually counterproductive:

 1) Context tracking code is invoked unconditionally before that flag is
    evaluated.

 2) If the flag is set the slow path is invoked for nothing due to #1

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2020-02-14 16:04:35 +01:00
Miaohe Lin
ffdbd50dca KVM: nVMX: Fix some comment typos and coding style
Fix some typos in the comments. Also fix coding style.
[Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable
field in struct kvm_vcpu_arch.]

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:43 +01:00
Linus Torvalds
1a2a76c268 A set of fixes for X86:
- Ensure that the PIT is set up when the local APIC is disable or
    configured in legacy mode. This is caused by an ordering issue
    introduced in the recent changes which skip PIT initialization when the
    TSC and APIC frequencies are already known.
 
  - Handle malformed SRAT tables during early ACPI parsing which caused an
    infinite loop anda boot hang.
 
  - Fix a long standing race in the affinity setting code which affects PCI
    devices with non-maskable MSI interrupts. The problem is caused by the
    non-atomic writes of the MSI address (destination APIC id) and data
    (vector) fields which the device uses to construct the MSI message. The
    non-atomic writes are mandated by PCI.
 
    If both fields change and the device raises an interrupt after writing
    address and before writing data, then the MSI block constructs a
    inconsistent message which causes interrupts to be lost and subsequent
    malfunction of the device.
 
    The fix is to redirect the interrupt to the new vector on the current
    CPU first and then switch it over to the new target CPU. This allows to
    observe an eventually raised interrupt in the transitional stage (old
    CPU, new vector) to be observed in the APIC IRR and retriggered on the
    new target CPU and the new vector. The potential spurious interrupts
    caused by this are harmless and can in the worst case expose a buggy
    driver (all handlers have to be able to deal with spurious interrupts as
    they can and do happen for various reasons).
 
  - Add the missing suspend/resume mechanism for the HYPERV hypercall page
    which prevents resume hibernation on HYPERV guests. This change got
    lost before the merge window.
 
  - Mask the IOAPIC before disabling the local APIC to prevent potentially
    stale IOAPIC remote IRR bits which cause stale interrupt lines after
    resume.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5AEJwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWY2D/47ur9gsVQGryKzneVAr0SCsq4Un11e
 uifX4ldu4gCEBRTYhpgcpiFKeLvY/QJ6uOD+gQUHyy/s+lCf6yzE6UhXEqSCtcT7
 LkSxD8jAFf6KhMA6iqYBfyxUsPMXBetLjjHWsyc/kf15O/vbYm7qf05timmNZkDS
 S7C+yr3KRqRjLR7G7t4twlgC9aLcNUQihUdsH2qyTvjnlkYHJLDa0/Js7bFYYKVx
 9GdUDLvPFB1mZ76g012De4R3kJsWitiyLlQ38DP5VysKulnszUCdiXlgCEFrgxvQ
 OQhLafQzOAzvxQmP+1alODR0dmJZA8k0zsDeeTB/vTpRvv6+Pe2qUswLSpauBzuq
 TpDsrv8/5pwZh28+91f/Unk+tH8NaVNtGe/Uf+ePxIkn1nbqL84o4NHGplM6R97d
 HAWdZQZ1cGRLf6YRRJ+57oM/5xE3vBbF1Wn0+QDTFwdsk2vcxuQ4eB3M/8E1V7Zk
 upp8ty50bZ5+rxQ8XTq/eb8epSRnfLoBYpi4ux6MIOWRdmKDl40cDeZCzA2kNP7m
 qY1haaRN3ksqvhzc0Yf6cL+CgvC4ur8gRHezfOqmBzVoaLyVEFIVjgjR/ojf0bq8
 /v+L9D5+IdIv4jEZruRRs0gOXNDzoBbvf0qKGaO0tUTWiDsv7c5AGixp8aozniHS
 HXsv1lIpRuC7WQ==
 =WxKD
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Ensure that the PIT is set up when the local APIC is disable or
     configured in legacy mode. This is caused by an ordering issue
     introduced in the recent changes which skip PIT initialization when
     the TSC and APIC frequencies are already known.

   - Handle malformed SRAT tables during early ACPI parsing which caused
     an infinite loop anda boot hang.

   - Fix a long standing race in the affinity setting code which affects
     PCI devices with non-maskable MSI interrupts. The problem is caused
     by the non-atomic writes of the MSI address (destination APIC id)
     and data (vector) fields which the device uses to construct the MSI
     message. The non-atomic writes are mandated by PCI.

     If both fields change and the device raises an interrupt after
     writing address and before writing data, then the MSI block
     constructs a inconsistent message which causes interrupts to be
     lost and subsequent malfunction of the device.

     The fix is to redirect the interrupt to the new vector on the
     current CPU first and then switch it over to the new target CPU.
     This allows to observe an eventually raised interrupt in the
     transitional stage (old CPU, new vector) to be observed in the APIC
     IRR and retriggered on the new target CPU and the new vector.

     The potential spurious interrupts caused by this are harmless and
     can in the worst case expose a buggy driver (all handlers have to
     be able to deal with spurious interrupts as they can and do happen
     for various reasons).

   - Add the missing suspend/resume mechanism for the HYPERV hypercall
     page which prevents resume hibernation on HYPERV guests. This
     change got lost before the merge window.

   - Mask the IOAPIC before disabling the local APIC to prevent
     potentially stale IOAPIC remote IRR bits which cause stale
     interrupt lines after resume"

* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Mask IOAPIC entries when disabling the local APIC
  x86/hyperv: Suspend/resume the hypercall page for hibernation
  x86/apic/msi: Plug non-maskable MSI affinity race
  x86/boot: Handle malformed SRAT tables during early ACPI parsing
  x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-09 12:11:12 -08:00
Linus Torvalds
90568ecf56 s390:
* fix register corruption
 * ENOTSUPP/EOPNOTSUPP mixed
 * reset cleanups/fixes
 * selftests
 
 x86:
 * Bug fixes and cleanups
 * AMD support for APIC virtualization even in combination with
   in-kernel PIT or IOAPIC.
 
 MIPS:
 * Compilation fix.
 
 Generic:
 * Fix refcount overflow for zero page.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeOuf7AAoJEL/70l94x66DOBQH/j1W9lUpbDgr9aWbrZT+O/yP
 FWzUDrRlCZCjV1FQKbGPa4YLeDRTG5n+RIQTjmCGRqiMqeoELSJ1+iK99e97nG/u
 L28zf/90Nf0R+wwHL4AOFeploTYfG4WP8EVnlr3CG2UCJrNjxN1KU7yRZoWmWa2d
 ckLJ8ouwNvx6VZd233LVmT38EP4352d1LyqIL8/+oXDIyAcRJLFQu1gRCwagsh3w
 1v1czowFpWnRQ/z9zU7YD+PA4v85/Ge8sVVHlpi1X5NgV/khk4U6B0crAw6M+la+
 mTmpz9g56oAh9m9NUdtv4zDCz1EWGH0v8+ZkAajUKtrM0ftJMn57P6p8PH4VVlE=
 =5+Wl
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "s390:
   - fix register corruption
   - ENOTSUPP/EOPNOTSUPP mixed
   - reset cleanups/fixes
   - selftests

  x86:
   - Bug fixes and cleanups
   - AMD support for APIC virtualization even in combination with
     in-kernel PIT or IOAPIC.

  MIPS:
   - Compilation fix.

  Generic:
   - Fix refcount overflow for zero page"

* tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
  KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
  x86: vmxfeatures: rename features for consistency with KVM and manual
  KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
  KVM: x86: Fix perfctr WRMSR for running counters
  x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
  x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
  kvm: mmu: Separate generating and setting mmio ptes
  kvm: mmu: Replace unsigned with unsigned int for PTE access
  KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
  KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
  KVM: MIPS: Fix a build error due to referencing not-yet-defined function
  x86/kvm: do not setup pv tlb flush when not paravirtualized
  KVM: fix overflow of zero page refcount with ksm running
  KVM: x86: Take a u64 when checking for a valid dr7 value
  KVM: x86: use raw clock values consistently
  KVM: x86: reorganize pvclock_gtod_data members
  KVM: nVMX: delete meaningless nested_vmx_run() declaration
  KVM: SVM: allow AVIC without split irqchip
  kvm: ioapic: Lazy update IOAPIC EOI
  ...
2020-02-06 09:07:45 -08:00
Linus Torvalds
9e6c535c64 pci-v5.6-fixes-1
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl48HHIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyJkhAAu3GpkS0sOWNmQhmvr2v9nn/QD8qN
 geXF7WmEbl+3CbWEEg/UEag4KPaNnQEqdPodX84ZF2wJ+vWQvQmqhYDd4s+a1IiX
 QEIFLx0LqcjaWSvrrhJnTTp2aFLKZQdP6mkhX1ddzsseDburKcq15mVzkfgibNcf
 76fbzWyvlNw9uv4CQBeXHJ2l/Ga8pLQN+ZreG0xKHtVsXZ0J2WT6Qp+KDLU+twW+
 BRoN8rzah9I2vbsPNKo0cikjGiTPpJWFzuHiEWZJQplJZ3kpa4/QTSlXewpLruiM
 /k1gKGeHyYlQVWvD1FcwxmSWle3pvihjwKfzGHCsw6cWTtIeEFlD4ehg4LIqjYuW
 aIktqDLj9dHcEh4t5fIVkGgGzc1dRZyaqXnxxC0f5sGezZSFFrCpHTLXDQpznnbC
 xj6kejxwfyypvMtIWJLK0SwdCWikcL2x0I/x1JgV4iXEdr4LjEhtg6rxCewENQ9Q
 xZgEbVTi+hHSCZYqfu68/qrWWrapt4dcmZhxPXkk2bj2A5xsgyou4gaTG6jhMQab
 WmgIwyzd3Bk+o3+JJbRt1/qhDUx3413ja6f2oX8XNV6b0Fa9XIDwk4Qw47tLA8Kj
 yWWt4w3iccghhg2d/eWz2+fxfbNB0qGGR4v+Y+lpy/jvFORQpS44uTKgcr0q1OvR
 kye5UT+mW+b1z1U=
 =tcy+
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Define to_pci_sysdata() always to fix build breakage when !CONFIG_PCI
   (Jason A. Donenfeld)

 - Use PF PASID for VFs to fix VF IOMMU bind failures (Kuppuswamy
   Sathyanarayanan)

* tag 'pci-v5.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/ATS: Use PF PASID for VFs
  x86/PCI: Define to_pci_sysdata() even when !CONFIG_PCI
2020-02-06 14:17:38 +00:00
Paolo Bonzini
bcfcff640c x86: vmxfeatures: rename features for consistency with KVM and manual
Three of the feature bits in vmxfeatures.h have names that are different
from the Intel SDM.  The names have been adjusted recently in KVM but they
were using the old name in the tip tree's x86/cpu branch.  Adjust for
consistency.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:22:59 +01:00
Suravee Suthikulpanit
e2ed4078a6 kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.
AMD SVM AVIC accelerates EOI write and does not trap. This causes
in-kernel PIT re-injection mode to fail since it relies on irq-ack
notifier mechanism. So, APICv is activated only when in-kernel PIT
is in discard mode e.g. w/ qemu option:

  -global kvm-pit.lost_tick_policy=discard

Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this
reason.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit
f3515dc3be svm: Temporarily deactivate AVIC during ExtINT handling
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary
deactivated and fall back to using legacy interrupt injection via vINTR
and interrupt window.

Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
9a0bf05430 svm: Deactivate AVIC when launching guest with nested SVM support
Since AVIC does not currently work w/ nested virtualization,
deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM]
(i.e. indicate support for SVM, which is needed for nested virtualization).
Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for
this reason.

Suggested-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
f4fdc0a2ed kvm: x86: hyperv: Use APICv update request interface
Since disabling APICv has to be done for all vcpus on AMD-based
system, adopt the newly introduced kvm_request_apicv_update()
interface, and introduce a new APICV_INHIBIT_REASON_HYPERV.

Also, remove the kvm_vcpu_deactivate_apicv() since no longer used.

Cc: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
2de9d0ccd0 kvm: x86: Introduce x86 ops hook for pre-update APICv
AMD SVM AVIC needs to update APIC backing page mapping before changing
APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl
function hook to be called prior KVM APICv update request to each vcpu.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit
ef8efd7a15 kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasons
Inibit reason bits are used to determine if APICv deactivation is
applicable for a particular hardware virtualization architecture.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit
8df14af42f kvm: x86: Add support for dynamic APICv activation
Certain runtime conditions require APICv to be temporary deactivated
during runtime.  The current implementation only support run-time
deactivation of APICv when Hyper-V SynIC is enabled, which is not
temporary.

In addition, for AMD, when APICv is (de)activated at runtime,
all vcpus in the VM have to operate in the same mode.  Thus the
requesting vcpu must notify the others.

So, introduce the following:
 * A new KVM_REQ_APICV_UPDATE request bit
 * Interfaces to request all vcpus to update APICv status
 * A new interface to update APICV-related parameters for each vcpu

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:41 +01:00
Paolo Bonzini
7e3e67a987 KVM: x86: remove get_enable_apicv from kvm_x86_ops
It is unused now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:40 +01:00
Suravee Suthikulpanit
4e19c36f2d kvm: x86: Introduce APICv inhibit reason bits
There are several reasons in which a VM needs to deactivate APICv
e.g. disable APICv via parameter during module loading, or when
enable Hyper-V SynIC support. Additional inhibit reasons will be
introduced later on when dynamic APICv is supported,

Introduce KVM APICv inhibit reason bits along with a new variable,
apicv_inhibit_reasons, to help keep track of APICv state for each VM,

Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate
the case where APICv is disabled during KVM module load.
(e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0).

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:40 +01:00
Jason A. Donenfeld
b9303bb199 x86/PCI: Define to_pci_sysdata() even when !CONFIG_PCI
Recently, the to_pci_sysdata() helper was added inside the CONFIG_PCI
guard, but it is used inside a CONFIG_NUMA guard, which does not require
CONFIG_PCI.  This breaks builds on !CONFIG_PCI machines.  Make
to_pci_sysdata() available in all configurations.

Fixes: aad6aa0cd6 ("x86/PCI: Add to_pci_sysdata() helper")
Link: https://lore.kernel.org/r/20200203215306.172000-1-Jason@zx2c4.com
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>	# build-tested
2020-02-04 08:44:46 -06:00
Linus Torvalds
153b5c566d Microblaze patches for 5.6-rc1
- Enable CMA
 - Add support for MB v11
 - Defconfig updates
 - Minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iF0EABECAB0WIQQbPNTMvXmYlBPRwx7KSWXLKUoMIQUCXjlJ1gAKCRDKSWXLKUoM
 IWy9AJ4tauV9sUb+zNadrYxI+2zemRstUwCfQ49LG4kHpFCv8ldSTmhBPJY/3MI=
 =QpT4
 -----END PGP SIGNATURE-----

Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze update from Michal Simek:

 - enable CMA

 - add support for MB v11

 - defconfig updates

 - minor fixes

* tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Add ID for Microblaze v11
  microblaze: Prevent the overflow of the start
  microblaze: Wire CMA allocator
  asm-generic: Make dma-contiguous.h a mandatory include/asm header
  microblaze: Sync defconfig with latest Kconfig layout
  microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers
  microblaze: Align comments with register usage
2020-02-04 11:58:07 +00:00
Michal Simek
def3f7cefe asm-generic: Make dma-contiguous.h a mandatory include/asm header
dma-continuguous.h is generic for all architectures except arm32 which has
its own version.

Similar change was done for msi.h by commit a1b39bae16
("asm-generic: Make msi.h a mandatory include/asm header")

Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-arm-kernel/20200117080446.GA8980@lst.de/T/#m92bb56b04161057635d4142e1b3b9b6b0a70122e
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
2020-02-04 11:38:59 +01:00
Peter Zijlstra
ff2e6d7259 asm-generic/tlb: rename HAVE_RCU_TABLE_FREE
Towards a more consistent naming scheme.

[akpm@linux-foundation.org: fix sparc64 Kconfig]
Link: http://lkml.kernel.org/r/20200116064531.483522-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:26 +00:00
Steven Price
c5cfae12fd x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
To enable x86 to use the generic walk_page_range() function, the callers
of ptdump_walk_pgd_level_debugfs() need to pass in the mm_struct.

This means that ptdump_walk_pgd_level_core() is now always passed a valid
pgd, so drop the support for pgd==NULL.

Link: http://lkml.kernel.org/r/20191218162402.45610-19-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:25 +00:00
Steven Price
e455248d5e x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct
To enable x86 to use the generic walk_page_range() function, the callers
of ptdump_walk_pgd_level() need to pass an mm_struct rather than the raw
pgd_t pointer.  Luckily since commit 7e904a91bf ("efi: Use efi_mm in x86
as well as ARM") we now have an mm_struct for EFI on x86.

Link: http://lkml.kernel.org/r/20191218162402.45610-18-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:25 +00:00
Steven Price
757b2a4ab5 x86: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space.  For this it needs to know when it has reached a
'leaf' entry in the page tables.  This information is provided by the
p?d_leaf() functions/macros.

For x86 we already have p?d_large() functions, so simply add macros to
provide the generic p?d_leaf() names for the generic code.

Link: http://lkml.kernel.org/r/20191218162402.45610-11-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:25 +00:00
Linus Torvalds
acd77500aa Change /dev/random so that it uses the CRNG and only blocking if the
CRNG hasn't initialized, instead of the old blocking pool.  Also clean
 up archrandom.h, and some other miscellaneous cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl40j1kACgkQ8vlZVpUN
 gaPCywf8CWS9HFd2Iipj60gkTVugjlL5ib0lbfhQcAAwwzw1GLTXJSMBzzoMRHY/
 ZI2sJZS1m0V1oWNnXXVKi+A1VXmlValWXAc+7fvbeaIe5pRT1EHP14s4Kz7/4d8Q
 dk0b8cxNpR8u5CcbN8y9D+71IKpdksUbX7uGuGfw3bncQdRNwJVf+oS1fMGS0Rsb
 F8ddQaED7iFpX2BMl56afQ4t2t0LA5+eLYMGoYoJx5fgd9BseP0TEcjj9Y4Z30M7
 +GO4NZjUbAY0syx9r8hx3P/5miWZm2J9QJmJoXHhr5+IcAKM+6+Uo6X6gkOEqV4i
 U//V1cqNuowV5ckE4Na+MfBillinsQ==
 =HeFM
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random changes from Ted Ts'o:
 "Change /dev/random so that it uses the CRNG and only blocking if the
  CRNG hasn't initialized, instead of the old blocking pool. Also clean
  up archrandom.h, and some other miscellaneous cleanups"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
  s390x: Mark archrandom.h functions __must_check
  powerpc: Mark archrandom.h functions __must_check
  powerpc: Use bool in archrandom.h
  x86: Mark archrandom.h functions __must_check
  linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
  linux/random.h: Use false with bool
  linux/random.h: Remove arch_has_random, arch_has_random_seed
  s390: Remove arch_has_random, arch_has_random_seed
  powerpc: Remove arch_has_random, arch_has_random_seed
  x86: Remove arch_has_random, arch_has_random_seed
  random: remove some dead code of poolinfo
  random: fix typo in add_timer_randomness()
  random: Add and use pr_fmt()
  random: convert to ENTROPY_BITS for better code readability
  random: remove unnecessary unlikely()
  random: remove kernel.random.read_wakeup_threshold
  random: delete code to pull data into pools
  random: remove the blocking pool
  random: make /dev/random be almost like /dev/urandom
  random: ignore GRND_RANDOM in getentropy(2)
  ...
2020-02-01 09:48:37 -08:00
Thomas Gleixner
6f1a4891a5 x86/apic/msi: Plug non-maskable MSI affinity race
Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is supressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see an pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
2020-02-01 09:31:47 +01:00
Linus Torvalds
26dca6dbd6 pci-v5.6-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl40PWgUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwclA/+Id/7Uc5S0r7xgFQRr3lbn0hHcx7f
 oBgmm6kGl8bu77MDiY32WLmPsp9e4BlK2M765cKQL5n20y8CzJ+kthZM8tZEDba4
 pnrZnWZ0A2xaBKzJqqYDtCqAeP97noCs4zBLo3JCA6jYCYI5bkvmdMQRlRjTUofO
 tkenGE+vexaJsLB7ghNskL3xGMueXLtLf/hXvaC6WGbSI9/zUmliHDL53DoKDPRo
 /9TGYDMwItZz+BhmBJz8hAL4naQIhIcDk2mz7CzWkY9xDhCJ1yeEwFvtvJwq0uM2
 Nmtq1g6yCB3sjlx+bRzrioLnouflztK1PGRbNugrMkR5XM9HIFmNwaDrqpU11ffA
 LQabMpbS3RWH3hbh4LYVMW13hbO+ld7/NG8jMFce2LHBWaGj6YejUQGdifz6vGRk
 JnDOgP19v5gWw08ibwkdfYzznPfMXp5IzFdJQFKhK+ugGDSJ8VeXiQ/pWtzghl3z
 P/puRw0BiL7ob/FUmhwn4J1Ytml7PZE+cJVN2l4C/CwKxR583GRUDgSHNL7Dky+o
 GcH9Tmjt4hQMNYRP01PACUmFYJwDfB+zgQ64a+uJsQwl/j+yfMnc1t/kqdM6yC9J
 GgkqLp989G/a3n9w5IC1P8aDYiwRqABvAFzlP9OZcIMUwmWbrhH175Qf6skKYIhH
 q9RKcLVXZdRS3mc=
 =fQ0E
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 "Resource management:

   - Improve resource assignment for hot-added nested bridges, e.g.,
     Thunderbolt (Nicholas Johnson)

  Power management:

   - Optionally print config space of devices before suspend (Chen Yu)

   - Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake)

  Virtualization:

   - Generalize DMA alias quirks (James Sewart)

   - Add DMA alias quirk for PLX PEX NTB (James Sewart)

   - Fix IOV memory leak (Navid Emamdoost)

  AER:

   - Log which device prevents error recovery (Yicong Yang)

  Peer-to-peer DMA:

   - Whitelist Intel SkyLake-E (Armen Baloyan)

  Broadcom iProc host bridge driver:

   - Apply PAXC quirk whether driver is built-in or module (Wei Liu)

  Broadcom STB host bridge driver:

   - Add Broadcom STB PCIe host controller driver (Jim Quinlan)

  Intel Gateway SoC host bridge driver:

   - Add driver for Intel Gateway SoC (Dilip Kota)

  Intel VMD host bridge driver:

   - Add support for DMA aliases on other buses (Jon Derrick)

   - Remove dma_map_ops overrides (Jon Derrick)

   - Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig)

  NVIDIA Tegra host bridge driver:

   - Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler)

  Panasonic UniPhier host bridge driver:

   - Remove module code since driver can't be built as a module
     (Masahiro Yamada)

  Qualcomm host bridge driver:

   - Add support for SDM845 PCIe controller (Bjorn Andersson)

  TI Keystone host bridge driver:

   - Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I)

   - Fix link training retries initiation (Yurii Monakov)

   - Fix outbound region mapping (Yurii Monakov)

  Misc:

   - Add Switchtec Gen4 support (Kelvin Cao)

   - Add Switchtec Intercomm Notify and Upstream Error Containment
     support (Logan Gunthorpe)

   - Use dma_set_mask_and_coherent() since Switchtec supports 64-bit
     addressing (Wesley Sheng)"

* tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits)
  PCI: Allow adjust_bridge_window() to shrink resource if necessary
  PCI: Set resource size directly in adjust_bridge_window()
  PCI: Rename extend_bridge_window() to adjust_bridge_window()
  PCI: Rename extend_bridge_window() parameter
  PCI: Consider alignment of hot-added bridges when assigning resources
  PCI: Remove local variable usage in pci_bus_distribute_available_resources()
  PCI: Pass size + alignment to pci_bus_distribute_available_resources()
  PCI: Rename variables
  PCI: vmd: Add two VMD Device IDs
  PCI: Remove unnecessary braces
  PCI: brcmstb: Add MSI support
  PCI: brcmstb: Add Broadcom STB PCIe host controller driver
  x86/PCI: Remove X86_DEV_DMA_OPS
  PCI: vmd: Remove dma_map_ops overrides
  iommu/vt-d: Remove VMD child device sanity check
  iommu/vt-d: Use pci_real_dma_dev() for mapping
  PCI: Introduce pci_real_dma_dev()
  x86/PCI: Expose VMD's pci_dev in struct pci_sysdata
  x86/PCI: Add to_pci_sysdata() helper
  PCI/AER: Initialize aer_fifo
  ...
2020-01-31 14:48:54 -08:00
Linus Torvalds
b70a2d6b29 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - three fixes and a cleanup for the resctrl code

   - a HyperV fix

   - a fix to /proc/kcore contents in live debugging sessions

   - a fix for the x86 decoder opcode map"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/decoder: Add TEST opcode to Group3-2
  x86/resctrl: Clean up unused function parameter in mkdir path
  x86/resctrl: Fix a deadlock due to inaccurate reference
  x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
  x86/resctrl: Fix use-after-free when deleting resource groups
  x86/hyper-v: Add "polling" bit to hv_synic_sint
  x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y
2020-01-31 11:05:33 -08:00
Linus Torvalds
e813e65038 ARM: Cleanups and corner case fixes
PPC: Bugfixes
 
 x86:
 * Support for mapping DAX areas with large nested page table entries.
 * Cleanups and bugfixes here too.  A particularly important one is
 a fix for FPU load when the thread has TIF_NEED_FPU_LOAD.  There is
 also a race condition which could be used in guest userspace to exploit
 the guest kernel, for which the embargo expired today.
 * Fast path for IPI delivery vmexits, shaving about 200 clock cycles
 from IPI latency.
 * Protect against "Spectre-v1/L1TF" (bring data in the cache via
 speculative out of bound accesses, use L1TF on the sibling hyperthread
 to read it), which unfortunately is an even bigger whack-a-mole game
 than SpectreV1.
 
 Sean continues his mission to rewrite KVM.  In addition to a sizable
 number of x86 patches, this time he contributed a pretty large refactoring
 of vCPU creation that affects all architectures but should not have any
 visible effect.
 
 s390 will come next week together with some more x86 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeMxtCAAoJEL/70l94x66DQxIIAJv9hMmXLQHGFnUMskjGErR6
 DCLSC0YRdRMwE50CerblyJtGsMwGsPyHZwvZxoAceKJ9w0Yay9cyaoJ87ItBgHoY
 ce0HrqIUYqRSJ/F8WH2lSzkzMBr839rcmqw8p1tt4D5DIsYnxHGWwRaaP+5M/1KQ
 YKFu3Hea4L00U339iIuDkuA+xgz92LIbsn38svv5fxHhPAyWza0rDEYHNgzMKuoF
 IakLf5+RrBFAh6ZuhYWQQ44uxjb+uQa9pVmcqYzzTd5t1g4PV5uXtlJKesHoAvik
 Eba8IEUJn+HgQJjhp3YxQYuLeWOwRF3bwOiZ578MlJ4OPfYXMtbdlqCQANHOcGk=
 =H/q1
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be used in guest userspace to
     exploit the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data in the cache via
     speculative out of bound accesses, use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than SpectreV1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...
2020-01-31 09:30:41 -08:00
Linus Torvalds
ccaaaf6fe5 MPX requires recompiling applications, which requires compiler support.
Unfortunately, GCC 9.1 is expected to be be released without support for
 MPX.  This means that there was only a relatively small window where
 folks could have ever used MPX.  It failed to gain wide adoption in the
 industry, and Linux was the only mainstream OS to ever support it widely.
 
 Support for the feature may also disappear on future processors.
 
 This set completes the process that we started during the 5.4 merge window.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJeK1/pAAoJEGg1lTBwyZKwgC8QAIiVn1d7A9Uj/WpnpgfCChCZ
 9XiV6Ak999qD9fbAcrgNfPjieaD4mtokocSRVJuRgJu5iLnIJCINlozLPe4yVl7P
 7zebnxkLq0CIA8d56bEUoFlC0J+oWYlDVQePZzNQsSk5KHVGXVLpF6U4vDVzZeQy
 cprgvdeY+ehB7G6IIo0MWTg5ylKYAsOAyVvK8NIGpKY2k6/YqCnsptnsVE7bvlHy
 TrEOiUWLv+hh0bMkZdP1PwKQKEuMO/IZly0HtviFbMN7T4TB1spfg7ELoBucEq3T
 s4EVbYRe+nIE4tuEAveaX3CgxJek8cY5MlticskdaKSEACBwabdOF55qsZy0u+WA
 PYC4iUIXfbOH8OgieKWtGX4IuSkRYdQ2nP4BOpe4ZX4+zvU7zOCIyVSKRrwkX8cc
 ADtWI5FAtB36KCgUuWnHGHNZpOxPTbTLBuBataFY4Q2uBNJEBJpscZ5H9ObtyGFU
 ZjlzqFnM0nFNDKEI1EEtv9jLzgZTU1RQ46s7EFeSeEQ2/s9wJ3+s5sBlVbljsmus
 o658bLOEaRWC/aF15dgmEXW9GAO6uifNdmbzGnRn7oEMYyFQPTWbZvi1zGz58QaG
 Y6WTtigVtsSrHS4wpYd+p+n1W06VnB6J3BpBM4G1VQv1Vm0dNd1tUOfkqOzPjg7c
 33Itmsz2LaW1mb67GlgZ
 =g4cC
 -----END PGP SIGNATURE-----

Merge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx

Pull x86 MPX removal from Dave Hansen:
 "MPX requires recompiling applications, which requires compiler
  support. Unfortunately, GCC 9.1 is expected to be be released without
  support for MPX. This means that there was only a relatively small
  window where folks could have ever used MPX. It failed to gain wide
  adoption in the industry, and Linux was the only mainstream OS to ever
  support it widely.

  Support for the feature may also disappear on future processors.

  This set completes the process that we started during the 5.4 merge
  window when the MPX prctl()s were removed. XSAVE support is left in
  place, which allows MPX-using KVM guests to continue to function"

* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
  x86/mpx: remove MPX from arch/x86
  mm: remove arch_bprm_mm_init() hook
  x86/mpx: remove bounds exception code
  x86/mpx: remove build infrastructure
  x86/alternatives: add missing insn.h include
2020-01-30 16:11:50 -08:00
Paolo Bonzini
4cbc418a44 Merge branch 'cve-2019-3016' into kvm-next-5.6
From Boris Ostrovsky:

The KVM hypervisor may provide a guest with ability to defer remote TLB
flush when the remote VCPU is not running. When this feature is used,
the TLB flush will happen only when the remote VPCU is scheduled to run
again. This will avoid unnecessary (and expensive) IPIs.

Under certain circumstances, when a guest initiates such deferred action,
the hypervisor may miss the request. It is also possible that the guest
may mistakenly assume that it has already marked remote VCPU as needing
a flush when in fact that request had already been processed by the
hypervisor. In both cases this will result in an invalid translation
being present in a vCPU, potentially allowing accesses to memory locations
in that guest's address space that should not be accessible.

Note that only intra-guest memory is vulnerable.

The five patches address both of these problems:
1. The first patch makes sure the hypervisor doesn't accidentally clear
a guest's remote flush request
2. The rest of the patches prevent the race between hypervisor
acknowledging a remote flush request and guest issuing a new one.

Conflicts:
	arch/x86/kvm/x86.c [move from kvm_arch_vcpu_free to kvm_arch_vcpu_destroy]
2020-01-30 18:47:59 +01:00
Boris Ostrovsky
a6bd811f12 x86/KVM: Clean up host's steal time structure
Now that we are mapping kvm_steal_time from the guest directly we
don't need keep a copy of it in kvm_vcpu_arch.st. The same is true
for the stime field.

This is part of CVE-2019-3016.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30 18:45:55 +01:00
Boris Ostrovsky
917248144d x86/kvm: Cache gfn to pfn translation
__kvm_map_gfn()'s call to gfn_to_pfn_memslot() is
* relatively expensive
* in certain cases (such as when done from atomic context) cannot be called

Stashing gfn-to-pfn mapping should help with both cases.

This is part of CVE-2019-3016.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30 18:45:55 +01:00
Linus Torvalds
33c84e89ab SCSI misc on 20200129
This series is slightly unusual because it includes Arnd's compat
 ioctl tree here:
 
 1c46a2cf2d Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
 
 Excluding Arnd's changes, this is mostly an update of the usual
 drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.  There
 are a couple of core and base updates around error propagation and
 atomicity in the attribute container base we use for the SCSI
 transport classes.  The rest is minor changes and updates.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXjHQJyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZZ8AQC02N+v
 iUnTl1YxGPjIWBbnHuUxN2Qbb9D3C6gAT1LkigEArlk163K3A1XEQHF/VNCdAz/f
 01XYTd3p1VHuegIBHlk=
 =Cn52
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series is slightly unusual because it includes Arnd's compat
  ioctl tree here:

    1c46a2cf2d Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue

  Excluding Arnd's changes, this is mostly an update of the usual
  drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.

  There are a couple of core and base updates around error propagation
  and atomicity in the attribute container base we use for the SCSI
  transport classes.

  The rest is minor changes and updates"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
  scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
  scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
  scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
  scsi: hisi_sas: Replace magic number when handle channel interrupt
  scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
  scsi: hisi_sas: use threaded irq to process CQ interrupts
  scsi: ufs: Use UFS device indicated maximum LU number
  scsi: ufs: Add max_lu_supported in struct ufs_dev_info
  scsi: ufs: Delete is_init_prefetch from struct ufs_hba
  scsi: ufs: Inline two functions into their callers
  scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
  scsi: ufs: Split ufshcd_probe_hba() based on its called flow
  scsi: ufs: Delete struct ufs_dev_desc
  scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
  scsi: ufs-mediatek: enable low-power mode for hibern8 state
  scsi: ufs: export some functions for vendor usage
  scsi: ufs-mediatek: add dbg_register_dump implementation
  scsi: qla2xxx: Fix a NULL pointer dereference in an error path
  scsi: qla1280: Make checking for 64bit support consistent
  scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
  ...
2020-01-29 18:16:16 -08:00
Thomas Gleixner
979923871f x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
Tony reported a boot regression caused by the recent workaround for systems
which have a disabled (clock gate off) PIT.

On his machine the kernel fails to initialize the PIT because
apic_needs_pit() does not take into account whether the local APIC
interrupt delivery mode will actually allow to setup and use the local
APIC timer. This should be easy to reproduce with acpi=off on the
command line which also disables HPET.

Due to the way the PIT/HPET and APIC setup ordering works (APIC setup can
require working PIT/HPET) the information is not available at the point
where apic_needs_pit() makes this decision.

To address this, split out the interrupt mode selection from
apic_intr_mode_init(), invoke the selection before making the decision
whether PIT is required or not, and add the missing checks into
apic_needs_pit().

Fixes: c8c4076723 ("x86/timer: Skip PIT initialization on modern chipsets")
Reported-by: Anthony Buckley <tony.buckley000@gmail.com>
Tested-by: Anthony Buckley <tony.buckley000@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Daniel Drake <drake@endlessm.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206125
Link: https://lore.kernel.org/r/87sgk6tmk2.fsf@nanos.tec.linutronix.de
2020-01-29 12:50:12 +01:00
Linus Torvalds
a78208e243 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Removed CRYPTO_TFM_RES flags
   - Extended spawn grabbing to all algorithm types
   - Moved hash descsize verification into API code

  Algorithms:
   - Fixed recursive pcrypt dead-lock
   - Added new 32 and 64-bit generic versions of poly1305
   - Added cryptogams implementation of x86/poly1305

  Drivers:
   - Added support for i.MX8M Mini in caam
   - Added support for i.MX8M Nano in caam
   - Added support for i.MX8M Plus in caam
   - Added support for A33 variant of SS in sun4i-ss
   - Added TEE support for Raven Ridge in ccp
   - Added in-kernel API to submit TEE commands in ccp
   - Added AMD-TEE driver
   - Added support for BCM2711 in iproc-rng200
   - Added support for AES256-GCM based ciphers for chtls
   - Added aead support on SEC2 in hisilicon"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits)
  crypto: arm/chacha - fix build failured when kernel mode NEON is disabled
  crypto: caam - add support for i.MX8M Plus
  crypto: x86/poly1305 - emit does base conversion itself
  crypto: hisilicon - fix spelling mistake "disgest" -> "digest"
  crypto: chacha20poly1305 - add back missing test vectors and test chunking
  crypto: x86/poly1305 - fix .gitignore typo
  tee: fix memory allocation failure checks on drv_data and amdtee
  crypto: ccree - erase unneeded inline funcs
  crypto: ccree - make cc_pm_put_suspend() void
  crypto: ccree - split overloaded usage of irq field
  crypto: ccree - fix PM race condition
  crypto: ccree - fix FDE descriptor sequence
  crypto: ccree - cc_do_send_request() is void func
  crypto: ccree - fix pm wrongful error reporting
  crypto: ccree - turn errors to debug msgs
  crypto: ccree - fix AEAD decrypt auth fail
  crypto: ccree - fix typo in comment
  crypto: ccree - fix typos in error msgs
  crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data
  crypto: x86/sha - Eliminate casts on asm implementations
  ...
2020-01-28 15:38:56 -08:00
Linus Torvalds
c0275ae758 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu-features updates from Ingo Molnar:
 "The biggest change in this cycle was a large series from Sean
  Christopherson to clean up the handling of VMX features. This both
  fixes bugs/inconsistencies and makes the code more coherent and
  future-proof.

  There are also two cleanups and a minor TSX syslog messages
  enhancement"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/cpu: Remove redundant cpu_detect_cache_sizes() call
  x86/cpu: Print "VMX disabled" error message iff KVM is enabled
  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs
  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
  KVM: VMX: Check for full VMX support when verifying CPU compatibility
  KVM: VMX: Use VMX feature flag to query BIOS enabling
  KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
  x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
  x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
  x86/vmx: Introduce VMX_FEATURES_*
  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
  x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
  x86/centaur: Use common IA32_FEAT_CTL MSR initialization
  x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
  x86/intel: Initialize IA32_FEAT_CTL MSR at boot
  tools/x86: Sync msr-index.h from kernel sources
  selftests, kvm: Replace manual MSR defs with common msr-index.h
  ...
2020-01-28 12:46:42 -08:00
Giovanni Gherdovich
918229cdd5 x86/intel_pstate: Handle runtime turbo disablement/enablement in frequency invariance
On some platforms such as the Dell XPS 13 laptop the firmware disables turbo
when the machine is disconnected from AC, and viceversa it enables it again
when it's reconnected. In these cases a _PPC ACPI notification is issued.

The scheduler needs to know freq_max for frequency-invariant calculations.
To account for turbo availability to come and go, record freq_max at boot as
if turbo was available and store it in a helper variable. Use a setter
function to swap between freq_base and freq_max every time turbo goes off or on.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200122151617.531-7-ggherdovich@suse.cz
2020-01-28 21:37:06 +01:00
Giovanni Gherdovich
1567c3e346 x86, sched: Add support for frequency invariance
Implement arch_scale_freq_capacity() for 'modern' x86. This function
is used by the scheduler to correctly account usage in the face of
DVFS.

The present patch addresses Intel processors specifically and has positive
performance and performance-per-watt implications for the schedutil cpufreq
governor, bringing it closer to, if not on-par with, the powersave governor
from the intel_pstate driver/framework.

Large performance gains are obtained when the machine is lightly loaded and
no regression are observed at saturation. The benchmarks with the largest
gains are kernel compilation, tbench (the networking version of dbench) and
shell-intensive workloads.

1. FREQUENCY INVARIANCE: MOTIVATION
   * Without it, a task looks larger if the CPU runs slower

2. PECULIARITIES OF X86
   * freq invariance accounting requires knowing the ratio freq_curr/freq_max
   2.1 CURRENT FREQUENCY
       * Use delta_APERF / delta_MPERF * freq_base (a.k.a "BusyMHz")
   2.2 MAX FREQUENCY
       * It varies with time (turbo). As an approximation, we set it to a
         constant, i.e. 4-cores turbo frequency.

3. EFFECTS ON THE SCHEDUTIL FREQUENCY GOVERNOR
   * The invariant schedutil's formula has no feedback loop and reacts faster
     to utilization changes

4. KNOWN LIMITATIONS
   * In some cases tasks can't reach max util despite how hard they try

5. PERFORMANCE TESTING
   5.1 MACHINES
       * Skylake, Broadwell, Haswell
   5.2 SETUP
       * baseline Linux v5.2 w/ non-invariant schedutil. Tested freq_max = 1-2-3-4-8-12
         active cores turbo w/ invariant schedutil, and intel_pstate/powersave
   5.3 BENCHMARK RESULTS
       5.3.1 NEUTRAL BENCHMARKS
             * NAS Parallel Benchmark (HPC), hackbench
       5.3.2 NON-NEUTRAL BENCHMARKS
             * tbench (10-30% better), kernbench (10-15% better),
               shell-intensive-scripts (30-50% better)
             * no regressions
       5.3.3 SELECTION OF DETAILED RESULTS
       5.3.4 POWER CONSUMPTION, PERFORMANCE-PER-WATT
             * dbench (5% worse on one machine), kernbench (3% worse),
               tbench (5-10% better), shell-intensive-scripts (10-40% better)

6. MICROARCH'ES ADDRESSED HERE
   * Xeon Core before Scalable Performance processors line (Xeon Gold/Platinum
     etc have different MSRs semantic for querying turbo levels)

7. REFERENCES
   * MMTests performance testing framework, github.com/gormanm/mmtests

 +-------------------------------------------------------------------------+
 | 1. FREQUENCY INVARIANCE: MOTIVATION
 +-------------------------------------------------------------------------+

For example; suppose a CPU has two frequencies: 500 and 1000 Mhz. When
running a task that would consume 1/3rd of a CPU at 1000 MHz, it would
appear to consume 2/3rd (or 66.6%) when running at 500 MHz, giving the
false impression this CPU is almost at capacity, even though it can go
faster [*]. In a nutshell, without frequency scale-invariance tasks look
larger just because the CPU is running slower.

[*] (footnote: this assumes a linear frequency/performance relation; which
everybody knows to be false, but given realities its the best approximation
we can make.)

 +-------------------------------------------------------------------------+
 | 2. PECULIARITIES OF X86
 +-------------------------------------------------------------------------+

Accounting for frequency changes in PELT signals requires the computation of
the ratio freq_curr / freq_max. On x86 neither of those terms is readily
available.

2.1 CURRENT FREQUENCY
====================

Since modern x86 has hardware control over the actual frequency we run
at (because amongst other things, Turbo-Mode), we cannot simply use
the frequency as requested through cpufreq.

Instead we use the APERF/MPERF MSRs to compute the effective frequency
over the recent past. Also, because reading MSRs is expensive, don't
do so every time we need the value, but amortize the cost by doing it
every tick.

2.2 MAX FREQUENCY
=================

Obtaining freq_max is also non-trivial because at any time the hardware can
provide a frequency boost to a selected subset of cores if the package has
enough power to spare (eg: Turbo Boost). This means that the maximum frequency
available to a given core changes with time.

The approach taken in this change is to arbitrarily set freq_max to a constant
value at boot. The value chosen is the "4-cores (4C) turbo frequency" on most
microarchitectures, after evaluating the following candidates:

    * 1-core (1C) turbo frequency (the fastest turbo state available)
    * around base frequency (a.k.a. max P-state)
    * something in between, such as 4C turbo

To interpret these options, consider that this is the denominator in
freq_curr/freq_max, and that ratio will be used to scale PELT signals such as
util_avg and load_avg. A large denominator will undershoot (util_avg looks a
bit smaller than it really is), viceversa with a smaller denominator PELT
signals will tend to overshoot. Given that PELT drives frequency selection
in the schedutil governor, we will have:

    freq_max set to     | effect on DVFS
    --------------------+------------------
    1C turbo            | power efficiency (lower freq choices)
    base freq           | performance (higher util_avg, higher freq requests)
    4C turbo            | a bit of both

4C turbo proves to be a good compromise in a number of benchmarks (see below).

 +-------------------------------------------------------------------------+
 | 3. EFFECTS ON THE SCHEDUTIL FREQUENCY GOVERNOR
 +-------------------------------------------------------------------------+

Once an architecture implements a frequency scale-invariant utilization (the
PELT signal util_avg), schedutil switches its frequency selection formula from

    freq_next = 1.25 * freq_curr * util            [non-invariant util signal]

to

    freq_next = 1.25 * freq_max * util             [invariant util signal]

where, in the second formula, freq_max is set to the 1C turbo frequency (max
turbo). The advantage of the second formula, whose usage we unlock with this
patch, is that freq_next doesn't depend on the current frequency in an
iterative fashion, but can jump to any frequency in a single update. This
absence of feedback in the formula makes it quicker to react to utilization
changes and more robust against pathological instabilities.

Compare it to the update formula of intel_pstate/powersave:

    freq_next = 1.25 * freq_max * Busy%

where again freq_max is 1C turbo and Busy% is the percentage of time not spent
idling (calculated with delta_MPERF / delta_TSC); essentially the same as
invariant schedutil, and largely responsible for intel_pstate/powersave good
reputation. The non-invariant schedutil formula is derived from the invariant
one by approximating util_inv with util_raw * freq_curr / freq_max, but this
has limitations.

Testing shows improved performances due to better frequency selections when
the machine is lightly loaded, and essentially no change in behaviour at
saturation / overutilization.

 +-------------------------------------------------------------------------+
 | 4. KNOWN LIMITATIONS
 +-------------------------------------------------------------------------+

It's been shown that it is possible to create pathological scenarios where a
CPU-bound task cannot reach max utilization, if the normalizing factor
freq_max is fixed to a constant value (see [Lelli-2018]).

If freq_max is set to 4C turbo as we do here, one needs to peg at least 5
cores in a package doing some busywork, and observe that none of those task
will ever reach max util (1024) because they're all running at less than the
4C turbo frequency.

While this concern still applies, we believe the performance benefit of
frequency scale-invariant PELT signals outweights the cost of this limitation.

 [Lelli-2018]
 https://lore.kernel.org/lkml/20180517150418.GF22493@localhost.localdomain/

 +-------------------------------------------------------------------------+
 | 5. PERFORMANCE TESTING
 +-------------------------------------------------------------------------+

5.1 MACHINES
============

We tested the patch on three machines, with Skylake, Broadwell and Haswell
CPUs. The details are below, together with the available turbo ratios as
reported by the appropriate MSRs.

* 8x-SKYLAKE-UMA:
  Single socket E3-1240 v5, Skylake 4 cores/8 threads
  Max EFFiciency, BASE frequency and available turbo levels (MHz):

    EFFIC    800 |********
    BASE    3500 |***********************************
    4C      3700 |*************************************
    3C      3800 |**************************************
    2C      3900 |***************************************
    1C      3900 |***************************************

* 80x-BROADWELL-NUMA:
  Two sockets E5-2698 v4, 2x Broadwell 20 cores/40 threads
  Max EFFiciency, BASE frequency and available turbo levels (MHz):

    EFFIC   1200 |************
    BASE    2200 |**********************
    8C      2900 |*****************************
    7C      3000 |******************************
    6C      3100 |*******************************
    5C      3200 |********************************
    4C      3300 |*********************************
    3C      3400 |**********************************
    2C      3600 |************************************
    1C      3600 |************************************

* 48x-HASWELL-NUMA
  Two sockets E5-2670 v3, 2x Haswell 12 cores/24 threads
  Max EFFiciency, BASE frequency and available turbo levels (MHz):

    EFFIC   1200 |************
    BASE    2300 |***********************
    12C     2600 |**************************
    11C     2600 |**************************
    10C     2600 |**************************
    9C      2600 |**************************
    8C      2600 |**************************
    7C      2600 |**************************
    6C      2600 |**************************
    5C      2700 |***************************
    4C      2800 |****************************
    3C      2900 |*****************************
    2C      3100 |*******************************
    1C      3100 |*******************************

5.2 SETUP
=========

* The baseline is Linux v5.2 with schedutil (non-invariant) and the intel_pstate
  driver in passive mode.
* The rationale for choosing the various freq_max values to test have been to
  try all the 1-2-3-4C turbo levels (note that 1C and 2C turbo are identical
  on all machines), plus one more value closer to base_freq but still in the
  turbo range (8C turbo for both 80x-BROADWELL-NUMA and 48x-HASWELL-NUMA).
* In addition we've run all tests with intel_pstate/powersave for comparison.
* The filesystem is always XFS, the userspace is openSUSE Leap 15.1.
* 8x-SKYLAKE-UMA is capable of HWP (Hardware-Managed P-States), so the runs
  with active intel_pstate on this machine use that.

This gives, in terms of combinations tested on each machine:

* 8x-SKYLAKE-UMA
  * Baseline: Linux v5.2, non-invariant schedutil, intel_pstate passive
  * intel_pstate active + powersave + HWP
  * invariant schedutil, freq_max = 1C turbo
  * invariant schedutil, freq_max = 3C turbo
  * invariant schedutil, freq_max = 4C turbo

* both 80x-BROADWELL-NUMA and 48x-HASWELL-NUMA
  * [same as 8x-SKYLAKE-UMA, but no HWP capable]
  * invariant schedutil, freq_max = 8C turbo
    (which on 48x-HASWELL-NUMA is the same as 12C turbo, or "all cores turbo")

5.3 BENCHMARK RESULTS
=====================

5.3.1 NEUTRAL BENCHMARKS
------------------------

Tests that didn't show any measurable difference in performance on any of the
test machines between non-invariant schedutil and our patch are:

* NAS Parallel Benchmarks (NPB) using either MPI or openMP for IPC, any
  computational kernel
* flexible I/O (FIO)
* hackbench (using threads or processes, and using pipes or sockets)

5.3.2 NON-NEUTRAL BENCHMARKS
----------------------------

What follow are summary tables where each benchmark result is given a score.

* A tilde (~) means a neutral result, i.e. no difference from baseline.
* Scores are computed with the ratio result_new / result_baseline, so a tilde
  means a score of 1.00.
* The results in the score ratio are the geometric means of results running
  the benchmark with different parameters (eg: for kernbench: using 1, 2, 4,
  ... number of processes; for pgbench: varying the number of clients, and so
  on).
* The first three tables show higher-is-better kind of tests (i.e. measured in
  operations/second), the subsequent three show lower-is-better kind of tests
  (i.e. the workload is fixed and we measure elapsed time, think kernbench).
* "gitsource" is a name we made up for the test consisting in running the
  entire unit tests suite of the Git SCM and measuring how long it takes. We
  take it as a typical example of shell-intensive serialized workload.
* In the "I_PSTATE" column we have the results for intel_pstate/powersave. Other
  columns show invariant schedutil for different values of freq_max. 4C turbo
  is circled as it's the value we've chosen for the final implementation.

80x-BROADWELL-NUMA (comparison ratio; higher is better)
                                         +------+
                 I_PSTATE   1C     3C    | 4C   |  8C
pgbench-ro           1.14   ~      ~     | 1.11 |  1.14
pgbench-rw           ~      ~      ~     | ~    |  ~
netperf-udp          1.06   ~      1.06  | 1.05 |  1.07
netperf-tcp          ~      1.03   ~     | 1.01 |  1.02
tbench4              1.57   1.18   1.22  | 1.30 |  1.56
                                         +------+

8x-SKYLAKE-UMA (comparison ratio; higher is better)
                                         +------+
             I_PSTATE/HWP   1C     3C    | 4C   |
pgbench-ro           ~      ~      ~     | ~    |
pgbench-rw           ~      ~      ~     | ~    |
netperf-udp          ~      ~      ~     | ~    |
netperf-tcp          ~      ~      ~     | ~    |
tbench4              1.30   1.14   1.14  | 1.16 |
                                         +------+

48x-HASWELL-NUMA (comparison ratio; higher is better)
                                         +------+
                 I_PSTATE   1C     3C    | 4C   |  12C
pgbench-ro           1.15   ~      ~     | 1.06 |  1.16
pgbench-rw           ~      ~      ~     | ~    |  ~
netperf-udp          1.05   0.97   1.04  | 1.04 |  1.02
netperf-tcp          0.96   1.01   1.01  | 1.01 |  1.01
tbench4              1.50   1.05   1.13  | 1.13 |  1.25
                                         +------+

In the table above we see that active intel_pstate is slightly better than our
4C-turbo patch (both in reference to the baseline non-invariant schedutil) on
read-only pgbench and much better on tbench. Both cases are notable in which
it shows that lowering our freq_max (to 8C-turbo and 12C-turbo on
80x-BROADWELL-NUMA and 48x-HASWELL-NUMA respectively) helps invariant
schedutil to get closer.

If we ignore active intel_pstate and focus on the comparison with baseline
alone, there are several instances of double-digit performance improvement.

80x-BROADWELL-NUMA (comparison ratio; lower is better)
                                         +------+
                 I_PSTATE   1C     3C    | 4C   |  8C
dbench4              1.23   0.95   0.95  | 0.95 |  0.95
kernbench            0.93   0.83   0.83  | 0.83 |  0.82
gitsource            0.98   0.49   0.49  | 0.49 |  0.48
                                         +------+

8x-SKYLAKE-UMA (comparison ratio; lower is better)
                                         +------+
             I_PSTATE/HWP   1C     3C    | 4C   |
dbench4              ~      ~      ~     | ~    |
kernbench            ~      ~      ~     | ~    |
gitsource            0.92   0.55   0.55  | 0.55 |
                                         +------+

48x-HASWELL-NUMA (comparison ratio; lower is better)
                                         +------+
                 I_PSTATE   1C     3C    | 4C   |  8C
dbench4              ~      ~      ~     | ~    |  ~
kernbench            0.94   0.90   0.89  | 0.90 |  0.90
gitsource            0.97   0.69   0.69  | 0.69 |  0.69
                                         +------+

dbench is not very remarkable here, unless we notice how poorly active
intel_pstate is performing on 80x-BROADWELL-NUMA: 23% regression versus
non-invariant schedutil. We repeated that run getting consistent results. Out
of scope for the patch at hand, but deserving future investigation. Other than
that, we previously ran this campaign with Linux v5.0 and saw the patch doing
better on dbench a the time. We haven't checked closely and can only speculate
at this point.

On the NUMA boxes kernbench gets 10-15% improvements on average; we'll see in
the detailed tables that the gains concentrate on low process counts (lightly
loaded machines).

The test we call "gitsource" (running the git unit test suite, a long-running
single-threaded shell script) appears rather spectacular in this table (gains
of 30-50% depending on the machine). It is to be noted, however, that
gitsource has no adjustable parameters (such as the number of jobs in
kernbench, which we average over in order to get a single-number summary
score) and is exactly the kind of low-parallelism workload that benefits the
most from this patch. When looking at the detailed tables of kernbench or
tbench4, at low process or client counts one can see similar numbers.

5.3.3 SELECTION OF DETAILED RESULTS
-----------------------------------

Machine            : 48x-HASWELL-NUMA
Benchmark          : tbench4 (i.e. dbench4 over the network, actually loopback)
Varying parameter  : number of clients
Unit               : MB/sec (higher is better)

                   5.2.0 vanilla (BASELINE)               5.2.0 intel_pstate                   5.2.0 1C-turbo
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean  1        126.73  +- 0.31% (        )      315.91  +- 0.66% ( 149.28%)      125.03  +- 0.76% (  -1.34%)
Hmean  2        258.04  +- 0.62% (        )      614.16  +- 0.51% ( 138.01%)      269.58  +- 1.45% (   4.47%)
Hmean  4        514.30  +- 0.67% (        )     1146.58  +- 0.54% ( 122.94%)      533.84  +- 1.99% (   3.80%)
Hmean  8       1111.38  +- 2.52% (        )     2159.78  +- 0.38% (  94.33%)     1359.92  +- 1.56% (  22.36%)
Hmean  16      2286.47  +- 1.36% (        )     3338.29  +- 0.21% (  46.00%)     2720.20  +- 0.52% (  18.97%)
Hmean  32      4704.84  +- 0.35% (        )     4759.03  +- 0.43% (   1.15%)     4774.48  +- 0.30% (   1.48%)
Hmean  64      7578.04  +- 0.27% (        )     7533.70  +- 0.43% (  -0.59%)     7462.17  +- 0.65% (  -1.53%)
Hmean  128     6998.52  +- 0.16% (        )     6987.59  +- 0.12% (  -0.16%)     6909.17  +- 0.14% (  -1.28%)
Hmean  192     6901.35  +- 0.25% (        )     6913.16  +- 0.10% (   0.17%)     6855.47  +- 0.21% (  -0.66%)

                             5.2.0 3C-turbo                   5.2.0 4C-turbo                  5.2.0 12C-turbo
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean  1        128.43  +- 0.28% (   1.34%)      130.64  +- 3.81% (   3.09%)      153.71  +- 5.89% (  21.30%)
Hmean  2        311.70  +- 6.15% (  20.79%)      281.66  +- 3.40% (   9.15%)      305.08  +- 5.70% (  18.23%)
Hmean  4        641.98  +- 2.32% (  24.83%)      623.88  +- 5.28% (  21.31%)      906.84  +- 4.65% (  76.32%)
Hmean  8       1633.31  +- 1.56% (  46.96%)     1714.16  +- 0.93% (  54.24%)     2095.74  +- 0.47% (  88.57%)
Hmean  16      3047.24  +- 0.42% (  33.27%)     3155.02  +- 0.30% (  37.99%)     3634.58  +- 0.15% (  58.96%)
Hmean  32      4734.31  +- 0.60% (   0.63%)     4804.38  +- 0.23% (   2.12%)     4674.62  +- 0.27% (  -0.64%)
Hmean  64      7699.74  +- 0.35% (   1.61%)     7499.72  +- 0.34% (  -1.03%)     7659.03  +- 0.25% (   1.07%)
Hmean  128     6935.18  +- 0.15% (  -0.91%)     6942.54  +- 0.10% (  -0.80%)     7004.85  +- 0.12% (   0.09%)
Hmean  192     6901.62  +- 0.12% (   0.00%)     6856.93  +- 0.10% (  -0.64%)     6978.74  +- 0.10% (   1.12%)

This is one of the cases where the patch still can't surpass active
intel_pstate, not even when freq_max is as low as 12C-turbo. Otherwise, gains are
visible up to 16 clients and the saturated scenario is the same as baseline.

The scores in the summary table from the previous sections are ratios of
geometric means of the results over different clients, as seen in this table.

Machine            : 80x-BROADWELL-NUMA
Benchmark          : kernbench (kernel compilation)
Varying parameter  : number of jobs
Unit               : seconds (lower is better)

                   5.2.0 vanilla (BASELINE)               5.2.0 intel_pstate                   5.2.0 1C-turbo
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean  2        379.68  +- 0.06% (        )      330.20  +- 0.43% (  13.03%)      285.93  +- 0.07% (  24.69%)
Amean  4        200.15  +- 0.24% (        )      175.89  +- 0.22% (  12.12%)      153.78  +- 0.25% (  23.17%)
Amean  8        106.20  +- 0.31% (        )       95.54  +- 0.23% (  10.03%)       86.74  +- 0.10% (  18.32%)
Amean  16        56.96  +- 1.31% (        )       53.25  +- 1.22% (   6.50%)       48.34  +- 1.73% (  15.13%)
Amean  32        34.80  +- 2.46% (        )       33.81  +- 0.77% (   2.83%)       30.28  +- 1.59% (  12.99%)
Amean  64        26.11  +- 1.63% (        )       25.04  +- 1.07% (   4.10%)       22.41  +- 2.37% (  14.16%)
Amean  128       24.80  +- 1.36% (        )       23.57  +- 1.23% (   4.93%)       21.44  +- 1.37% (  13.55%)
Amean  160       24.85  +- 0.56% (        )       23.85  +- 1.17% (   4.06%)       21.25  +- 1.12% (  14.49%)

                             5.2.0 3C-turbo                   5.2.0 4C-turbo                   5.2.0 8C-turbo
- - - - - - - -  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean  2        284.08  +- 0.13% (  25.18%)      283.96  +- 0.51% (  25.21%)      285.05  +- 0.21% (  24.92%)
Amean  4        153.18  +- 0.22% (  23.47%)      154.70  +- 1.64% (  22.71%)      153.64  +- 0.30% (  23.24%)
Amean  8         87.06  +- 0.28% (  18.02%)       86.77  +- 0.46% (  18.29%)       86.78  +- 0.22% (  18.28%)
Amean  16        48.03  +- 0.93% (  15.68%)       47.75  +- 1.99% (  16.17%)       47.52  +- 1.61% (  16.57%)
Amean  32        30.23  +- 1.20% (  13.14%)       30.08  +- 1.67% (  13.57%)       30.07  +- 1.67% (  13.60%)
Amean  64        22.59  +- 2.02% (  13.50%)       22.63  +- 0.81% (  13.32%)       22.42  +- 0.76% (  14.12%)
Amean  128       21.37  +- 0.67% (  13.82%)       21.31  +- 1.15% (  14.07%)       21.17  +- 1.93% (  14.63%)
Amean  160       21.68  +- 0.57% (  12.76%)       21.18  +- 1.74% (  14.77%)       21.22  +- 1.00% (  14.61%)

The patch outperform active intel_pstate (and baseline) by a considerable
margin; the summary table from the previous section says 4C turbo and active
intel_pstate are 0.83 and 0.93 against baseline respectively, so 4C turbo is
0.83/0.93=0.89 against intel_pstate (~10% better on average). There is no
noticeable difference with regard to the value of freq_max.

Machine            : 8x-SKYLAKE-UMA
Benchmark          : gitsource (time to run the git unit test suite)
Varying parameter  : none
Unit               : seconds (lower is better)

                            5.2.0 vanilla           5.2.0 intel_pstate/hwp         5.2.0 1C-turbo
- - - - - - - -  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean         858.85  +- 1.16% (        )      791.94  +- 0.21% (   7.79%)      474.95 (  44.70%)

                           5.2.0 3C-turbo                   5.2.0 4C-turbo
- - - - - - - -  - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean         475.26  +- 0.20% (  44.66%)      474.34  +- 0.13% (  44.77%)

In this test, which is of interest as representing shell-intensive
(i.e. fork-intensive) serialized workloads, invariant schedutil outperforms
intel_pstate/powersave by a whopping 40% margin.

5.3.4 POWER CONSUMPTION, PERFORMANCE-PER-WATT
---------------------------------------------

The following table shows average power consumption in watt for each
benchmark. Data comes from turbostat (package average), which in turn is read
from the RAPL interface on CPUs. We know the patch affects CPU frequencies so
it's reasonable to ignore other power consumers (such as memory or I/O). Also,
we don't have a power meter available in the lab so RAPL is the best we have.

turbostat sampled average power every 10 seconds for the entire duration of
each benchmark. We took all those values and averaged them (i.e. with don't
have detail on a per-parameter granularity, only on whole benchmarks).

80x-BROADWELL-NUMA (power consumption, watts)
                                                    +--------+
               BASELINE I_PSTATE       1C       3C  |     4C |      8C
pgbench-ro       130.01   142.77   131.11   132.45  | 134.65 |  136.84
pgbench-rw        68.30    60.83    71.45    71.70  |  71.65 |   72.54
dbench4           90.25    59.06   101.43    99.89  | 101.10 |  102.94
netperf-udp       65.70    69.81    66.02    68.03  |  68.27 |   68.95
netperf-tcp       88.08    87.96    88.97    88.89  |  88.85 |   88.20
tbench4          142.32   176.73   153.02   163.91  | 165.58 |  176.07
kernbench         92.94   101.95   114.91   115.47  | 115.52 |  115.10
gitsource         40.92    41.87    75.14    75.20  |  75.40 |   75.70
                                                    +--------+
8x-SKYLAKE-UMA (power consumption, watts)
                                                    +--------+
              BASELINE I_PSTATE/HWP    1C       3C  |     4C |
pgbench-ro        46.49    46.68    46.56    46.59  |  46.52 |
pgbench-rw        29.34    31.38    30.98    31.00  |  31.00 |
dbench4           27.28    27.37    27.49    27.41  |  27.38 |
netperf-udp       22.33    22.41    22.36    22.35  |  22.36 |
netperf-tcp       27.29    27.29    27.30    27.31  |  27.33 |
tbench4           41.13    45.61    43.10    43.33  |  43.56 |
kernbench         42.56    42.63    43.01    43.01  |  43.01 |
gitsource         13.32    13.69    17.33    17.30  |  17.35 |
                                                    +--------+
48x-HASWELL-NUMA (power consumption, watts)
                                                    +--------+
               BASELINE I_PSTATE       1C       3C  |     4C |     12C
pgbench-ro       128.84   136.04   129.87   132.43  | 132.30 |  134.86
pgbench-rw        37.68    37.92    37.17    37.74  |  37.73 |   37.31
dbench4           28.56    28.73    28.60    28.73  |  28.70 |   28.79
netperf-udp       56.70    60.44    56.79    57.42  |  57.54 |   57.52
netperf-tcp       75.49    75.27    75.87    76.02  |  76.01 |   75.95
tbench4          115.44   139.51   119.53   123.07  | 123.97 |  130.22
kernbench         83.23    91.55    95.58    95.69  |  95.72 |   96.04
gitsource         36.79    36.99    39.99    40.34  |  40.35 |   40.23
                                                    +--------+

A lower power consumption isn't necessarily better, it depends on what is done
with that energy. Here are tables with the ratio of performance-per-watt on
each machine and benchmark. Higher is always better; a tilde (~) means a
neutral ratio (i.e. 1.00).

80x-BROADWELL-NUMA (performance-per-watt ratios; higher is better)
                                     +------+
             I_PSTATE     1C     3C  |   4C |    8C
pgbench-ro       1.04   1.06   0.94  | 1.07 |  1.08
pgbench-rw       1.10   0.97   0.96  | 0.96 |  0.97
dbench4          1.24   0.94   0.95  | 0.94 |  0.92
netperf-udp      ~      1.02   1.02  | ~    |  1.02
netperf-tcp      ~      1.02   ~     | ~    |  1.02
tbench4          1.26   1.10   1.06  | 1.12 |  1.26
kernbench        0.98   0.97   0.97  | 0.97 |  0.98
gitsource        ~      1.11   1.11  | 1.11 |  1.13
                                     +------+

8x-SKYLAKE-UMA (performance-per-watt ratios; higher is better)
                                     +------+
         I_PSTATE/HWP     1C     3C  |   4C |
pgbench-ro       ~      ~      ~     | ~    |
pgbench-rw       0.95   0.97   0.96  | 0.96 |
dbench4          ~      ~      ~     | ~    |
netperf-udp      ~      ~      ~     | ~    |
netperf-tcp      ~      ~      ~     | ~    |
tbench4          1.17   1.09   1.08  | 1.10 |
kernbench        ~      ~      ~     | ~    |
gitsource        1.06   1.40   1.40  | 1.40 |
                                     +------+

48x-HASWELL-NUMA  (performance-per-watt ratios; higher is better)
                                     +------+
             I_PSTATE     1C     3C  |   4C |   12C
pgbench-ro       1.09   ~      1.09  | 1.03 |  1.11
pgbench-rw       ~      0.86   ~     | ~    |  0.86
dbench4          ~      1.02   1.02  | 1.02 |  ~
netperf-udp      ~      0.97   1.03  | 1.02 |  ~
netperf-tcp      0.96   ~      ~     | ~    |  ~
tbench4          1.24   ~      1.06  | 1.05 |  1.11
kernbench        0.97   0.97   0.98  | 0.97 |  0.96
gitsource        1.03   1.33   1.32  | 1.32 |  1.33
                                     +------+

These results are overall pleasing: in plenty of cases we observe
performance-per-watt improvements. The few regressions (read/write pgbench and
dbench on the Broadwell machine) are of small magnitude. kernbench loses a few
percentage points (it has a 10-15% performance improvement, but apparently the
increase in power consumption is larger than that). tbench4 and gitsource, which
benefit the most from the patch, keep a positive score in this table which is
a welcome surprise; that suggests that in those particular workloads the
non-invariant schedutil (and active intel_pstate, too) makes some rather
suboptimal frequency selections.

+-------------------------------------------------------------------------+
| 6. MICROARCH'ES ADDRESSED HERE
+-------------------------------------------------------------------------+

The patch addresses Xeon Core processors that use MSR_PLATFORM_INFO and
MSR_TURBO_RATIO_LIMIT to advertise their base frequency and turbo frequencies
respectively. This excludes the recent Xeon Scalable Performance processors
line (Xeon Gold, Platinum etc) whose MSRs have to be parsed differently.

Subsequent patches will address:

* Xeon Scalable Performance processors and Atom Goldmont/Goldmont Plus
* Xeon Phi (Knights Landing, Knights Mill)
* Atom Silvermont

+-------------------------------------------------------------------------+
| 7. REFERENCES
+-------------------------------------------------------------------------+

Tests have been run with the help of the MMTests performance testing
framework, see github.com/gormanm/mmtests. The configuration file names for
the benchmark used are:

    db-pgbench-timed-ro-small-xfs
    db-pgbench-timed-rw-small-xfs
    io-dbench4-async-xfs
    network-netperf-unbound
    network-tbench
    scheduler-unbound
    workload-kerndevel-xfs
    workload-shellscripts-xfs
    hpc-nas-c-class-mpi-full-xfs
    hpc-nas-c-class-omp-full

All those benchmarks are generally available on the web:

pgbench: https://www.postgresql.org/docs/10/pgbench.html
netperf: https://hewlettpackard.github.io/netperf/
dbench/tbench: https://dbench.samba.org/
gitsource: git unit test suite, github.com/git/git
NAS Parallel Benchmarks: https://www.nas.nasa.gov/publications/npb.html
hackbench: https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200122151617.531-2-ggherdovich@suse.cz
2020-01-28 21:36:59 +01:00
Linus Torvalds
f6170f0afb Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Ingo Molnar:
 "Misc changes:

   - Enhance #GP fault printouts by distinguishing between canonical and
     non-canonical address faults, and also add KASAN fault decoding.

   - Fix/enhance the x86 NMI handler by putting the duration check into
     a direct function call instead of an irq_work which we know to be
     broken in some cases.

   - Clean up do_general_protection() a bit"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Remove irq_work from the long duration NMI handler
  x86/traps: Cleanup do_general_protection()
  x86/kasan: Print original address on #GP
  x86/dumpstack: Introduce die_addr() for die() with #GP fault address
  x86/traps: Print address on #GP
  x86/insn-eval: Add support for 64-bit kernel mode
2020-01-28 12:28:06 -08:00
Linus Torvalds
6da49d1abd Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups all around the map"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Remove amd_get_topology_early()
  x86/tsc: Remove redundant assignment
  x86/crash: Use resource_size()
  x86/cpu: Add a missing prototype for arch_smt_update()
  x86/nospec: Remove unused RSB_FILL_LOOPS
  x86/vdso: Provide missing include file
  x86/Kconfig: Correct spelling and punctuation
  Documentation/x86/boot: Fix typo
  x86/boot: Fix a comment's incorrect file reference
  x86/process: Remove set but not used variables prev and next
  x86/Kconfig: Fix Kconfig indentation
2020-01-28 12:11:23 -08:00
Linus Torvalds
bcc8aff6af Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Misc updates:

   - Remove last remaining calls to exception_enter/exception_exit() and
     simplify the entry code some more.

   - Remove force_iret()

   - Add support for "Fast Short Rep Mov", which is available starting
     with Ice Lake Intel CPUs - and make the x86 assembly version of
     memmove() use REP MOV for all sizes when FSRM is available.

   - Micro-optimize/simplify the 32-bit boot code a bit.

   - Use a more future-proof SYSRET instruction mnemonic"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Simplify calculation of output address
  x86/entry/64: Add instruction suffix to SYSRET
  x86: Remove force_iret()
  x86/cpufeatures: Add support for fast short REP; MOVSB
  x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault
  x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
2020-01-28 11:08:13 -08:00
Ingo Molnar
6bd3357b61 Merge branches 'x86/hyperv', 'x86/kdump' and 'x86/misc' into x86/urgent, to pick up single-commit branches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-28 19:08:52 +01:00
Linus Torvalds
c0e809e244 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Ftrace is one of the last W^X violators (after this only KLP is
     left). These patches move it over to the generic text_poke()
     interface and thereby get rid of this oddity. This requires a
     surprising amount of surgery, by Peter Zijlstra.

   - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to
     count certain types of events that have a special, quirky hw ABI
     (by Kim Phillips)

   - kprobes fixes by Masami Hiramatsu

  Lots of tooling updates as well, the following subcommands were
  updated: annotate/report/top, c2c, clang, record, report/top TUI,
  sched timehist, tests; plus updates were done to the gtk ui, libperf,
  headers and the parser"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  perf/x86/amd: Add support for Large Increment per Cycle Events
  perf/x86/amd: Constrain Large Increment per Cycle events
  perf/x86/intel/rapl: Add Comet Lake support
  tracing: Initialize ret in syscall_enter_define_fields()
  perf header: Use last modification time for timestamp
  perf c2c: Fix return type for histogram sorting comparision functions
  perf beauty sockaddr: Fix augmented syscall format warning
  perf/ui/gtk: Fix gtk2 build
  perf ui gtk: Add missing zalloc object
  perf tools: Use %define api.pure full instead of %pure-parser
  libperf: Setup initial evlist::all_cpus value
  perf report: Fix no libunwind compiled warning break s390 issue
  perf tools: Support --prefix/--prefix-strip
  perf report: Clarify in help that --children is default
  tools build: Fix test-clang.cpp with Clang 8+
  perf clang: Fix build with Clang 9
  kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
  tools lib: Fix builds when glibc contains strlcpy()
  perf report/top: Make 'e' visible in the help and make it toggle showing callchains
  perf report/top: Do not offer annotation for symbols without samples
  ...
2020-01-28 09:44:15 -08:00
Linus Torvalds
634cd4b6af Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cleanup of the GOP [graphics output] handling code in the EFI stub

   - Complete refactoring of the mixed mode handling in the x86 EFI stub

   - Overhaul of the x86 EFI boot/runtime code

   - Increase robustness for mixed mode code

   - Add the ability to disable DMA at the root port level in the EFI
     stub

   - Get rid of RWX mappings in the EFI memory map and page tables,
     where possible

   - Move the support code for the old EFI memory mapping style into its
     only user, the SGI UV1+ support code.

   - plus misc fixes, updates, smaller cleanups.

  ... and due to interactions with the RWX changes, another round of PAT
  cleanups make a guest appearance via the EFI tree - with no side
  effects intended"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  efi/x86: Disable instrumentation in the EFI runtime handling code
  efi/libstub/x86: Fix EFI server boot failure
  efi/x86: Disallow efi=old_map in mixed mode
  x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
  efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping
  efi: Fix handling of multiple efi_fake_mem= entries
  efi: Fix efi_memmap_alloc() leaks
  efi: Add tracking for dynamically allocated memmaps
  efi: Add a flags parameter to efi_memory_map
  efi: Fix comment for efi_mem_type() wrt absent physical addresses
  efi/arm: Defer probe of PCIe backed efifb on DT systems
  efi/x86: Limit EFI old memory map to SGI UV machines
  efi/x86: Avoid RWX mappings for all of DRAM
  efi/x86: Don't map the entire kernel text RW for mixed mode
  x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
  efi/libstub/x86: Fix unused-variable warning
  efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
  efi/libstub/x86: Use const attribute for efi_is_64bit()
  efi: Allow disabling PCI busmastering on bridges during boot
  efi/x86: Allow translating 64-bit arguments for mixed mode calls
  ...
2020-01-28 09:03:40 -08:00
Linus Torvalds
9f2a43019e Merge branch 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull header cleanup from Ingo Molnar:
 "This is a treewide cleanup, mostly (but not exclusively) with x86
  impact, which breaks implicit dependencies on the asm/realtime.h
  header and finally removes it from asm/acpi.h"

* 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ACPI/sleep: Move acpi_get_wakeup_address() into sleep.c, remove <asm/realmode.h> from <asm/acpi.h>
  ACPI/sleep: Convert acpi_wakeup_address into a function
  x86/ACPI/sleep: Remove an unnecessary include of asm/realmode.h
  ASoC: Intel: Skylake: Explicitly include linux/io.h for virt_to_phys()
  vmw_balloon: Explicitly include linux/io.h for virt_to_phys()
  virt: vbox: Explicitly include linux/io.h to pick up various defs
  efi/capsule-loader: Explicitly include linux/io.h for page_to_phys()
  perf/x86/intel: Explicitly include asm/io.h to use virt_to_phys()
  x86/kprobes: Explicitly include vmalloc.h for set_vm_flush_reset_perms()
  x86/ftrace: Explicitly include vmalloc.h for set_vm_flush_reset_perms()
  x86/boot: Explicitly include realmode.h to handle RM reservations
  x86/efi: Explicitly include realmode.h to handle RM trampoline quirk
  x86/platform/intel/quark: Explicitly include linux/io.h for virt_to_phys()
  x86/setup: Enhance the comments
  x86/setup: Clean up the header portion of setup.c
2020-01-28 08:20:54 -08:00
Linus Torvalds
e279160f49 The timekeeping and timers departement provides:
- Time namespace support:
 
     If a container migrates from one host to another then it expects that
     clocks based on MONOTONIC and BOOTTIME are not subject to
     disruption. Due to different boot time and non-suspended runtime these
     clocks can differ significantly on two hosts, in the worst case time
     goes backwards which is a violation of the POSIX requirements.
 
     The time namespace addresses this problem. It allows to set offsets for
     clock MONOTONIC and BOOTTIME once after creation and before tasks are
     associated with the namespace. These offsets are taken into account by
     timers and timekeeping including the VDSO.
 
     Offsets for wall clock based clocks (REALTIME/TAI) are not provided by
     this mechanism. While in theory possible, the overhead and code
     complexity would be immense and not justified by the esoteric potential
     use cases which were discussed at Plumbers '18.
 
     The overhead for tasks in the root namespace (host time offsets = 0) is
     in the noise and great effort was made to ensure that especially in the
     VDSO. If time namespace is disabled in the kernel configuration the
     code is compiled out.
 
     Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature
     and kept on for more than a year addressing review comments, finding
     better solutions. A pleasant experience.
 
   - Overhaul of the alarmtimer device dependency handling to ensure that
     the init/suspend/resume ordering is correct.
 
   - A new clocksource/event driver for Microchip PIT64
 
   - Suspend/resume support for the Hyper-V clocksource
 
   - The usual pile of fixes, updates and improvements mostly in the
     driver code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vbTcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXT2D/96iJ3G9Snn2khEQP3XS2rYmtDGw7NO
 m1n96falwWeGe6zreU80R2Jge5nLxQtNhRoMPLLee1GpHwRC6lvqEqgdZ4LMBrD2
 JqV7Gzg8Urmdh+hpDsyTCpeEWEzoMKxiFOX8PxwctqUhM4szEe5iQg2YQsg85Jw2
 vG6M93N2xwDILh4rhEMbKjo+5ZmYn7c1RQvpGOSmpKOj940W/N7H2HBsFhdaJ1Kw
 FW5pFv1211PaU5RV2YNb2dMeeMTT1N3e2VN4Dkadoxp47pb+725gNHEBEjmV9poG
 Lp4IhzGAPnj8zVD88icQZSTaK3gUHMClxprJ0Pf84WEtiH7SeGu8BPYyu77+oNDe
 yzcctDJNyCWXkzmaP/fe/HLc0TStbvNAJ5Tagp4BC75gzebeb4/n8RtRT0fKeDYL
 pxpDPKDAPU7p1JSjxiWAtshqjBycWNY3Z49bA7/VhKBhnv8BDyBPGlYd7/4xrbGr
 RK7DQNXJwaJaiNJ7p5PiaFxGzNyB0B9sThD/slSlEInIKb4h9YzWr0TV+NB62VnB
 sDcN+tpLbRPz5/5cHGGfxR0+zKWpfyai8pzbmmaXEaKssjRYwyvcac5EZdgbWpbK
 k7CqAjoWLA2P+tGeePNJOf5JYK6Vmdyh4clmuwM0zOiRJ9NlWUyMf3z7QYILs4RO
 UAI+6opYlZEPAw==
 =x3qT
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "The timekeeping and timers departement provides:

   - Time namespace support:

     If a container migrates from one host to another then it expects
     that clocks based on MONOTONIC and BOOTTIME are not subject to
     disruption. Due to different boot time and non-suspended runtime
     these clocks can differ significantly on two hosts, in the worst
     case time goes backwards which is a violation of the POSIX
     requirements.

     The time namespace addresses this problem. It allows to set offsets
     for clock MONOTONIC and BOOTTIME once after creation and before
     tasks are associated with the namespace. These offsets are taken
     into account by timers and timekeeping including the VDSO.

     Offsets for wall clock based clocks (REALTIME/TAI) are not provided
     by this mechanism. While in theory possible, the overhead and code
     complexity would be immense and not justified by the esoteric
     potential use cases which were discussed at Plumbers '18.

     The overhead for tasks in the root namespace (ie where host time
     offsets = 0) is in the noise and great effort was made to ensure
     that especially in the VDSO. If time namespace is disabled in the
     kernel configuration the code is compiled out.

     Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
     feature and kept on for more than a year addressing review
     comments, finding better solutions. A pleasant experience.

   - Overhaul of the alarmtimer device dependency handling to ensure
     that the init/suspend/resume ordering is correct.

   - A new clocksource/event driver for Microchip PIT64

   - Suspend/resume support for the Hyper-V clocksource

   - The usual pile of fixes, updates and improvements mostly in the
     driver code"

* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
  alarmtimer: Use wakeup source from alarmtimer platform device
  alarmtimer: Make alarmtimer platform device child of RTC device
  alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
  hrtimer: Add missing sparse annotation for __run_timer()
  lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
  MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
  clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
  clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
  clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
  clocksource/drivers/exynos_mct: Rename Exynos to lowercase
  clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
  clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
  clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
  clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
  clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
  clocksource/drivers/bcm2835_timer: Fix memory leak of timer
  clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
  clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
  clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
  ...
2020-01-27 16:47:05 -08:00
Linus Torvalds
6d277aca48 Power management updates for 5.6-rc1
- Update the ACPI processor driver in order to export
    acpi_processor_evaluate_cst() to the code outside of it, add
    ACPI support to the intel_idle driver based on that and clean
    up that driver somewhat (Rafael Wysocki).
 
  - Add an admin guide document for the intel_idle driver (Rafael
    Wysocki).
 
  - Clean up cpuidle core and drivers, enable compilation testing
    for some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
    Wysocki, Yangtao Li).
 
  - Fix reference counting of OPP (operating performance points) table
    structures (Viresh Kumar).
 
  - Add support for CPR (Core Power Reduction) to the AVS (Adaptive
    Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
    YueHaibing).
 
  - Add support for TigerLake Mobile and JasperLake to the Intel RAPL
    power capping driver (Zhang Rui).
 
  - Update cpufreq drivers:
 
    * Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
 
    * Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
 
    * Fix cpufreq policy reference counting issues in s3c and
      brcmstb-avs (chenqiwu).
 
    * Fix ACPI table reference counting issue and HiSilicon quirk
      handling in the CPPC driver (Hanjun Guo).
 
    * Clean up spelling mistake in intel_pstate (Harry Pan).
 
    * Convert the kirkwood and tegra186 drivers to using
      devm_platform_ioremap_resource() (Yangtao Li).
 
  - Update devfreq core:
 
    * Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
 
    * Clean up the handing of transition statistics and allow them
      to be reset by writing 0 to the 'trans_stat' devfreq device
      attribute in sysfs (Kamil Konieczny).
 
    * Add 'devfreq_summary' to debugfs (Chanwoo Choi).
 
    * Clean up kerneldoc comments and Kconfig indentation (Krzysztof
      Kozlowski, Randy Dunlap).
 
  - Update devfreq drivers:
 
    * Add dynamic scaling for the imx8m DDR controller and clean up
      imx8m-ddrc (Leonard Crestez, YueHaibing).
 
    * Fix DT node reference counting and nitialization error code path
      in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
      for it (Chanwoo Choi, Yangtao Li).
 
    * Fix DT node reference counting in rockchip-dfi and make it use
      devm_platform_ioremap_resource() (Yangtao Li).
 
    * Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
 
    * Fix initialization error code paths in exynos-bus (Yangtao Li).
 
    * Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
      Kozlowski).
 
  - Add tracepoints for tracking usage_count updates unrelated to
    status changes in PM-runtime (Michał Mirosław).
 
  - Add sysfs attribute to control the "sync on suspend" behavior
    during system-wide suspend (Jonas Meurer).
 
  - Switch system-wide suspend tests over to 64-bit time (Alexandre
    Belloni).
 
  - Make wakeup sources statistics in debugfs cover deleted ones which
    used to be the case some time ago (zhuguangqing).
 
  - Clean up computations carried out during hibernation, update
    messages related to hibernation and fix a spelling mistake in one
    of them (Wen Yang, Luigi Semenzato, Colin Ian King).
 
  - Add mailmap entry for maintainer e-mail address that has not been
    functional for several years (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl4u2fESHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxvlkP/j5vDzyNUNJjnD6+897c8W+z5dwdiQfU
 QNtoopFXgw/fpOhGXRdj2mA4e6RtpU9aCCiHR6/qdh3/1qSnR5Y9R/51/gmdkwhY
 YakSxmgpgGrOJru94ApI1o/35eWwN/GxjajbfNY5ScrPQl/L0DF3iJWRsAOR5534
 p9e2gQqKecoE+MEn5JcGAXApA5xBLXuUmtWPUn5UGyhaz+jdmsf1zkDEOEvxREay
 hLGH1y6BY8HS/jytyNzISs9iDeBvg2fHmG8SskDiXVMke5sHBTU9MilgpnCFfQ0l
 OF/eNnTXTU7mAJhlnjBUt2rIe5peGSuhgg+Ur7s86xYqbj2SfsVM4UHjU0A6t9Jm
 sauWQh/Nbzw6XaCNzYKxP+dREAg0g/aq7xFqQi3bWx7YvzLk/hvNWi2+bv3adzx7
 Z3fvOki4xMXzLLrh0f1ipC8BKTsdioDZPAy06B80a0luv6ROdr6bPL7did14mWt2
 eCuPuZyXKhdV+PkjZHF+c4XT7N9NfGtE0WUQf54Q4VT00hDagGDliwXpm4ht1pjJ
 iO7uUJevXKSxMaV2xPZ+nWZaOeCVrMMTA1Ec1ELgC1n8WROZJ+SfhehgMQGp7BHS
 Hz4QO1HjTsCDnT+OU7JFeCRrkyXIlh75MOndWOOH6eTEXCAI9PihstB+UGXeNsK0
 BesNQz1sYY1O
 =g48u
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add ACPI support to the intel_idle driver along with an admin
  guide document for it, add support for CPR (Core Power Reduction) to
  the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support
  in a few places, add some new sysfs attributes, debugfs files and
  tracepoints, fix bugs and clean up a bunch of things all over.

  Specifics:

   - Update the ACPI processor driver in order to export
     acpi_processor_evaluate_cst() to the code outside of it, add ACPI
     support to the intel_idle driver based on that and clean up that
     driver somewhat (Rafael Wysocki).

   - Add an admin guide document for the intel_idle driver (Rafael
     Wysocki).

   - Clean up cpuidle core and drivers, enable compilation testing for
     some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
     Wysocki, Yangtao Li).

   - Fix reference counting of OPP (operating performance points) table
     structures (Viresh Kumar).

   - Add support for CPR (Core Power Reduction) to the AVS (Adaptive
     Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
     YueHaibing).

   - Add support for TigerLake Mobile and JasperLake to the Intel RAPL
     power capping driver (Zhang Rui).

   - Update cpufreq drivers:
      - Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
      - Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
      - Fix cpufreq policy reference counting issues in s3c and
        brcmstb-avs (chenqiwu).
      - Fix ACPI table reference counting issue and HiSilicon quirk
        handling in the CPPC driver (Hanjun Guo).
      - Clean up spelling mistake in intel_pstate (Harry Pan).
      - Convert the kirkwood and tegra186 drivers to using
        devm_platform_ioremap_resource() (Yangtao Li).

   - Update devfreq core:
      - Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
      - Clean up the handing of transition statistics and allow them to
        be reset by writing 0 to the 'trans_stat' devfreq device
        attribute in sysfs (Kamil Konieczny).
      - Add 'devfreq_summary' to debugfs (Chanwoo Choi).
      - Clean up kerneldoc comments and Kconfig indentation (Krzysztof
        Kozlowski, Randy Dunlap).

   - Update devfreq drivers:
      - Add dynamic scaling for the imx8m DDR controller and clean up
        imx8m-ddrc (Leonard Crestez, YueHaibing).
      - Fix DT node reference counting and nitialization error code path
        in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
        for it (Chanwoo Choi, Yangtao Li).
      - Fix DT node reference counting in rockchip-dfi and make it use
        devm_platform_ioremap_resource() (Yangtao Li).
      - Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
      - Fix initialization error code paths in exynos-bus (Yangtao Li).
      - Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
        Kozlowski).

   - Add tracepoints for tracking usage_count updates unrelated to
     status changes in PM-runtime (Michał Mirosław).

   - Add sysfs attribute to control the "sync on suspend" behavior
     during system-wide suspend (Jonas Meurer).

   - Switch system-wide suspend tests over to 64-bit time (Alexandre
     Belloni).

   - Make wakeup sources statistics in debugfs cover deleted ones which
     used to be the case some time ago (zhuguangqing).

   - Clean up computations carried out during hibernation, update
     messages related to hibernation and fix a spelling mistake in one
     of them (Wen Yang, Luigi Semenzato, Colin Ian King).

   - Add mailmap entry for maintainer e-mail address that has not been
     functional for several years (Rafael Wysocki)"

* tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
  cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
  intel_idle: Clean up irtl_2_usec()
  intel_idle: Move 3 functions closer to their callers
  intel_idle: Annotate initialization code and data structures
  intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
  intel_idle: Rearrange intel_idle_cpuidle_driver_init()
  intel_idle: Clean up NULL pointer check in intel_idle_init()
  intel_idle: Fold intel_idle_probe() into intel_idle_init()
  intel_idle: Eliminate __setup_broadcast_timer()
  cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
  cpuidle: sysfs: fix warnings when compiling with W=1
  cpuidle: coupled: fix warnings when compiling with W=1
  cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
  PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
  PM / devfreq: Add debugfs support with devfreq_summary file
  Documentation: admin-guide: PM: Add intel_idle document
  cpuidle: arm: Enable compile testing for some of drivers
  PM-runtime: add tracepoints for usage_count changes
  cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
  PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
  ...
2020-01-27 11:23:54 -08:00
Sean Christopherson
13c72c060f x86/mm: Introduce lookup_address_in_mm()
Add a helper, lookup_address_in_mm(), to traverse the page tables of a
given mm struct.  KVM will use the helper to retrieve the host mapping
level, e.g. 4k vs. 2mb vs. 1gb, of a compound (or DAX-backed) page
without having to resort to implementation specific metadata.  E.g. KVM
currently uses different logic for HugeTLB vs. THP, and would add a
third variant for DAX-backed files.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 20:00:03 +01:00
Peter Xu
6a3c623ba8 KVM: X86: Drop x86_set_memory_region()
The helper x86_set_memory_region() is only used in vmx_set_tss_addr()
and kvm_arch_destroy_vm().  Push the lock upper in both cases.  With
that, drop x86_set_memory_region().

This prepares to allow __x86_set_memory_region() to return a HVA
mapped, because the HVA will need to be protected by the lock too even
after __x86_set_memory_region() returns.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:53 +01:00
John Allen
a47970ed74 kvm/svm: PKU not currently supported
Current SVM implementation does not have support for handling PKU. Guests
running on a host with future AMD cpus that support the feature will read
garbage from the PKRU register and will hit segmentation faults on boot as
memory is getting marked as protected that should not be. Ensure that cpuid
from SVM does not advertise the feature.

Signed-off-by: John Allen <john.allen@amd.com>
Cc: stable@vger.kernel.org
Fixes: 0556cbdc2f ("x86/pkeys: Don't check if PKRU is zero before writing it")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:35 +01:00
Linus Torvalds
a5b871c91d dmaengine updates for v5.6-rc1
- Core:
    - Support for dynamic channels
    - Removal of various slave wrappers
    - Make few slave request APIs as private to dmaengine
    - Symlinks between channels and slaves
    - Support for hotplug of controllers
    - Support for metadata_ops for dma_async_tx_descriptor
    - Reporting DMA cached data amount
    - Virtual dma channel locking updates
 
  - New drivers/device/feature support support:
    - Driver for Intel data accelerators
    - Driver for TI K3 UDMA
    - Driver for PLX DMA engine
    - Driver for hisilicon Kunpeng DMA engine
    - Support for eDMA support for QorIQ LS1028A in fsl edma driver
    - Support for cyclic dma in sun4i driver
    - Support for X1830 in JZ4780 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl4u+QkACgkQfBQHDyUj
 g0cCcg//awBruofTHIrBOwHmCX1a09mw5WmkFG48N7tYp4fvaI1aOgs3hH9PZiBG
 fFZUktodwYpEKg6JJOfm1RnLBuKm0+3zmaKGPdK1RcbaDURh8G9qhW65f4mfImvB
 GXlgw59WKtgPAM9zWW9UxjugAk4DBte5xVKYJUsI0t4P7k9TM4i0Fv0VmMUhhDuo
 buPD1cM/GWFHbE7OYJ51aGRtrOHV1nPgQaHBkWaT7EotzGsZ3gtWYzteI3BRXRV/
 IkSgxOefMkIgu1j3KIxFZ1CJDHCZSnx2B+AEMCcp63osyeHBOYoL7KQxo6tBjaRV
 fbCasbkTkvvJUjyZdtOdU2wqf7ZqoDkD+n5nkpENf4G1M8J5RiHmrFq96m3HRonE
 V1bmMslXhsJlvtoT6ec2iJFchiq0nx1XHyST6faUOK+0cd1lzbogWwztydQH4fwd
 TxfEd+eYlFFu3lGDfRp14Tz7fAcFNPZ2bJQhZkF6RpwUW3y3L0cJc3Y0AcWmNkvJ
 oStvTlbbUvgRgO7rvEyAmdPb31lE6PLaA0WCahcvf4zQxxNMyYyaWP73MegvqJGO
 pfJXBOWBTTKwu0fDR5UHJd3tEDABvcZnwBaCSYrpI5f9bJ4NRI3f4DIMwLBnw9IK
 aH6pzwo4gTAMuvxzq8KeTp3hU7kszyUN8q8hiTZlgVozMLKXhQY=
 =mv1v
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine updates from Vinod Koul:
 "This time we have a bunch of core changes to support dynamic channels,
  hotplug of controllers, new apis for metadata ops etc along with new
  drivers for Intel data accelerators, TI K3 UDMA, PLX DMA engine and
  hisilicon Kunpeng DMA engine. Also usual assorted updates to drivers.

  Core:
   - Support for dynamic channels
   - Removal of various slave wrappers
   - Make few slave request APIs as private to dmaengine
   - Symlinks between channels and slaves
   - Support for hotplug of controllers
   - Support for metadata_ops for dma_async_tx_descriptor
   - Reporting DMA cached data amount
   - Virtual dma channel locking updates

  New drivers/device/feature support support:
   - Driver for Intel data accelerators
   - Driver for TI K3 UDMA
   - Driver for PLX DMA engine
   - Driver for hisilicon Kunpeng DMA engine
   - Support for eDMA support for QorIQ LS1028A in fsl edma driver
   - Support for cyclic dma in sun4i driver
   - Support for X1830 in JZ4780 driver"

* tag 'dmaengine-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (62 commits)
  dmaengine: Create symlinks between DMA channels and slaves
  dmaengine: hisilicon: Add Kunpeng DMA engine support
  dmaengine: idxd: add char driver to expose submission portal to userland
  dmaengine: idxd: connect idxd to dmaengine subsystem
  dmaengine: idxd: add descriptor manipulation routines
  dmaengine: idxd: add sysfs ABI for idxd driver
  dmaengine: idxd: add configuration component of driver
  dmaengine: idxd: Init and probe for Intel data accelerators
  dmaengine: add support to dynamic register/unregister of channels
  dmaengine: break out channel registration
  x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction
  dmaengine: ti: k3-udma: fix spelling mistake "limted" -> "limited"
  dmaengine: s3c24xx-dma: fix spelling mistake "to" -> "too"
  dmaengine: Move dma_get_{,any_}slave_channel() to private dmaengine.h
  dmaengine: Remove dma_request_slave_channel_compat() wrapper
  dmaengine: Remove dma_device_satisfies_mask() wrapper
  dt-bindings: fsl-imx-sdma: Add i.MX8MM/i.MX8MN/i.MX8MP compatible string
  dmaengine: zynqmp_dma: fix burst length configuration
  dmaengine: sun4i: Add support for cyclic requests with dedicated DMA
  dmaengine: fsl-qdma: fix duplicated argument to &&
  ...
2020-01-27 10:55:50 -08:00
Linus Torvalds
08c49dc135 platform-drivers-x86 for v5.6-1
* Enable thermal policy for ASUS TUF FX705DY/FX505DY
 * Support left round button on ASUS N56VB
 * Support new Mellanox platforms of basic class VMOD0009 and VMOD0010
 * Intel Comet Lake, Tiger Lake and Elkhart Lake support in the PMC driver
 * Big clean up to Intel PMC core, PMC IPC and SCU IPC drivers
 * Touchscreen support for the PiPO W11 tablet
 
 The following is an automated git shortlog grouped by driver:
 
 asus-nb-wmi:
  -  Support left round button on N56VB
 
 asus-wmi:
  -  Fix keyboard brightness cannot be set to 0
  -  Set throttle thermal policy to default
  -  Support throttle thermal policy
 
 Documentation/ABI:
  -  Add new attribute for mlxreg-io sysfs interfaces
  -  Style changes
  -  Add missed attribute for mlxreg-io sysfs interfaces
  -  Fix documentation inconsistency for mlxreg-io sysfs interfaces
 
 GPD pocket fan:
  -  Allow somewhat lower/higher temperature limits
  -  Use default values when wrong modparams are given
 
 intel_atomisp2_pm:
  -  Spelling fixes
  -  Refactor timeout loop
 
 intel_mid_powerbtn:
  -  Take a copy of ddata
 
 intel_pmc_core:
  -  update Comet Lake platform driver
  -  Fix spelling of MHz unit
  -  Fix indentation in function definitions
  -  Put more stuff under #ifdef DEBUG_FS
  -  Respect error code of kstrtou32_from_user()
  -  Add Intel Elkhart Lake support
  -  Add Intel Tiger Lake support
  -  Make debugfs entry for pch_ip_power_gating_status conditional
  -  Create platform dependent bitmap structs
  -  Remove unnecessary assignments
  -  Clean up: Remove comma after the termination line
 
 intel_pmc_ipc:
  -  Switch to use driver->dev_groups
  -  Propagate error from kstrtoul()
  -  Use octal permissions in sysfs attributes
  -  Get rid of unnecessary includes
  -  Drop ipc_data_readb()
  -  Drop intel_pmc_gcr_read() and intel_pmc_gcr_write()
  -  Make intel_pmc_ipc_raw_cmd() static
  -  Make intel_pmc_ipc_simple_command() static
  -  Make intel_pmc_gcr_update() static
 
 intel_scu_ipc:
  -  Reformat kernel-doc comments of exported functions
  -  Drop intel_scu_ipc_raw_command()
  -  Drop intel_scu_ipc_io[read|write][8|16]()
  -  Drop unused macros
  -  Drop unused prototype intel_scu_ipc_fw_update()
  -  Sleeping is fine when polling
  -  Drop intel_scu_ipc_i2c_cntrl()
  -  Remove Lincroft support
  -  Add constants for register offsets
  -  Fix interrupt support
 
 intel_scu_ipcutil:
  -  Remove default y from Kconfig
 
 intel_telemetry_debugfs:
  -  Respect error code of kstrtou32_from_user()
 
 intel_telemetry_pltdrv:
  -  use devm_platform_ioremap_resource()
 
 mlx-platform:
  -  Add support for next generation systems
  -  Add support for new capability register
  -  Add support for new system type
  -  Set system mux configuration based on system type
  -  Add more definitions for system attributes
  -  Cosmetic changes
 
 platform/mellanox:
  -  mlxreg-hotplug: Add support for new capability register
  -  fix potential deadlock in the tmfifo driver
 
 tools/power/x86/intel-speed-select:
  -  Update version
  -  Change the order for clos disable
  -  Fix result display for turbo-freq auto mode
  -  Add support for core-power discovery
  -  Allow additional core-power mailbox commands
  -  Update MAINTAINERS for the intel uncore frequency control
  -  Add support for Uncore frequency control
 
 touchscreen_dmi:
  -  Fix indentation in several places
  -  Add info for the PiPO W11 tablet
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqaflIX74DDDzMJJtb7wzTHR8rCgFAl4uxRcACgkQb7wzTHR8
 rCjV3xAArz8e4FmFRe77LF83BHduBYkMjnobDFqWa+BhUicapezU4121K/k1cSBi
 MRhYOK3+lZvMqFTbjFatgkCapjma34+6xMJztrT/drCgmxHzsLQgJsmJ3pXWqnYM
 t4CcPOjvjmZbNJT0NdKjeNyCwFGG6Lupf82fVfIXIshtaKE3xw51242+PJEI978i
 2VL3Ck0NL1stgUkKf/ppHkqsjLb0s6IBtgWv052s/0SlHT7RlFAonsCBxIktVa5L
 divCtnWJKRhbalCPnKOJ1Q6/kpR/BSNq6OddcgaACKH9i0d5KwtdNDdQ66FkEidN
 4rKK3gYMLRpgpLjApH91n7jWMQUaJawVN/3WuRfdnwQOcJ/PVY5vosv2yHKMy3yB
 mFLG0G2v3cXEmvRNEyeKBpSkOrAnsoylFt8zLN6/OP3cQ/jdrnq+UpurxRnBEGkm
 ZDbs5FYqjAPMyjL+/ldhndIdIqvusVgrl2V93BuAzN080UvkQ2FKjLnk0Dkc/xmZ
 XbXUAn+BCUpmABstLa7NJGruV3O9IdiSLHW5Cr0NZC74h4nHqDS57At7LkQbphkC
 Pt/+7tirpaEvSbgQZsVI0WfOwDjJf79FT4dYtCY1GkmSvmvF+OUgKicfEZsfuDan
 hLez0resoJkHLeJtcrF1lW3l3kyMaokBmkUhB1skOYlLUvcsOwg=
 =rMop
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.6-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver updates from Andy Shevchenko:

 - Enable thermal policy for ASUS TUF FX705DY/FX505DY

 - Support left round button on ASUS N56VB

 - Support new Mellanox platforms of basic class VMOD0009 and VMOD0010

 - Intel Comet Lake, Tiger Lake and Elkhart Lake support in the PMC
   driver

 - Big clean-up to Intel PMC core, PMC IPC and SCU IPC drivers

 - Touchscreen support for the PiPO W11 tablet

* tag 'platform-drivers-x86-v5.6-1' of git://git.infradead.org/linux-platform-drivers-x86: (64 commits)
  platform/x86: intel_pmc_ipc: Switch to use driver->dev_groups
  platform/x86: intel_pmc_ipc: Propagate error from kstrtoul()
  platform/x86: intel_pmc_ipc: Use octal permissions in sysfs attributes
  platform/x86: intel_pmc_ipc: Get rid of unnecessary includes
  platform/x86: intel_pmc_ipc: Drop ipc_data_readb()
  platform/x86: intel_pmc_ipc: Drop intel_pmc_gcr_read() and intel_pmc_gcr_write()
  platform/x86: intel_pmc_ipc: Make intel_pmc_ipc_raw_cmd() static
  platform/x86: intel_pmc_ipc: Make intel_pmc_ipc_simple_command() static
  platform/x86: intel_pmc_ipc: Make intel_pmc_gcr_update() static
  platform/x86: intel_scu_ipc: Reformat kernel-doc comments of exported functions
  platform/x86: intel_scu_ipc: Drop intel_scu_ipc_raw_command()
  platform/x86: intel_scu_ipc: Drop intel_scu_ipc_io[read|write][8|16]()
  platform/x86: intel_scu_ipc: Drop unused macros
  platform/x86: intel_scu_ipc: Drop unused prototype intel_scu_ipc_fw_update()
  platform/x86: intel_scu_ipc: Sleeping is fine when polling
  platform/x86: intel_scu_ipc: Drop intel_scu_ipc_i2c_cntrl()
  platform/x86: intel_scu_ipc: Remove Lincroft support
  platform/x86: intel_scu_ipc: Add constants for register offsets
  platform/x86: intel_scu_ipc: Fix interrupt support
  platform/x86: intel_scu_ipcutil: Remove default y from Kconfig
  ...
2020-01-27 10:42:35 -08:00
Linus Torvalds
067ba54c7a Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode update from Borislav Petkov:
 "Another boring branch this time around: mark a stub function inline,
  by Valdis Kletnieks"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Make stub function static inline
2020-01-27 09:25:59 -08:00
Richard Henderson
1640a7b9f4 x86: Mark archrandom.h functions __must_check
We must not use the pointer output without validating the
success of the random read.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200110145422.49141-8-broonie@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25 12:18:50 -05:00
Richard Henderson
5f2ed7f5b9 x86: Remove arch_has_random, arch_has_random_seed
Use the expansion of these macros directly in arch_get_random_*.

These symbols are currently part of the generic archrandom.h
interface, but are currently unused and can be removed.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200110145422.49141-2-broonie@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-25 12:18:50 -05:00
Christoph Hellwig
dab0198413 x86/PCI: Remove X86_DEV_DMA_OPS
There are no users of X86_DEV_DMA_OPS left, so remove the code.

Link: https://lore.kernel.org/r/1579613871-301529-8-git-send-email-jonathan.derrick@intel.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
2020-01-24 15:00:35 -06:00
Jon Derrick
34067c56fa x86/PCI: Expose VMD's pci_dev in struct pci_sysdata
Expose VMD's pci_dev pointer in struct pci_sysdata.  This will be used
indirectly by intel-iommu.c to find the correct domain.

Link: https://lore.kernel.org/r/1579613871-301529-3-git-send-email-jonathan.derrick@intel.com
[bhelgaas: commit log]
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
2020-01-24 14:54:50 -06:00
Christoph Hellwig
aad6aa0cd6 x86/PCI: Add to_pci_sysdata() helper
Various helpers need the pci_sysdata just to dereference a single field in
it.  Add a little helper that returns the properly typed sysdata pointer to
require a little less boilerplate code.

[jonathan.derrick: to_pci_sysdata const argument]
Link: https://lore.kernel.org/r/1579613871-301529-2-git-send-email-jonathan.derrick@intel.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-24 14:54:08 -06:00
Sean Christopherson
987b2594ed KVM: x86: Move kvm_vcpu_init() invocation to common code
Move the kvm_cpu_{un}init() calls to common x86 code as an intermediate
step to removing kvm_cpu_{un}init() altogether.

Note, VMX'x alloc_apic_access_page() and init_rmode_identity_map() are
per-VM allocations and are intentionally kept if vCPU creation fails.
They are freed by kvm_arch_destroy_vm().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:57 +01:00
Sean Christopherson
a9dd6f09d7 KVM: x86: Allocate vcpu struct in common x86 code
Move allocation of VMX and SVM vcpus to common x86.  Although the struct
being allocated is technically a VMX/SVM struct, it can be interpreted
directly as a 'struct kvm_vcpu' because of the pre-existing requirement
that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu
struct.

Remove the message from the build-time assertions regarding placement of
the struct, as compatibility with the arch usercopy region is no longer
the sole dependent on 'struct kvm_vcpu' being at offset zero.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:55 +01:00
Dave Jiang
232bb01bb8 x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction
With the introduction of MOVDIR64B instruction, there is now an instruction
that can write 64 bytes of data atomically.

Quoting from Intel SDM:
"There is no atomicity guarantee provided for the 64-byte load operation
from source address, and processor implementations may use multiple
load operations to read the 64-bytes. The 64-byte direct-store issued
by MOVDIR64B guarantees 64-byte write-completion atomicity. This means
that the data arrives at the destination in a single undivided 64-byte
write transaction."

We have identified at least 3 different use cases for this instruction in
the format of func(dst, src, count):
1) Clear poison / Initialize MKTME memory
   @dst is normal memory.
   @src in normal memory. Does not increment. (Copy same line to all
   targets)
   @count (to clear/init multiple lines)
2) Submit command(s) to new devices
   @dst is a special MMIO region for a device. Does not increment.
   @src is normal memory. Increments.
   @count usually is 1, but can be multiple.
3) Copy to iomem in big chunks
   @dst is iomem and increments
   @src in normal memory and increments
   @count is number of chunks to copy

Add support for case #2 to support device that will accept commands via
this instruction. We provide a @count in order to submit a batch of
preprogrammed descriptors in virtually contiguous memory. This
allows the caller to submit multiple descriptors to a device with a single
submission. The special device requires the entire 64bytes descriptor to
be written atomically and will accept MOVDIR64B instruction.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/157965022175.73301.10174614665472962675.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-24 11:18:45 +05:30
Dave Hansen
45fc24e89b x86/mpx: remove MPX from arch/x86
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

This removes all the remaining (dead at this point) MPX handling
code remaining in the tree.  The only remaining code is the XSAVE
support for MPX state which is currently needd for KVM to handle
VMs which might use MPX.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:20 -08:00
Dave Hansen
42222eae17 mm: remove arch_bprm_mm_init() hook
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

arch_bprm_mm_init() is used at execve() time.  The only non-stub
implementation is on x86 for MPX.  Remove the hook entirely from
all architectures and generic code.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:16 -08:00
Mika Westerberg
a97368b314 platform/x86: intel_pmc_ipc: Drop intel_pmc_gcr_read() and intel_pmc_gcr_write()
These functions are not used anywhere so drop them completely.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:26 +02:00
Mika Westerberg
f827e5300d platform/x86: intel_pmc_ipc: Make intel_pmc_ipc_raw_cmd() static
This function is not called outside of intel_pmc_ipc.c so we can make it
static instead.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:26 +02:00
Mika Westerberg
3f751ba584 platform/x86: intel_pmc_ipc: Make intel_pmc_ipc_simple_command() static
This function is not called outside of intel_pmc_ipc.c so we can make it
static instead.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:26 +02:00
Mika Westerberg
e1f4616311 platform/x86: intel_pmc_ipc: Make intel_pmc_gcr_update() static
This function is not called outside of intel_pmc_ipc.c so we can make it
static instead.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:26 +02:00
Mika Westerberg
4907898873 platform/x86: intel_scu_ipc: Drop intel_scu_ipc_raw_command()
There is no user for this function so we can drop it from the driver.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:17 +02:00
Mika Westerberg
b7380a1626 platform/x86: intel_scu_ipc: Drop intel_scu_ipc_io[read|write][8|16]()
There are no users for these so we can remove them.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:17 +02:00
Mika Westerberg
a5f04a2e5e platform/x86: intel_scu_ipc: Drop unused prototype intel_scu_ipc_fw_update()
There is no implementation for that anymore so drop the prototype.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:16 +02:00
Mika Westerberg
74e9748b9b platform/x86: intel_scu_ipc: Drop intel_scu_ipc_i2c_cntrl()
There are no existing users for this functionality so drop it from the
driver completely. This also means we don't need to keep the struct
intel_scu_ipc_pdata_t around anymore so remove that as well.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-22 18:52:16 +02:00
Sean Christopherson
5ae78e95ed KVM: x86: Add dedicated emulator helpers for querying CPUID features
Add feature-specific helpers for querying guest CPUID support from the
emulator instead of having the emulator do a full CPUID and perform its
own bit tests.  The primary motivation is to eliminate the emulator's
usage of bit() so that future patches can add more extensive build-time
assertions on the usage of bit() without having to expose yet more code
to the emulator.

Note, providing a generic guest_cpuid_has() to the emulator doesn't work
due to the existing built-time assertions in guest_cpuid_has(), which
require the feature being checked to be a compile-time constant.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:58:22 +01:00
Miaohe Lin
311497e0c5 KVM: Fix some writing mistakes
Fix some writing mistakes in the comments.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:44 +01:00
Wanpeng Li
1e9e2622a1 KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath
ICR and TSCDEADLINE MSRs write cause the main MSRs write vmexits in our
product observation, multicast IPIs are not as common as unicast IPI like
RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc.

This patch introduce a mechanism to handle certain performance-critical
WRMSRs in a very early stage of KVM VMExit handler.

This mechanism is specifically used for accelerating writes to x2APIC ICR
that attempt to send a virtual IPI with physical destination-mode, fixed
delivery-mode and single target. Which was found as one of the main causes
of VMExits for Linux workloads.

The reason this mechanism significantly reduce the latency of such virtual
IPIs is by sending the physical IPI to the target vCPU in a very early stage
of KVM VMExit handler, before host interrupts are enabled and before expensive
operations such as reacquiring KVM’s SRCU lock.
Latency is reduced even more when KVM is able to use APICv posted-interrupt
mechanism (which allows to deliver the virtual IPI directly to target vCPU
without the need to kick it to host).

Testing on Xeon Skylake server:

The virtual IPI latency from sender send to receiver receive reduces
more than 200+ cpu cycles.

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:12 +01:00
Ard Biesheuvel
1f299fad1e efi/x86: Limit EFI old memory map to SGI UV machines
We carry a quirk in the x86 EFI code to switch back to an older
method of mapping the EFI runtime services memory regions, because
it was deemed risky at the time to implement a new method without
providing a fallback to the old method in case problems arose.

Such problems did arise, but they appear to be limited to SGI UV1
machines, and so these are the only ones for which the fallback gets
enabled automatically (via a DMI quirk). The fallback can be enabled
manually as well, by passing efi=old_map, but there is very little
evidence that suggests that this is something that is being relied
upon in the field.

Given that UV1 support is not enabled by default by the distros
(Ubuntu, Fedora), there is no point in carrying this fallback code
all the time if there are no other users. So let's move it into the
UV support code, and document that efi=old_map now requires this
support code to be enabled.

Note that efi=old_map has been used in the past on other SGI UV
machines to work around kernel regressions in production, so we
keep the option to enable it by hand, but only if the kernel was
built with UV support.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org
2020-01-20 08:13:01 +01:00
Ard Biesheuvel
796eb8d26a efi/libstub/x86: Use const attribute for efi_is_64bit()
Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
function with the 'const' attribute, which permits the compiler to
optimize away any redundant calls. Since we have two different entry
points for 32 and 64 bit firmware in the startup code, this also
simplifies the C code since we'll enter it with the efi_is64 variable
already set.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-2-ardb@kernel.org
2020-01-20 08:13:00 +01:00
Wei Liu
538f127cd3 x86/hyper-v: Add "polling" bit to hv_synic_sint
That bit is documented in TLFS 5.0c as follows:

  Setting the polling bit will have the effect of unmasking an
  interrupt source, except that an actual interrupt is not generated.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20191222233404.1629-1-wei.liu@kernel.org
2020-01-17 14:38:21 +01:00
Yazen Ghannam
89a76171bf x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
Add support for a new version of the Load Store unit bank type as
indicated by its McaType value, which will be present in future SMCA
systems.

Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is
logically the same to the user.

Also, add the new error descriptions to edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com
2020-01-16 17:09:02 +01:00
Dmitry Safonov
550a77a74c x86/vdso: Add time napespace page
To support time namespaces in the VDSO with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide VDSO data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

Allocate the time namespace page among VVAR pages and place vdso_data on
it.  Provide __arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com
2020-01-14 12:20:58 +01:00
Dmitry Safonov
64b302ab66 x86/vdso: Provide vdso_data offset on vvar_page
VDSO support for time namespaces needs to set up a page with the same
layout as VVAR. That timens page will be placed on position of VVAR page
inside namespace. That page has vdso_data->seq set to 1 to enforce
the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce
the time namespace handling path.

To prepare the time namespace page the kernel needs to know the vdso_data
offset.  Provide arch_get_vdso_data() helper for locating vdso_data on VVAR
page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14 12:20:57 +01:00
Vincenzo Frascino
0b5c12332d x86/vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
VDSO_HAS_32BIT_FALLBACK has been removed from the core since
the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks.

Remove unused VDSO_HAS_32BIT_FALLBACK from x86 vdso.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-9-vincenzo.frascino@arm.com
2020-01-14 12:20:46 +01:00
Sean Christopherson
616c59b523 perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
Provide stubs for perf_guest_get_msrs() and intel_pt_handle_vmx() when
building without support for Intel CPUs, i.e. CPU_SUP_INTEL=n.  Lack of
stubs is not currently a problem as the only user, KVM_INTEL, takes a
dependency on CPU_SUP_INTEL=y.  Provide the stubs for all CPUs so that
KVM_INTEL can be built for any CPU with compatible hardware support,
e.g. Centuar and Zhaoxin CPUs.

Note, the existing stub for perf_guest_get_msrs() is essentially dead
code as KVM selects CONFIG_PERF_EVENTS, i.e. the only user guarantees
the full implementation is built.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-19-sean.j.christopherson@intel.com
2020-01-13 19:33:56 +01:00
Sean Christopherson
b39033f504 KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
Define the VMCS execution control flags (consumed by KVM) using their
associated VMX_FEATURE_* to provide a strong hint that new VMX features
are expected to be added to VMX_FEATURE and considered for reporting via
/proc/cpuinfo.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-18-sean.j.christopherson@intel.com
2020-01-13 19:29:16 +01:00
Sean Christopherson
85c17291e2 x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether
IA32_FEAT_CTL has been initialized.  This will allow KVM, and any future
subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures
to query platform support, e.g. allows a future patch to remove KVM's
manual IA32_FEAT_CTL MSR checks.

Various features (on platforms that support IA32_FEAT_CTL) are dependent
on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE.  The
MSR is always configured during boot, but only if the CPU vendor is
recognized by the kernel.  Because CPUID doesn't incorporate the current
IA32_FEAT_CTL value in its reporting of relevant features, it's possible
for a feature to be reported as supported in cpufeatures but not truly
enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize
the CPU.

As a result, without the flag, KVM would see VMX as supported even if
IA32_FEAT_CTL hasn't been initialized, and so would need to manually
read the MSR and check the various enabling bits to avoid taking an
unexpected #GP on VMXON.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-14-sean.j.christopherson@intel.com
2020-01-13 18:49:00 +01:00
Sean Christopherson
b47ce1fed4 x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill
the capabilities during IA32_FEAT_CTL MSR initialization.

Make the VMX capabilities dependent on IA32_FEAT_CTL and
X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
possibly support VMX, or when /proc/cpuinfo is not available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
2020-01-13 18:02:53 +01:00
Sean Christopherson
159348784f x86/vmx: Introduce VMX_FEATURES_*
Add a VMX-specific variant of X86_FEATURE_* flags, which will eventually
supplant the synthetic VMX flags defined in cpufeatures word 8.  Use the
Intel-defined layouts for the major VMX execution controls so that their
word entries can be directly populated from their respective MSRs, and
so that the VMX_FEATURE_* flags can be used to define the existing bit
definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE
flag when adding support for a new hardware feature.

The majority of Intel's (and compatible CPU's) VMX capabilities are
enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't
naturally provide any insight into the virtualization capabilities of
VMX enabled CPUs.  Commit

  e38e05a858 ("x86: extended "flags" to show virtualization HW feature
		 in /proc/cpuinfo")

attempted to address the issue by synthesizing select VMX features into
a Linux-defined word in cpufeatures.

Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic
because there is no sane way for a user to query the capabilities of
their platform, e.g. when trying to find a platform to test a feature or
debug an issue that has a hardware dependency.  Lack of reporting is
especially problematic when the user isn't familiar with VMX, e.g. the
format of the MSRs is non-standard, existence of some MSRs is reported
by bits in other MSRs, several "features" from KVM's point of view are
enumerated as 3+ distinct features by hardware, etc...

The synthetic cpufeatures approach has several flaws:

  - The set of synthesized VMX flags has become extremely stale with
    respect to the full set of VMX features, e.g. only one new flag
    (EPT A/D) has been added in the the decade since the introduction of
    the synthetic VMX features.  Failure to keep the VMX flags up to
    date is likely due to the lack of a mechanism that forces developers
    to consider whether or not a new feature is worth reporting.

  - The synthetic flags may incorrectly be misinterpreted as affecting
    kernel behavior, i.e. KVM, the kernel's sole consumer of VMX,
    completely ignores the synthetic flags.

  - New CPU vendors that support VMX have duplicated the hideous code
    that propagates VMX features from MSRs to cpufeatures.  Bringing the
    synthetic VMX flags up to date would exacerbate the copy+paste
    trainwreck.

Define separate VMX_FEATURE flags to set the stage for enumerating VMX
capabilities outside of the cpu_has() framework, and for adding
functional usage of VMX_FEATURE_* to help ensure the features reported
via /proc/cpuinfo is up to date with respect to kernel recognition of
VMX capabilities.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-10-sean.j.christopherson@intel.com
2020-01-13 17:57:26 +01:00
Sean Christopherson
32ad73db7f x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation.  Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

  - Relocate the bit defines so that they immediately follow the MSR
    define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
  - Add whitespace around the block of feature control defines to make
    it clear they're all related.
  - Use BIT() instead of manually encoding the bit shift.
  - Use "VMX" instead of "VMXON" to match the SDM.
  - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
    be consistent with the kernel's verbiage used for all other feature
    control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
    likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
    the (literal) one-off usage of _ON, the SDM is simply "wrong".

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2020-01-13 17:23:08 +01:00
Jan H. Schönherr
8438b84ab4 x86/mce: Take action on UCNA/Deferred errors again
Commit

  fa92c58694 ("x86, mce: Support memory error recovery for both UCNA
		and Deferred error in machine_check_poll")

added handling of UCNA and Deferred errors by adding them to the ring
for SRAO errors.

Later, commit

  fd4cf79fcc ("x86/mce: Remove the MCE ring for Action Optional errors")

switched storage from the SRAO ring to the unified pool that is still
in use today. In order to only act on the intended errors, a filter
for MCE_AO_SEVERITY is used -- effectively removing handling of
UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors again.
Also, generalize the naming of the notifier from SRAO to UC to capture
the extended scope.

Note, that this change may cause a message like the following to appear,
as the same address may be reported as SRAO and as UCNA:

 Memory failure: 0x5fe3284: already hardware poisoned

Technically, this is a return to previous behavior.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de
2020-01-13 10:07:23 +01:00
Changbin Du
248ed51048 x86/nmi: Remove irq_work from the long duration NMI handler
First, printk() is NMI-context safe now since the safe printk() has been
implemented and it already has an irq_work to make NMI-context safe.

Second, this NMI irq_work actually does not work if a NMI handler causes
panic by watchdog timeout. It has no chance to run in such case, while
the safe printk() will flush its per-cpu buffers before panicking.

While at it, repurpose the irq_work callback into a function which
concentrates the NMI duration checking and makes the code easier to
follow.

 [ bp: Massage. ]

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200111125427.15662-1-changbin.du@gmail.com
2020-01-11 15:55:39 +01:00
Matthew Garrett
4444f8541d efi: Allow disabling PCI busmastering on bridges during boot
Add an option to disable the busmaster bit in the control register on
all PCI bridges before calling ExitBootServices() and passing control
to the runtime kernel. System firmware may configure the IOMMU to prevent
malicious PCI devices from being able to attack the OS via DMA. However,
since firmware can't guarantee that the OS is IOMMU-aware, it will tear
down IOMMU configuration when ExitBootServices() is called. This leaves
a window between where a hostile device could still cause damage before
Linux configures the IOMMU again.

If CONFIG_EFI_DISABLE_PCI_DMA is enabled or "efi=disable_early_pci_dma"
is passed on the command line, the EFI stub will clear the busmaster bit
on all PCI bridges before ExitBootServices() is called. This will
prevent any malicious PCI devices from being able to perform DMA until
the kernel reenables busmastering after configuring the IOMMU.

This option may cause failures with some poorly behaved hardware and
should not be enabled without testing. The kernel commandline options
"efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be
used to override the default. Note that PCI devices downstream from PCI
bridges are disconnected from their drivers first, using the UEFI
driver model API, so that DMA can be disabled safely at the bridge
level.

[ardb: disconnect PCI I/O handles first, as suggested by Arvind]

Co-developed-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <matthewgarrett@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-18-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:04 +01:00
Arvind Sankar
ea7d87f98f efi/x86: Allow translating 64-bit arguments for mixed mode calls
Introduce the ability to define macros to perform argument translation
for the calls that need it, and define them for the boot services that
we currently use.

When calling 32-bit firmware methods in mixed mode, all output
parameters that are 32-bit according to the firmware, but 64-bit in the
kernel (ie OUT UINTN * or OUT VOID **) must be initialized in the
kernel, or the upper 32 bits may contain garbage. Define macros that
zero out the upper 32 bits of the output before invoking the firmware
method.

When a 32-bit EFI call takes 64-bit arguments, the mixed-mode call must
push the two 32-bit halves as separate arguments onto the stack. This
can be achieved by splitting the argument into its two halves when
calling the assembler thunk. Define a macro to do this for the
free_pages boot service.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-17-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:04 +01:00
Arvind Sankar
14b864f4b5 efi/x86: Check number of arguments to variadic functions
On x86 we need to thunk through assembler stubs to call the EFI services
for mixed mode, and for runtime services in 64-bit mode. The assembler
stubs have limits on how many arguments it handles. Introduce a few
macros to check that we do not try to pass too many arguments to the
stubs.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-16-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:04 +01:00
Ard Biesheuvel
ea5e1919b4 efi/x86: Simplify mixed mode call wrapper
Calling 32-bit EFI runtime services from a 64-bit OS involves
switching back to the flat mapping with a stack carved out of
memory that is 32-bit addressable.

There is no need to actually execute the 64-bit part of this
routine from the flat mapping as well, as long as the entry
and return address fit in 32 bits. There is also no need to
preserve part of the calling context in global variables: we
can simply push the old stack pointer value to the new stack,
and keep the return address from the code32 section in EBX.

While at it, move the conditional check whether to invoke
the mixed mode version of SetVirtualAddressMap() into the
64-bit implementation of the wrapper routine.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-11-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:03 +01:00
Ard Biesheuvel
a46d674068 efi/x86: Simplify i386 efi_call_phys() firmware call wrapper
The variadic efi_call_phys() wrapper that exists on i386 was
originally created to call into any EFI firmware runtime service,
but in practice, we only use it once, to call SetVirtualAddressMap()
during early boot.
The flexibility provided by the variadic nature also makes it
type unsafe, and makes the assembler code more complicated than
needed, since it has to deal with an unknown number of arguments
living on the stack.

So clean this up, by renaming the helper to efi_call_svam(), and
dropping the unneeded complexity. Let's also drop the reference
to the efi_phys struct and grab the address from the EFI system
table directly.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-9-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:02 +01:00
Ard Biesheuvel
6982947045 efi/x86: Split SetVirtualAddresMap() wrappers into 32 and 64 bit versions
Split the phys_efi_set_virtual_address_map() routine into 32 and 64 bit
versions, so we can simplify them individually in subsequent patches.

There is very little overlap between the logic anyway, and this has
already been factored out in prolog/epilog routines which are completely
different between 32 bit and 64 bit. So let's take it one step further,
and get rid of the overlap completely.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-8-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:02 +01:00
Ard Biesheuvel
89ed486532 efi/x86: Avoid redundant cast of EFI firmware service pointer
All EFI firmware call prototypes have been annotated as __efiapi,
permitting us to attach attributes regarding the calling convention
by overriding __efiapi to an architecture specific value.

On 32-bit x86, EFI firmware calls use the plain calling convention
where all arguments are passed via the stack, and cleaned up by the
caller. Let's add this to the __efiapi definition so we no longer
need to cast the function pointers before invoking them.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:02 +01:00
Ard Biesheuvel
6cfcd6f001 efi/x86: Re-disable RT services for 32-bit kernels running on 64-bit EFI
Commit a8147dba75 ("efi/x86: Rename efi_is_native() to efi_is_mixed()")
renamed and refactored efi_is_native() into efi_is_mixed(), but failed
to take into account that these are not diametrical opposites.

Mixed mode is a construct that permits 64-bit kernels to boot on 32-bit
firmware, but there is another non-native combination which is supported,
i.e., 32-bit kernels booting on 64-bit firmware, but only for boot and not
for runtime services. Also, mixed mode can be disabled in Kconfig, in
which case the 64-bit kernel can still be booted from 32-bit firmware,
but without access to runtime services.

Due to this oversight, efi_runtime_supported() now incorrectly returns
true for such configurations, resulting in crashes at boot. So fix this
by making efi_runtime_supported() aware of this.

As a side effect, some efi_thunk_xxx() stubs have become obsolete, so
remove them as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-4-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:55:01 +01:00
Ingo Molnar
57ad87ddce Merge branch 'x86/mm' into efi/core, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:53:14 +01:00
Andy Shevchenko
e883cafd8d platform/x86: intel_telemetry_pltdrv: use devm_platform_ioremap_resource()
Use devm_platform_ioremap_resource() to simplify the code a bit.

While here, drop initialized but unused ssram_base_addr and ssram_size members.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-10 11:50:32 +02:00
Eric Biggers
674f368a95 crypto: remove CRYPTO_TFM_RES_BAD_KEY_LEN
The CRYPTO_TFM_RES_BAD_KEY_LEN flag was apparently meant as a way to
make the ->setkey() functions provide more information about errors.

However, no one actually checks for this flag, which makes it pointless.

Also, many algorithms fail to set this flag when given a bad length key.
Reviewing just the generic implementations, this is the case for
aes-fixed-time, cbcmac, echainiv, nhpoly1305, pcrypt, rfc3686, rfc4309,
rfc7539, rfc7539esp, salsa20, seqiv, and xcbc.  But there are probably
many more in arch/*/crypto/ and drivers/crypto/.

Some algorithms can even set this flag when the key is the correct
length.  For example, authenc and authencesn set it when the key payload
is malformed in any way (not just a bad length), the atmel-sha and ccree
drivers can set it if a memory allocation fails, and the chelsio driver
sets it for bad auth tag lengths, not just bad key lengths.

So even if someone actually wanted to start checking this flag (which
seems unlikely, since it's been unused for a long time), there would be
a lot of work needed to get it working correctly.  But it would probably
be much better to go back to the drawing board and just define different
return values, like -EINVAL if the key is invalid for the algorithm vs.
-EKEYREJECTED if the key was rejected by a policy like "no weak keys".
That would be much simpler, less error-prone, and easier to test.

So just remove this flag.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-09 11:30:53 +08:00
Brian Gerst
2b10906f2d x86: Remove force_iret()
force_iret() was originally intended to prevent the return to user mode with
the SYSRET or SYSEXIT instructions, in cases where the register state could
have been changed to be incompatible with those instructions.  The entry code
has been significantly reworked since then, and register state is validated
before SYSRET or SYSEXIT are used.  force_iret() no longer serves its original
purpose and can be eliminated.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20191219115812.102620-1-brgerst@gmail.com
2020-01-08 19:40:51 +01:00
Sean Christopherson
736c291c9f KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM
Convert a plethora of parameters and variables in the MMU and page fault
flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM.

Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical
addresses.  When TDP is enabled, the fault address is a guest physical
address and thus can be a 64-bit value, even when both KVM and its guest
are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a
64-bit field, not a natural width field.

Using a gva_t for the fault address means KVM will incorrectly drop the
upper 32-bits of the GPA.  Ditto for gva_to_gpa() when it is used to
translate L2 GPAs to L1 GPAs.

Opportunistically rename variables and parameters to better reflect the
dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain
"addr" instead of "vaddr" when the address may be either a GVA or an L2
GPA.  Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid
a confusing "gpa_t gva" declaration; this also sets the stage for a
future patch to combing nonpaging_page_fault() and tdp_page_fault() with
minimal churn.

Sprinkle in a few comments to document flows where an address is known
to be a GVA and thus can be safely truncated to a 32-bit value.  Add
WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help
document such cases and detect bugs.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:16:02 +01:00
Xiaoyao Li
5e3d394fdd KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING
The mis-spelling is found by checkpatch.pl, so fix them.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:15:59 +01:00
Xiaoyao Li
4e2a0bc56a KVM: VMX: Rename NMI_PENDING to NMI_WINDOW
Rename the NMI-window exiting related definitions to match the latest
Intel SDM. No functional changes.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:15:59 +01:00
Xiaoyao Li
9dadc2f918 KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW
Rename interrupt-windown exiting related definitions to match the
latest Intel SDM. No functional changes.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:15:59 +01:00
Peter Xu
c96001c570 KVM: X86: Use APIC_DEST_* macros properly in kvm_lapic_irq.dest_mode
We were using either APIC_DEST_PHYSICAL|APIC_DEST_LOGICAL or 0|1 to
fill in kvm_lapic_irq.dest_mode.  It's fine only because in most cases
when we check against dest_mode it's against APIC_DEST_PHYSICAL (which
equals to 0).  However, that's not consistent.  We'll have problem
when we want to start checking against APIC_DEST_LOGICAL, which does
not equals to 1.

This patch firstly introduces kvm_lapic_irq_dest_mode() helper to take
any boolean of destination mode and return the APIC_DEST_* macro.
Then, it replaces the 0|1 settings of irq.dest_mode with the helper.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 17:33:14 +01:00
Tony Luck
f444a5ff95 x86/cpufeatures: Add support for fast short REP; MOVSB
>From the Intel Optimization Reference Manual:

3.7.6.1 Fast Short REP MOVSB
Beginning with processors based on Ice Lake Client microarchitecture,
REP MOVSB performance of short operations is enhanced. The enhancement
applies to string lengths between 1 and 128 bytes long.  Support for
fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
[EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
in the REP STOS performance.

Add an X86_FEATURE_FSRM flag for this.

memmove() avoids REP MOVSB for short (< 32 byte) copies. Check FSRM and
use REP MOVSB for short copies on systems that support it.

 [ bp: Massage and add comment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191216214254.26492-1-tony.luck@intel.com
2020-01-08 11:29:25 +01:00
Arnd Bergmann
202bf8d758 compat: provide compat_ptr() on all architectures
In order to avoid needless #ifdef CONFIG_COMPAT checks,
move the compat_ptr() definition to linux/compat.h
where it can be seen by any file regardless of the
architecture.

Only s390 needs a special definition, this can use the
self-#define trick we have elsewhere.

Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-01-03 09:32:51 +01:00
Anthony Steinhauser
fae7bfcc78 x86/nospec: Remove unused RSB_FILL_LOOPS
It was never really used, see

  117cc7a908 ("x86/retpoline: Fill return stack buffer on vmexit")

  [ bp: Massage. ]

Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191226204512.24524-1-asteinhauser@google.com
2020-01-02 10:54:53 +01:00
Jann Horn
aa49f20462 x86/dumpstack: Introduce die_addr() for die() with #GP fault address
Split __die() into __die_header() and __die_body(). This allows inserting
extra information below the header line that initiates the bug report.

Introduce a new function die_addr() that behaves like die(), but is for
faults only and uses __die_header() and __die_body() so that a future
commit can print extra information after the header line.

 [ bp: Comment the KASAN-specific usage of gp_addr. ]

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191218231150.12139-3-jannh@google.com
2019-12-31 13:11:35 +01:00
Jann Horn
7be4412721 x86/insn-eval: Add support for 64-bit kernel mode
To support evaluating 64-bit kernel mode instructions:

* Replace existing checks for user_64bit_mode() with a new helper that
checks whether code is being executed in either 64-bit kernel mode or
64-bit user mode.

* Select the GS base depending on whether the instruction is being
evaluated in kernel mode.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191218231150.12139-1-jannh@google.com
2019-12-30 20:17:15 +01:00
Ingo Molnar
28336be568 Linux 5.5-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4JNtkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGdN0H/3UI6LHOx1ol3/7L
 TwgMibg2pNxNU05bowDjQt92+Hgj9JM0TeFBsfr5hLaeKBgeVCPr5xK/vH09NlKu
 otVGbhBLpl9OAUu9znTfbt4bcqhJKlr/K0mS5e1vPsXvZ3wdHS27trwjgyu16/pP
 NJwkcs5/VRYVC/SrZay2NvheKN+DoGSd4+ZlJprwtAAVMdbEvoaGqRLGKLfLeDMc
 Z04w8AKhnKIxSkt+eEmuW9+pAQJUAkk4QVjixcJe8q0QpA1XIj965yvE8+XpjbLo
 eFxupmZq4S2JdCjsa+iBferJ5juR1FVhbHSbZtLsTtkPVegI9ug911WQ+KiCqErI
 VkiKUl8=
 =rNsn
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc4' into locking/kcsan, to resolve conflicts

Conflicts:
	init/main.c
	lib/Kconfig.debug

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-30 08:10:51 +01:00
Ard Biesheuvel
966291f634 efi/libstub: Rename efi_call_early/_runtime macros to be more intuitive
The macros efi_call_early and efi_call_runtime are used to call EFI
boot services and runtime services, respectively. However, the naming
is confusing, given that the early vs runtime distinction may suggest
that these are used for calling the same set of services either early
or late (== at runtime), while in reality, the sets of services they
can be used with are completely disjoint, and efi_call_runtime is also
only usable in 'early' code.

So do a global sweep to replace all occurrences with efi_bs_call or
efi_rt_call, respectively, where BS and RT match the idiom used by
the UEFI spec to refer to boot time or runtime services.

While at it, use 'func' as the macro parameter name for the function
pointers, which is less likely to collide and cause weird build errors.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-24-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:25 +01:00
Ard Biesheuvel
99ea8b1db2 efi/libstub: Drop 'table' argument from efi_table_attr() macro
None of the definitions of the efi_table_attr() still refer to
their 'table' argument so let's get rid of it entirely.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-23-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:24 +01:00
Ard Biesheuvel
47c0fd39b7 efi/libstub: Drop protocol argument from efi_call_proto() macro
After refactoring the mixed mode support code, efi_call_proto()
no longer uses its protocol argument in any of its implementation,
so let's remove it altogether.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-22-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:24 +01:00
Ard Biesheuvel
c3710de506 efi/libstub/x86: Drop __efi_early() export and efi_config struct
The various pointers we stash in the efi_config struct which we
retrieve using __efi_early() are simply copies of the ones in
the EFI system table, which we have started accessing directly
in the previous patch. So drop all the __efi_early() related
plumbing, as well as all the assembly code dealing with efi_config,
which allows us to move the PE/COFF entry point to C code as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-18-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:22 +01:00
Ard Biesheuvel
afc4cc71cf efi/libstub/x86: Avoid thunking for native firmware calls
We use special wrapper routines to invoke firmware services in the
native case as well as the mixed mode case. For mixed mode, the need
is obvious, but for the native cases, we can simply rely on the
compiler to generate the indirect call, given that GCC now has
support for the MS calling convention (and has had it for quite some
time now). Note that on i386, the decompressor and the EFI stub are not
built with -mregparm=3 like the rest of the i386 kernel, so we can
safely allow the compiler to emit the indirect calls here as well.

So drop all the wrappers and indirection, and switch to either native
calls, or direct calls into the thunk routine for mixed mode.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-14-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:20 +01:00
Ard Biesheuvel
f958efe975 efi/libstub: Distinguish between native/mixed not 32/64 bit
Currently, we support mixed mode by casting all boot time firmware
calls to 64-bit explicitly on native 64-bit systems, and to 32-bit
on 32-bit systems or 64-bit systems running with 32-bit firmware.

Due to this explicit awareness of the bitness in the code, we do a
lot of casting even on generic code that is shared with other
architectures, where mixed mode does not even exist. This casting
leads to loss of coverage of type checking by the compiler, which
we should try to avoid.

So instead of distinguishing between 32-bit vs 64-bit, distinguish
between native vs mixed, and limit all the nasty casting and
pointer mangling to the code that actually deals with mixed mode.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-10-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:17 +01:00
Ard Biesheuvel
a8147dba75 efi/x86: Rename efi_is_native() to efi_is_mixed()
The ARM architecture does not permit combining 32-bit and 64-bit code
at the same privilege level, and so EFI mixed mode is strictly a x86
concept.

In preparation of turning the 32/64 bit distinction in shared stub
code to a native vs mixed one, refactor x86's current use of the
helper function efi_is_native() into efi_is_mixed().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-7-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:16 +01:00
Ard Biesheuvel
58ec655a75 efi/libstub: Remove unused __efi_call_early() macro
The macro __efi_call_early() is defined by various architectures but
never used. Let's get rid of it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:49:15 +01:00
Zhang Rui
b2d32af0bf x86/cpu: Add Jasper Lake to Intel family
Japser Lake is an Atom family processor.
It uses Tremont cores and is targeted at mobile platforms.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-20 10:07:10 +01:00
Sean Christopherson
957a227d41 x86/boot: Fix a comment's incorrect file reference
Fix the comment for 'struct real_mode_header' to reference the correct
assembly file, realmode/rm/header.S.  The comment has always incorrectly
referenced realmode.S, which doesn't exist, as defining the associated
asm blob.

Specify the file's path relative to arch/x86 to avoid confusion with
boot/header.S.  Update the comment for 'struct trampoline_header' to
also include the relative path to keep things consistent, and tweak the
dual 64/32 reference so that it doesn't appear to be an extension of the
relative path, i.e. avoid "realmode/rm/trampoline_32/64.S".

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191126195911.3429-1-sean.j.christopherson@intel.com
2019-12-16 14:09:33 +01:00
Borislav Petkov
72c2ce9867 x86/bugs: Move enum taa_mitigations to bugs.c
... because it is used only there.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20191112221823.19677-1-bp@alien8.de
2019-12-14 16:06:33 +01:00
Valdis Klētnieks
82c881b28a x86/microcode/AMD: Make stub function static inline
When building with C=1 W=1 (and when CONFIG_MICROCODE_AMD=n, as Luc Van
Oostenryck correctly points out) both sparse and gcc complain:

  CHECK   arch/x86/kernel/cpu/microcode/core.c
  ./arch/x86/include/asm/microcode_amd.h:56:6: warning: symbol \
	  'reload_ucode_amd' was not declared. Should it be static?
    CC      arch/x86/kernel/cpu/microcode/core.o
  In file included from arch/x86/kernel/cpu/microcode/core.c:36:
  ./arch/x86/include/asm/microcode_amd.h:56:6: warning: no previous \
	  prototype for 'reload_ucode_amd' [-Wmissing-prototypes]
     56 | void reload_ucode_amd(void) {}
        |      ^~~~~~~~~~~~~~~~

And they're right - that function can be a static inline like its
brethren.

Signed-off-by: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/52170.1575603873@turing-police
2019-12-12 22:29:00 +01:00
Kees Cook
9c1e8836ed crypto: x86 - Regularize glue function prototypes
The crypto glue performed function prototype casting via macros to make
indirect calls to assembly routines. Instead of performing casts at the
call sites (which trips Control Flow Integrity prototype checking), switch
each prototype to a common standard set of arguments which allows the
removal of the existing macros. In order to keep pointer math unchanged,
internal casting between u128 pointers and u8 pointers is added.

Co-developed-by: João Moreira <joao.moreira@intel.com>
Signed-off-by: João Moreira <joao.moreira@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-12-11 16:36:54 +08:00
Sean Christopherson
960786422f x86/ACPI/sleep: Move acpi_get_wakeup_address() into sleep.c, remove <asm/realmode.h> from <asm/acpi.h>
Move the definition of acpi_get_wakeup_address() into sleep.c to break
linux/acpi.h's dependency (by way of asm/acpi.h) on asm/realmode.h.
Everyone and their mother includes linux/acpi.h, i.e. modifying
realmode.h results in a full kernel rebuild, which makes the already
inscrutable real mode boot code even more difficult to understand and is
positively rage inducing when trying to make changes to x86's boot flow.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20191126165417.22423-13-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:48 +01:00
Sean Christopherson
8c53b318b2 ACPI/sleep: Convert acpi_wakeup_address into a function
Convert acpi_wakeup_address from a raw variable into a function so that
x86 can wrap its dereference of the real mode boot header in a function
instead of broadcasting it to the world via a #define.  This sets the
stage for a future patch to move x86's definition of the new function,
acpi_get_wakeup_address(), out of asm/acpi.h and thus break acpi.h's
dependency on asm/realmode.h.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20191126165417.22423-12-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:48 +01:00
Ingo Molnar
186525bd6b mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions
- Untangle the somewhat incestous way of how VMALLOC_START is used all across the
  kernel, but is, on x86, defined deep inside one of the lowest level page table headers.
  It doesn't help that vmalloc.h only includes a single asm header:

     #include <asm/page.h>           /* pgprot_t */

  So there was no existing cross-arch way to decouple address layout
  definitions from page.h details. I used this:

   #ifndef VMALLOC_START
   # include <asm/vmalloc.h>
   #endif

  This way every architecture that wants to simplify page.h can do so.

- Also on x86 we had a couple of LDT related inline functions that used
  the late-stage address space layout positions - but these could be
  uninlined without real trouble - the end result is cleaner this way as
  well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
1f059dfdf5 mm/vmalloc: Add empty <asm/vmalloc.h> headers and use them from <linux/vmalloc.h>
In the x86 MM code we'd like to untangle various types of historic
header dependency spaghetti, but for this we'd need to pass to
the generic vmalloc code various vmalloc related defines that
customarily come via the <asm/page.h> low level arch header.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
4efb566491 x86/mm: Tabulate the page table encoding definitions
I got lost in trying to figure out which bits were enabled
in one of the PTE masks, so let's make it pretty
obvious at the definition site already:

 #define PAGE_NONE            __pg(   0|   0|   0|___A|   0|   0|   0|___G)
 #define PAGE_SHARED          __pg(__PP|__RW|_USR|___A|__NX|   0|   0|   0)
 #define PAGE_SHARED_EXEC     __pg(__PP|__RW|_USR|___A|   0|   0|   0|   0)
 #define PAGE_COPY_NOEXEC     __pg(__PP|   0|_USR|___A|__NX|   0|   0|   0)
 #define PAGE_COPY_EXEC       __pg(__PP|   0|_USR|___A|   0|   0|   0|   0)
 #define PAGE_COPY            __pg(__PP|   0|_USR|___A|__NX|   0|   0|   0)
 #define PAGE_READONLY        __pg(__PP|   0|_USR|___A|__NX|   0|   0|   0)
 #define PAGE_READONLY_EXEC   __pg(__PP|   0|_USR|___A|   0|   0|   0|   0)

 #define __PAGE_KERNEL            (__PP|__RW|   0|___A|__NX|___D|   0|___G)
 #define __PAGE_KERNEL_EXEC       (__PP|__RW|   0|___A|   0|___D|   0|___G)
 #define _KERNPG_TABLE_NOENC      (__PP|__RW|   0|___A|   0|___D|   0|   0)
 #define _KERNPG_TABLE            (__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
 #define _PAGE_TABLE_NOENC        (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE              (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
 #define __PAGE_KERNEL_RO         (__PP|   0|   0|___A|__NX|___D|   0|___G)
 #define __PAGE_KERNEL_RX         (__PP|   0|   0|___A|   0|___D|   0|___G)
 #define __PAGE_KERNEL_NOCACHE    (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
 #define __PAGE_KERNEL_VVAR       (__PP|   0|_USR|___A|__NX|___D|   0|___G)
 #define __PAGE_KERNEL_LARGE      (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
 #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
 #define __PAGE_KERNEL_WP         (__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)

Especially security relevant bits like 'NX' or coherence related bits like 'G'
are now super easy to read based on a single grep.

We do the underscore gymnastics to not pollute the kernel's symbol namespace,
and the longest line still fits into 80 columns, so this should be readable
for everyone.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
533d49b37a x86/mm/pat: Clean up <asm/memtype.h> externs
Half of the declarations have an 'extern', half of them not,
use 'extern' consistently.

This makes grepping for APIs easier, such as:

  dagon:~/tip> git grep -E '\<memtype_.*\(' arch/x86/ | grep extern
  arch/x86/include/asm/memtype.h:extern int memtype_reserve(u64 start, u64 end,
  arch/x86/include/asm/memtype.h:extern int memtype_free(u64 start, u64 end);
  arch/x86/include/asm/memtype.h:extern int memtype_kernel_map_sync(u64 base, unsigned long size,
  arch/x86/include/asm/memtype.h:extern int memtype_reserve_io(resource_size_t start, resource_size_t end,
  arch/x86/include/asm/memtype.h:extern void memtype_free_io(resource_size_t start, resource_size_t end);
  arch/x86/mm/pat/memtype.h:extern int memtype_check_insert(struct memtype *entry_new,
  arch/x86/mm/pat/memtype.h:extern struct memtype *memtype_erase(u64 start, u64 end);
  arch/x86/mm/pat/memtype.h:extern struct memtype *memtype_lookup(u64 addr);
  arch/x86/mm/pat/memtype.h:extern int memtype_copy_nth_element(struct memtype *entry_out, loff_t pos);

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
eb243d1d28 x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>
pat.h is a file whose main purpose is to provide the memtype_*() APIs.

PAT is the low level hardware mechanism - but the high level abstraction
is memtype.

So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
ecdd6ee77b x86/mm/pat: Standardize on memtype_*() prefix for APIs
Half of our memtype APIs are memtype_ prefixed, the other half are _memtype suffixed:

	reserve_memtype()
	free_memtype()
	kernel_map_sync_memtype()
	io_reserve_memtype()
	io_free_memtype()

	memtype_check_insert()
	memtype_erase()
	memtype_lookup()
	memtype_copy_nth_element()

Use prefixes consistently, like most other modern kernel APIs:

	reserve_memtype()		=> memtype_reserve()
	free_memtype()			=> memtype_free()
	kernel_map_sync_memtype()	=> memtype_kernel_map_sync()
	io_reserve_memtype()		=> memtype_reserve_io()
	io_free_memtype()		=> memtype_free_io()

	memtype_check_insert()		=> memtype_check_insert()
	memtype_erase()			=> memtype_erase()
	memtype_lookup()		=> memtype_lookup()
	memtype_copy_nth_element()	=> memtype_copy_nth_element()

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
5557e831f6 x86/mm/pat: Disambiguate PAT-disabled boot messages
Right now we have these four types of PAT-disabled boot messages:

  x86/PAT: PAT support disabled.
  x86/PAT: PAT MSR is 0, disabled.
  x86/PAT: MTRRs disabled, skipping PAT initialization too.
  x86/PAT: PAT not supported by CPU.

The first message is ambiguous in that it doesn't signal that PAT is off
due to a boot option.

The second message doesn't really make it clear that this is the MSR value
during early bootup and it's the firmware environment that disabled PAT
support.

The fourth message doesn't really make it clear that we disable PAT support
because CONFIG_MTRR is off in the kernel.

Clarify, harmonize and fix the spelling in these user-visible messages:

  x86/PAT: PAT support disabled via boot option.
  x86/PAT: PAT support disabled by the firmware.
  x86/PAT: PAT support disabled because CONFIG_MTRR is disabled in the kernel.
  x86/PAT: PAT not supported by the CPU.

Also add a fifth message, in case PAT support is disabled at build time:

  x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.

Previously we'd just silently return from pat_init() without giving any indication
that PAT support is off.

Finally, clarify/extend some of the comments related to PAT initialization.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
2040cf9f59 Linux 5.5-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3tf/0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlKwH/3fTToujuJfTx5E5
 mrARAP65J1L/DxpEKvKRt2bNZo6w13mNd8g7ZPmYChz90bYGvXQSG8hYTU9iAw3O
 yimSTJlNXDhVAluB53XnDdUxIWC4HUZsNxWJNCeXMuiMcGNsTGX+v3f+x7oHCT0P
 jI1RSIsFGjgr0RWqZ8U5aJckQo2xABC1TfYw53K66Oc/JLZpSFJFwMgjf1fD5diU
 HGDA8E2p0u1TQIyNzr86iqMvnlSRYBQwBQn6OgEKCG4Z0NLtXfDF4mqnxsXgLmIH
 oQoFfxaMKXyGWds7ZxwcGWntALCF41ThfpiJWDIyxjWxFEty4bqTCbDPwwyp7ip0
 iuASmTI=
 =YqO2
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc1' into core/kprobes, to resolve conflicts

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:11:00 +01:00
Linus Torvalds
43a2898631 powerpc updates for 5.5 #2
A few commits splitting the KASAN instrumented bitops header in
 three, to match the split of the asm-generic bitops headers.
 
 This is needed on powerpc because we use asm-generic/bitops/non-atomic.h,
 for the non-atomic bitops, whereas the existing KASAN instrumented
 bitops assume all the underlying operations are provided by the arch
 as arch_foo() versions.
 
 Thanks to:
   Daniel Axtens & Christophe Leroy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3qN0wTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLWFD/4/e3CO1oWjQdDGjgvhydhmhjMQqI9m
 EINQjg3hl0ceig1HzsgjVdWI4TOf7bXbaQRf8m2pFWpA+iSx4KpjlnXzNjfUDapO
 JqzKYfVSdi0o6OAFigDYuN1V5F2jAgrM7/w5PKpVuiAABcgJNcEY59tgEMTdj9r0
 9H3OekYv0UnZ5ZNsUhCibKVvVLdbtys3ALrm1YETAauCkY/lpNk6afcr33t0iw2l
 WAdd2sfWvw4tBn+35ZrNnv7z4hQ423Imd+lyuI5zhMBOOGspgMxlGGeIn370WyAi
 00Jl66TRot6EtWGDVzV10bjB53qDhHrtNIk0NG2QMBB8vdTjBMtXfJnVc3Q8iVZ9
 GkpdvMLNAlmxa4AuzpdHAOUQK48aggQzDmJkGp/JdYT6+WwTa5SLZcsy+njHGyjU
 It+728FnStilM1XvjnaN9pljqANEcN4/YIycK55XGDsZS9fVRMY/QMAQZbdLHfzc
 nB54Q/8vtPc9H69ws3U3g0ogVtYi5ca5RpTPiYU6WUQfEe9mjZdEgglsHC1y8ef2
 9J3Muv4ASDGLjKN+G4dvKLCIgF+QKtsvwrtSztQq6wVnXUxuUQBEQSCaMPN9PN5g
 Hlkjk95WevXcHyXeE2r8cXndUTboIgWRMZp0AHao44du3EziPMCvE0tFAHOEWfV0
 8Oty9Fo5QSb1Mw==
 =FCVX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "A few commits splitting the KASAN instrumented bitops header in three,
  to match the split of the asm-generic bitops headers.

  This is needed on powerpc because we use the generic bitops for the
  non-atomic case only, whereas the existing KASAN instrumented bitops
  assume all the underlying operations are provided by the arch as
  arch_foo() versions.

  Thanks to: Daniel Axtens & Christophe Leroy"

* tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  docs/core-api: Remove possibly confusing sub-headings from Bit Operations
  powerpc: support KASAN instrumentation of bitops
  kasan: support instrumented bitops combined with generic bitops
2019-12-06 13:36:31 -08:00
Masahiro Yamada
0fb9dc2867 arch: sembuf.h: make uapi asm/sembuf.h self-contained
Userspace cannot compile <asm/sembuf.h> due to some missing type
definitions.  For example, building it for x86 fails as follows:

    CC      usr/include/asm/sembuf.h.s
  In file included from <command-line>:32:0:
  usr/include/asm/sembuf.h:17:20: error: field `sem_perm' has incomplete type
    struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
                      ^~~~~~~~
  usr/include/asm/sembuf.h:24:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t sem_otime; /* last semop time */
    ^~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:25:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused1;
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:26:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t sem_ctime; /* last change time */
    ^~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:27:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused2;
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:29:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:30:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused3;
    ^~~~~~~~~~~~~~~~
  usr/include/asm/sembuf.h:31:2: error: unknown type name `__kernel_ulong_t'
    __kernel_ulong_t __unused4;
    ^~~~~~~~~~~~~~~~

It is just a matter of missing include directive.

Include <asm/ipcbuf.h> to make it self-contained, and add it to
the compile-test coverage.

Link: http://lkml.kernel.org/r/20191030063855.9989-3-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:14 -08:00
Masahiro Yamada
9ef0e00418 arch: msgbuf.h: make uapi asm/msgbuf.h self-contained
Userspace cannot compile <asm/msgbuf.h> due to some missing type
definitions.  For example, building it for x86 fails as follows:

    CC      usr/include/asm/msgbuf.h.s
  In file included from usr/include/asm/msgbuf.h:6:0,
                   from <command-line>:32:
  usr/include/asm-generic/msgbuf.h:25:20: error: field `msg_perm' has incomplete type
    struct ipc64_perm msg_perm;
                      ^~~~~~~~
  usr/include/asm-generic/msgbuf.h:27:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t msg_stime; /* last msgsnd time */
    ^~~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:28:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t msg_rtime; /* last msgrcv time */
    ^~~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:29:2: error: unknown type name `__kernel_time_t'
    __kernel_time_t msg_ctime; /* last change time */
    ^~~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:41:2: error: unknown type name `__kernel_pid_t'
    __kernel_pid_t msg_lspid; /* pid of last msgsnd */
    ^~~~~~~~~~~~~~
  usr/include/asm-generic/msgbuf.h:42:2: error: unknown type name `__kernel_pid_t'
    __kernel_pid_t msg_lrpid; /* last receive pid */
    ^~~~~~~~~~~~~~

It is just a matter of missing include directive.

Include <asm/ipcbuf.h> to make it self-contained, and add it to
the compile-test coverage.

Link: http://lkml.kernel.org/r/20191030063855.9989-2-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:14 -08:00
Linus Torvalds
1daa56bcfd IOMMU Updates for Linux v5.5
Including:
 
 	- Conversion of the AMD IOMMU driver to use the dma-iommu code
 	  for imlementing the DMA-API. This gets rid of quite some code
 	  in the driver itself, but also has some potential for
 	  regressions (non are known at the moment).
 
 	- Support for the Qualcomm SMMUv2 implementation in the SDM845
 	  SoC.  This also includes some firmware interface changes, but
 	  those are acked by the respective maintainers.
 
 	- Preparatory work to support two distinct page-tables per
 	  domain in the ARM-SMMU driver
 
 	- Power management improvements for the ARM SMMUv2
 
 	- Custom PASID allocator support
 
 	- Multiple PCI DMA alias support for the AMD IOMMU driver
 
 	- Adaption of the Mediatek driver to the changed IO/TLB flush
 	  interface of the IOMMU core code.
 
 	- Preparatory patches for the Renesas IOMMU driver to support
 	  future hardware.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl3hCC4ACgkQK/BELZcB
 GuN9QQ/+LFh2TdhtiemIakA/19nv1FTP719uje7vjX4gGBGD++NEzW7mBcAXSEnD
 rBta1GsD6N8h0fdT53Nw8cezQ1ldBomKG3j+mzcju7TcuRwebhCEQaxh2iWy+I6g
 cp6HxTu3G0E6Zy7wd+MWyJzvXa7MXV2p8iCDs7Dp8yEow+c55b4LAIoeRWx3rjsT
 rat29MuJ8TGLP6vOYHcpI+REGfda4rsog75980RIoOEuqRjMG6JPj9clPeakSNtQ
 Rl1EtgrDskbRCgDSujbzDMHAYRUKvdCuTuTM1De/GQO+GWYsOtzqBHkct67sGn9I
 H518Be9m4xfYyyktVM6K9bSpxzCOtor+u6LFOejufJN/7vL2qtePZX7EHL/ks8zh
 Mn80H/1ch1UcFcF9p7V7QCMUSyaiX/VWhgwWIdPf3CGrKVaLnQ8mkB82Zf0VNuQT
 OzcfJcVF+skhDkXdFL5xUkQtqqTHhpaK2CzvvTDAsR1KXMCc6mH/MT/9m+mOFQFK
 P+klgGdU5rVniru10k4pamT5LlLubRV0NBpaAiGr2R3dfyYyiS/D9FBSLanqO+JM
 AgSnmOSbl7y927DxufkVPH8M7TxSdtQVo7VoQFjSWE8B9bh4qU6MVV30enLvY0Fj
 g4DP+8srOvY0vNsWNiBe2JpldABGEAbumFt78g1WV2tFi1d/NUI=
 =ntaE
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Conversion of the AMD IOMMU driver to use the dma-iommu code for
   imlementing the DMA-API. This gets rid of quite some code in the
   driver itself, but also has some potential for regressions (non are
   known at the moment).

 - Support for the Qualcomm SMMUv2 implementation in the SDM845 SoC.
   This also includes some firmware interface changes, but those are
   acked by the respective maintainers.

 - Preparatory work to support two distinct page-tables per domain in
   the ARM-SMMU driver

 - Power management improvements for the ARM SMMUv2

 - Custom PASID allocator support

 - Multiple PCI DMA alias support for the AMD IOMMU driver

 - Adaption of the Mediatek driver to the changed IO/TLB flush interface
   of the IOMMU core code.

 - Preparatory patches for the Renesas IOMMU driver to support future
   hardware.

* tag 'iommu-updates-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (62 commits)
  iommu/rockchip: Don't provoke WARN for harmless IRQs
  iommu/vt-d: Turn off translations at shutdown
  iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved
  iommu/arm-smmu: Remove duplicate error message
  iommu/arm-smmu-v3: Don't display an error when IRQ lines are missing
  iommu/ipmmu-vmsa: Add utlb_offset_base
  iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers
  iommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro
  iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers
  iommu/ipmmu-vmsa: tidyup register definitions
  iommu/ipmmu-vmsa: Remove all unused register definitions
  iommu/mediatek: Reduce the tlb flush timeout value
  iommu/mediatek: Get rid of the pgtlock
  iommu/mediatek: Move the tlb_sync into tlb_flush
  iommu/mediatek: Delete the leaf in the tlb_flush
  iommu/mediatek: Use gather to achieve the tlb range flush
  iommu/mediatek: Add a new tlb_lock for tlb_flush
  iommu/mediatek: Correct the flush_iotlb_all callback
  iommu/io-pgtable-arm: Rename IOMMU_QCOM_SYS_CACHE and improve doc
  iommu/io-pgtable-arm: Rationalise MAIR handling
  ...
2019-12-02 11:05:00 -08:00
Linus Torvalds
e5b3fc125d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various fixes:

   - Fix the PAT performance regression that downgraded write-combining
     device memory regions to uncached.

   - There's been a number of bugs in 32-bit double fault handling -
     hopefully all fixed now.

   - Fix an LDT crash

   - Fix an FPU over-optimization that broke with GCC9 code
     optimizations.

   - Misc cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/pat: Fix off-by-one bugs in interval tree search
  x86/ioperm: Save an indentation level in tss_update_io_bitmap()
  x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
  x86/entry/32: Remove unused 'restore_all_notrace' local label
  x86/ptrace: Document FSBASE and GSBASE ABI oddities
  x86/ptrace: Remove set_segment_reg() implementations for current
  x86/traps: die() instead of panicking on a double fault
  x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
  x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
  x86/doublefault/32: Rename doublefault.c to doublefault_32.c
  x86/traps: Disentangle the 32-bit and 64-bit doublefault code
  lkdtm: Add a DOUBLE_FAULT crash type on x86
  selftests/x86/single_step_syscall: Check SYSENTER directly
  x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
2019-12-01 19:05:07 -08:00
Linus Torvalds
b7fcf31f70 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:

 - Make /sys/devices/cpu/rdpmc based RDPMC enforcement more
   instantaneous

 - decoder: Update the Intel opcode map

 - Various tooling fixes, including a few late optimizations and
   cleanups.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  perf script: Fix invalid LBR/binary mismatch error
  perf script: Fix brstackinsn for AUXTRACE
  perf affinity: Add infrastructure to save/restore affinity
  perf pmu: Use file system cache to optimize sysfs access
  perf regs: Make perf_reg_name() return "unknown" instead of NULL
  perf diff: Use llabs() with 64-bit values
  perf diff: Use llabs() with 64-bit values
  perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0
  perf tools: Allow to link with libbpf dynamicaly
  perf tests: Rename tests/map_groups.c to tests/maps.c
  perf tests: Rename thread-mg-share to thread-maps-share
  perf maps: Rename map_groups.h to maps.h
  perf maps: Rename 'mg' variables to 'maps'
  perf map_symbol: Rename ms->mg to ms->maps
  perf addr_location: Rename al->mg to al->maps
  perf thread: Rename thread->mg to thread->maps
  perf maps: Merge 'struct maps' with 'struct map_groups'
  x86/insn: perf tools: Add some more instructions to the new instructions test
  x86/insn: Add some more Intel instructions to the opcode map
  perf map: Remove unused functions
  ...
2019-12-01 18:49:57 -08:00
Linus Torvalds
ceb3074745 y2038: syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended
 for namespace cleaning: the kernel defines the traditional
 time_t, timeval and timespec types that often lead to y2038-unsafe
 code. Even though the unsafe usage is mostly gone from the kernel,
 having the types and associated functions around means that we
 can still grow new users, and that we may be missing conversions
 to safe types that actually matter.
 
 There are still a number of driver specific patches needed to
 get the last users of these types removed, those have been
 submitted to the respective maintainers.
 
 Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJd3D+wAAoJEJpsee/mABjZfdcQAJvl6e+4ddKoDMIVJqVCE25N
 meFRgA7S8jy6BefEVeUgI8TxK+amGO36szMBUEnZxSSxq9u+gd13m5bEK6Xq/ov7
 4KTAiA3Irm/W5FBTktu1zc5ROIra1Xj7jLdubf8wEC3viSXIXB3+68Y28iBN7D2O
 k9kSpwINC5lWeC8guZy2I+2yc4ywUEXao9nVh8C/J+FQtU02TcdLtZop9OhpAa8u
 U19VVH3WHkQI7ZfLvBTUiYK6tlYTiYCnpr8l6sm850CnVv1fzBW+DzmVhPJ6FdFd
 4m5staC0sQ6gVqtjVMBOtT5CdzREse6hpwbKo2GRWFroO5W9tljMOJJXHvv/f6kz
 DxrpUmj37JuRbqAbr8KDmQqPo6M2CRkxFxjol1yh5ER63u1xMwLm/PQITZIMDvPO
 jrFc2C2SdM2E9bKP/RMCVoKSoRwxCJ5IwJ2AF237rrU0sx/zB2xsrOGssx5CWEgc
 3bbk6tDQujJJubnCfgRy1tTxpLZOHEEKw8YhFLLbR2LCtA9pA/0rfLLad16cjA5e
 5jIHxfsFc23zgpzrJeB7kAF/9xgu1tlA5BotOs3VBE89LtWOA9nK5dbPXng6qlUe
 er3xLCfS38ovhUw6DusQpaYLuaYuLM7DKO4iav9kuTMcY9GkbPk7vDD3KPGh2goy
 hY5cSM8+kT1q/THLnUBH
 =Bdbv
 -----END PGP SIGNATURE-----

Merge tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 cleanups from Arnd Bergmann:
 "y2038 syscall implementation cleanups

  This is a series of cleanups for the y2038 work, mostly intended for
  namespace cleaning: the kernel defines the traditional time_t, timeval
  and timespec types that often lead to y2038-unsafe code. Even though
  the unsafe usage is mostly gone from the kernel, having the types and
  associated functions around means that we can still grow new users,
  and that we may be missing conversions to safe types that actually
  matter.

  There are still a number of driver specific patches needed to get the
  last users of these types removed, those have been submitted to the
  respective maintainers"

Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/

* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
  y2038: alarm: fix half-second cut-off
  y2038: ipc: fix x32 ABI breakage
  y2038: fix typo in powerpc vdso "LOPART"
  y2038: allow disabling time32 system calls
  y2038: itimer: change implementation to timespec64
  y2038: move itimer reset into itimer.c
  y2038: use compat_{get,set}_itimer on alpha
  y2038: itimer: compat handling to itimer.c
  y2038: time: avoid timespec usage in settimeofday()
  y2038: timerfd: Use timespec64 internally
  y2038: elfcore: Use __kernel_old_timeval for process times
  y2038: make ns_to_compat_timeval use __kernel_old_timeval
  y2038: socket: use __kernel_old_timespec instead of timespec
  y2038: socket: remove timespec reference in timestamping
  y2038: syscalls: change remaining timeval to __kernel_old_timeval
  y2038: rusage: use __kernel_old_timeval
  y2038: uapi: change __kernel_time_t to __kernel_old_time_t
  y2038: stat: avoid 'time_t' in 'struct stat'
  y2038: ipc: remove __kernel_time_t reference from headers
  y2038: vdso: powerpc: avoid timespec references
  ...
2019-12-01 14:00:59 -08:00
Linus Torvalds
0dd0c8f7db - Support for new VMBus protocols (Andrea Parri).
- Hibernation support (Dexuan Cui).
 - Latency testing framework (Branden Bonaby).
 - Decoupling Hyper-V page size from guest page size (Himadri Pandya).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl3f5YIACgkQ3qZv95d3
 LNzBww/8Cpv/BnOs2cp56OhC+2++3YlWfmxGnvQb9h52weElgr1AZF33lAynp8BZ
 YssOcDnS/G2iAkNDffbQA7s3WTwIjP1weJibOeKbtcXp4SuhNR3gnJafufNddNDv
 bw8ZReLQV7hy3sHb3OUx0aJk5Mssp0N9ZpxRilyIpLELPfVp63gFebq6s1MQYljk
 BAiNO4SKqsGQGZApt2F4Cc3hX2wU2ZfiDm6SifXiLYITGnvilIn7XFIht+2jJBWS
 CdzRoGXcwhQhlj68XWlc89SOzJb7vVUMO1sr84psfbQ2LbhJU8lfJKRJ4b4lR07Z
 Uv5FYxjr14S65fv7DkzCfWU+uPN/sObG4pPXihlfqcTraOvYLQ6/x8cw+9tGZg4H
 aTtnF40hnO81aKsvPAeIsSzVkoyPaSrt7KKhk+Bw/5EUDTTNp6EbIuL4xwnKt6Rt
 2UpA5HM9guQqNb6OZrjlpZfJgd9bNP4CZLBTfOukmnZpONKr2Wv3wubcwQJ8ibQc
 1WZ5SfN2Wmg999Ski7j9qzHk0tWJxa6SX+2NLEHRKxy2nJSJ1zlAr//bznMyMgH/
 yKPDaSkOFoy0aqiTKV2WzuOY6FGXTrSo5vq8YAgYRgp3xB+5+7zLeqlj3ipXhLYE
 HH/eqB27eSnvi0jpub4TbszGJG0o4Z1aYx3aHYYqrOfWX/A5Vls=
 =oJGE
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V updates from Sasha Levin:

 - support for new VMBus protocols (Andrea Parri)

 - hibernation support (Dexuan Cui)

 - latency testing framework (Branden Bonaby)

 - decoupling Hyper-V page size from guest page size (Himadri Pandya)

* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
  Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
  drivers/hv: Replace binary semaphore with mutex
  drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
  HID: hyperv: Add the support of hibernation
  hv_balloon: Add the support of hibernation
  x86/hyperv: Implement hv_is_hibernation_supported()
  Drivers: hv: balloon: Remove dependencies on guest page size
  Drivers: hv: vmbus: Remove dependencies on guest page size
  x86: hv: Add function to allocate zeroed page for Hyper-V
  Drivers: hv: util: Specify ring buffer size using Hyper-V page size
  Drivers: hv: Specify receive buffer size using Hyper-V page size
  tools: hv: add vmbus testing tool
  drivers: hv: vmbus: Introduce latency testing
  video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
  video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
  hv_netvsc: Add the support of hibernation
  hv_sock: Add the support of hibernation
  video: hyperv_fb: Add the support of hibernation
  scsi: storvsc: Add the support of hibernation
  Drivers: hv: vmbus: Add module parameter to cap the VMBus version
  ...
2019-11-30 14:50:51 -08:00
Linus Torvalds
81b6b96475 dma-mapping updates for 5.5-rc1
- improve dma-debug scalability (Eric Dumazet)
  - tiny dma-debug cleanup (Dan Carpenter)
  - check for vmap memory in dma_map_single (Kees Cook)
  - check for dma_addr_t overflows in dma-direct when using
    DMA offsets (Nicolas Saenz Julienne)
  - switch the x86 sta2x11 SOC to use more generic DMA code
    (Nicolas Saenz Julienne)
  - fix arm-nommu dma-ranges handling (Vladimir Murzin)
  - use __initdata in CMA (Shyam Saini)
  - replace the bus dma mask with a limit (Nicolas Saenz Julienne)
  - merge the remapping helpers into the main dma-direct flow (me)
  - switch xtensa to the generic dma remap handling (me)
  - various cleanups around dma_capable (me)
  - remove unused dev arguments to various dma-noncoherent helpers (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl3f+eULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPyPg/+PVHCrhmepudQQFHu6wfurE5U77iNnoUifvG+b5z5
 5mHmTMkQwyox6rKDe8NuFApAhz1VJDSUgSelPmvTSOIEIGXCvX1p+GqRSVS5YQON
 aLzGvbWKE8hCpaPdDHKYDauD1FZGMM8L2P5oOMF9X9fQ94xxRqfqJM6c8iD16Sgg
 +aOgPNzTnxQHJFF/Dbt/mjJrKXWI+XF+bgUbH+l9yKa7Dd7ibmJR8yl9hs1jmp0H
 1CZ+CizwnAs57rCd1a6Ybc6gj59tySc03NMnnbTko+KDxrcbD3Ee2tpqHVkkCjYz
 Yl0m4FIpbotrpokL/FIS727bVvkjbWgoeM+kiVPoYzmZea3pq/tFDr6tp/BxDhFj
 TZXSFfgQljlYMD3ppSoklFlfjGriVWV0tPO3arPXwuuMF5EX/IMQmvxei05jpc8n
 iELNXOP9iZZkY4tLHy2hn2uWrxBRrS1WQwlLg9hahlNRzyfFSyHeP0zWlVDt+RgF
 5CCbEI+HQcUqg1FApB30lQNWTn1+dJftrpKVBlgNBIyIa/z2rFbt8GdSnItxjfQX
 /XX8EZbFvF6AcXkgURkYFIoKM/EbYShOSLcYA3PTUtcuTnF6Kk5eimySiGWZTVCS
 prruSFDZJOvL3SnOIMIiYVmBdB7lEbDyLI/VYuhoECXEDCJpVmRktNkJNg4q6/E+
 fjQ=
 =e5wO
 -----END PGP SIGNATURE-----

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - improve dma-debug scalability (Eric Dumazet)

 - tiny dma-debug cleanup (Dan Carpenter)

 - check for vmap memory in dma_map_single (Kees Cook)

 - check for dma_addr_t overflows in dma-direct when using DMA offsets
   (Nicolas Saenz Julienne)

 - switch the x86 sta2x11 SOC to use more generic DMA code (Nicolas
   Saenz Julienne)

 - fix arm-nommu dma-ranges handling (Vladimir Murzin)

 - use __initdata in CMA (Shyam Saini)

 - replace the bus dma mask with a limit (Nicolas Saenz Julienne)

 - merge the remapping helpers into the main dma-direct flow (me)

 - switch xtensa to the generic dma remap handling (me)

 - various cleanups around dma_capable (me)

 - remove unused dev arguments to various dma-noncoherent helpers (me)

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux:

* tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping: (22 commits)
  dma-mapping: treat dev->bus_dma_mask as a DMA limit
  dma-direct: exclude dma_direct_map_resource from the min_low_pfn check
  dma-direct: don't check swiotlb=force in dma_direct_map_resource
  dma-debug: clean up put_hash_bucket()
  powerpc: remove support for NULL dev in __phys_to_dma / __dma_to_phys
  dma-direct: avoid a forward declaration for phys_to_dma
  dma-direct: unify the dma_capable definitions
  dma-mapping: drop the dev argument to arch_sync_dma_for_*
  x86/PCI: sta2x11: use default DMA address translation
  dma-direct: check for overflows on 32 bit DMA addresses
  dma-debug: increase HASH_SIZE
  dma-debug: reorder struct dma_debug_entry fields
  xtensa: use the generic uncached segment support
  dma-mapping: merge the generic remapping helpers into dma-direct
  dma-direct: provide mmap and get_sgtable method overrides
  dma-direct: remove the dma_handle argument to __dma_direct_alloc_pages
  dma-direct: remove __dma_direct_free_pages
  usb: core: Remove redundant vmap checks
  kernel: dma-contiguous: mark CMA parameters __initdata/__initconst
  dma-debug: add a schedule point in debug_dma_dump_mappings()
  ...
2019-11-28 11:16:43 -08:00
Linus Torvalds
a308a71022 generic ioremap support
- clean up various obsolete ioremap and iounmap variants
  - add a new generic ioremap implementation and switch csky, nds32 and
    riscv over to it
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl3cKcsLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYO1CRAAwFQigsbi0CqqshPWnP0owKV+HA4Xfz/lQZsd7SM/
 BVXhKyDJQum6gp73dW025HCfjidTknsbdCUIP/LNUgAnop3lOlnB31/munDnJJ1H
 6hB1pc+zB9VgbOe0A6TxtxPRm5aE33k1hZIZS99lOh7mY3FvF7mbkkbVoCjdS3Cq
 a9bTX+X+esfUQ5GgaIc2zmz2GLkyFXIeVGs8/CoOX58ESCWQcVZrsQRompo4SgrI
 jqwf47NzdmK8hW4mZ+jdQUiWiAmNs5+2om7Bvi/deFAIFUo1/hLHvQzqEGramq/j
 5SPHax2gWAN3uWYP91QISkUAJWFydwgmUDoTO1M04ov4xLuBrqIQmc43tLjHo2UT
 RwMozWJWN+gkB9zTIboqMPi2qcuDaWcCij7LwHl5zLxPTcOKsrALarL55BQ8MipQ
 x6fpvskrQQvlArNTsRWFRUq0mCtkzE3wMZ9RR3AIETQL2hlAzB1S4gzhD+Z6WTYY
 pXNgkunonVGxwyN/7iJTEl/mvF/+MynGcWqhrwHZLqncyhn/WJJ2USH3nAD1+yjp
 v8v6UUeMXIjUsGAyfTjXy/WXAfwRuSC038AAFcmWKDdh08h4XvPHRficT4U8wr34
 7WzGizHP9f1CqrhYL/4exhPY9X2Yb7HhsFd0bZGG0rRvSillPUp0b8s++m12QuQU
 +VY=
 =ooiA
 -----END PGP SIGNATURE-----

Merge tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap

Pull generic ioremap support from Christoph Hellwig:
 "This adds the remaining bits for an entirely generic ioremap and
  iounmap to lib/ioremap.c. To facilitate that, it cleans up the giant
  mess of weird ioremap variants we had with no users outside the arch
  code.

  For now just the three newest ports use the code, but there is more
  than a handful others that can be converted without too much work.

  Summary:

   - clean up various obsolete ioremap and iounmap variants

   - add a new generic ioremap implementation and switch csky, nds32 and
     riscv over to it"

* tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap: (21 commits)
  nds32: use generic ioremap
  csky: use generic ioremap
  csky: remove ioremap_cache
  riscv: use the generic ioremap code
  lib: provide a simple generic ioremap implementation
  sh: remove __iounmap
  nios2: remove __iounmap
  hexagon: remove __iounmap
  m68k: rename __iounmap and mark it static
  arch: rely on asm-generic/io.h for default ioremap_* definitions
  asm-generic: don't provide ioremap for CONFIG_MMU
  asm-generic: ioremap_uc should behave the same with and without MMU
  xtensa: clean up ioremap
  x86: Clean up ioremap()
  parisc: remove __ioremap
  nios2: remove __ioremap
  alpha: remove the unused __ioremap wrapper
  hexagon: clean up ioremap
  ia64: rename ioremap_nocache to ioremap_uc
  unicore32: remove ioremap_cached
  ...
2019-11-28 10:57:12 -08:00
Sebastian Andrzej Siewior
59c4bd853a x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
to the context that is currently loaded. It never changed during the
lifetime of a task - it remained stable/constant.

After deferred FPU registers loading until return to userland was
implemented, the content of fpu_fpregs_owner_ctx may change during
preemption and must not be cached.

This went unnoticed for some time and was now noticed, in particular
since gcc 9 is caching that load in copy_fpstate_to_sigframe() and
reusing it in the retry loop:

  copy_fpstate_to_sigframe()
    load fpu_fpregs_owner_ctx and save on stack
    fpregs_lock()
    copy_fpregs_to_sigframe() /* failed */
    fpregs_unlock()
         *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***

    fault_in_pages_writeable() /* succeed, retry */

    fpregs_lock()
	__fpregs_load_activate()
	  fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
    copy_fpregs_to_sigframe() /* succeeds, random FPU content */

This is a comparison of the assembly produced by gcc 9, without vs with this
patch:

| # arch/x86/kernel/fpu/signal.c:173:      if (!access_ok(buf, size))
|        cmpq    %rdx, %rax      # tmp183, _4
|        jb      .L190   #,
|-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|-#APP
|-# 512 "arch/x86/include/asm/fpu/internal.h" 1
|-       movq %gs:fpu_fpregs_owner_ctx,%rax      #, pfo_ret__
|-# 0 "" 2
|-#NO_APP
|-       movq    %rax, -88(%rbp) # pfo_ret__, %sfp
…
|-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|-       movq    -88(%rbp), %rcx # %sfp, pfo_ret__
|-       cmpq    %rcx, -64(%rbp) # pfo_ret__, %sfp
|+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#APP
|+# 512 "arch/x86/include/asm/fpu/internal.h" 1
|+       movq %gs:fpu_fpregs_owner_ctx(%rip),%rax        # fpu_fpregs_owner_ctx, pfo_ret__
|+# 0 "" 2
|+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#NO_APP
|+       cmpq    %rax, -64(%rbp) # pfo_ret__, %sfp

Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of
fpu_fpregs_owner_ctx during preemption points.

The Fixes: tag points to the commit where deferred FPU loading was
added. Since this commit, the compiler is no longer allowed to move the
load of fpu_fpregs_owner_ctx somewhere else / outside of the locked
section. A task preemption will change its value and stale content will
be observed.

 [ bp: Massage. ]

Debugged-by: Austin Clements <austin@google.com>
Debugged-by: David Chase <drchase@golang.org>
Debugged-by: Ian Lance Taylor <ian@airs.com>
Fixes: 5f409e20b7 ("x86/fpu: Defer FPU state load until return to userspace")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Austin Clements <austin@google.com>
Cc: Barret Rhoden <brho@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Chase <drchase@golang.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: ian@airs.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Bleecher Snyder <josharian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
2019-11-28 10:16:46 +01:00
Linus Torvalds
95f1fa9e34 New tracing features:
- PERAMAENT flag to ftrace_ops when attaching a callback to a function
    As /proc/sys/kernel/ftrace_enabled when set to zero will disable all
    attached callbacks in ftrace, this has a detrimental impact on live
    kernel tracing, as it disables all that it patched. If a ftrace_ops
    is registered to ftrace with the PERMANENT flag set, it will prevent
    ftrace_enabled from being disabled, and if ftrace_enabled is already
    disabled, it will prevent a ftrace_ops with PREMANENT flag set from
    being registered.
 
  - New register_ftrace_direct(). As eBPF would like to register its own
    trampolines to be called by the ftrace nop locations directly,
    without going through the ftrace trampoline, this function has been
    added. This allows for eBPF trampolines to live along side of
    ftrace, perf, kprobe and live patching. It also utilizes the ftrace
    enabled_functions file that keeps track of functions that have been
    modified in the kernel, to allow for security auditing.
 
  - Allow for kernel internal use of ftrace instances. Subsystems in
    the kernel can now create and destroy their own tracing instances
    which allows them to have their own tracing buffer, and be able
    to record events without worrying about other users from writing over
    their data.
 
  - New seq_buf_hex_dump() that lets users use the hex_dump() in their
    seq_buf usage.
 
  - Notifications now added to tracing_max_latency to allow user space
    to know when a new max latency is hit by one of the latency tracers.
 
  - Wider spread use of generic compare operations for use of bsearch and
    friends.
 
  - More synthetic event fields may be defined (32 up from 16)
 
  - Use of xarray for architectures with sparse system calls, for the
    system call trace events.
 
 This along with small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXdwv4BQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnB5AP91vsdHQjwE1+/UWG/cO+qFtKvn2QJK
 QmBRIJNH/s+1TAD/fAOhgw+ojSK3o/qc+NpvPTEW9AEwcJL1wacJUn+XbQc=
 =ztql
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New tracing features:

   - New PERMANENT flag to ftrace_ops when attaching a callback to a
     function.

     As /proc/sys/kernel/ftrace_enabled when set to zero will disable
     all attached callbacks in ftrace, this has a detrimental impact on
     live kernel tracing, as it disables all that it patched. If a
     ftrace_ops is registered to ftrace with the PERMANENT flag set, it
     will prevent ftrace_enabled from being disabled, and if
     ftrace_enabled is already disabled, it will prevent a ftrace_ops
     with PREMANENT flag set from being registered.

   - New register_ftrace_direct().

     As eBPF would like to register its own trampolines to be called by
     the ftrace nop locations directly, without going through the ftrace
     trampoline, this function has been added. This allows for eBPF
     trampolines to live along side of ftrace, perf, kprobe and live
     patching. It also utilizes the ftrace enabled_functions file that
     keeps track of functions that have been modified in the kernel, to
     allow for security auditing.

   - Allow for kernel internal use of ftrace instances.

     Subsystems in the kernel can now create and destroy their own
     tracing instances which allows them to have their own tracing
     buffer, and be able to record events without worrying about other
     users from writing over their data.

   - New seq_buf_hex_dump() that lets users use the hex_dump() in their
     seq_buf usage.

   - Notifications now added to tracing_max_latency to allow user space
     to know when a new max latency is hit by one of the latency
     tracers.

   - Wider spread use of generic compare operations for use of bsearch
     and friends.

   - More synthetic event fields may be defined (32 up from 16)

   - Use of xarray for architectures with sparse system calls, for the
     system call trace events.

  This along with small clean ups and fixes"

* tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits)
  tracing: Enable syscall optimization for MIPS
  tracing: Use xarray for syscall trace events
  tracing: Sample module to demonstrate kernel access to Ftrace instances.
  tracing: Adding new functions for kernel access to Ftrace instances
  tracing: Fix Kconfig indentation
  ring-buffer: Fix typos in function ring_buffer_producer
  ftrace: Use BIT() macro
  ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured
  ftrace: Rename ftrace_graph_stub to ftrace_stub_graph
  ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization
  ftrace: Add helper find_direct_entry() to consolidate code
  ftrace: Add another check for match in register_ftrace_direct()
  ftrace: Fix accounting bug with direct->count in register_ftrace_direct()
  ftrace/selftests: Fix spelling mistake "wakeing" -> "waking"
  tracing: Increase SYNTH_FIELDS_MAX for synthetic_events
  ftrace/samples: Add a sample module that implements modify_ftrace_direct()
  ftrace: Add modify_ftrace_direct()
  tracing: Add missing "inline" in stub function of latency_fsnotify()
  tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text
  tracing: Use seq_buf_hex_dump() to dump buffers
  ...
2019-11-27 11:42:01 -08:00
Anthony Steinhauser
405b45376d perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0
When you successfully write 0 to /sys/devices/cpu/rdpmc, the RDPMC
instruction should be disabled unconditionally and immediately (after you
close the SYSFS file) by the documentation.

Instead, in the current implementation the PMU must be reloaded which
happens only eventually some time in the future. Only after that the RDPMC
instruction becomes disabled (on ring 3) on the respective core.

This change makes the treatment of the 0 value as blocking and as
unconditional as the current treatment of the 2 value, only the CR4.PCE
bit is naturally set to false instead of true.

Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: https://lkml.kernel.org/r/20191125054838.137615-1-asteinhauser@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 10:32:11 +01:00
Peter Zijlstra
5c02ece818 x86/kprobes: Fix ordering while text-patching
Kprobes does something like:

register:
	arch_arm_kprobe()
	  text_poke(INT3)
          /* guarantees nothing, INT3 will become visible at some point, maybe */

        kprobe_optimizer()
	  /* guarantees the bytes after INT3 are unused */
	  synchronize_rcu_tasks();
	  text_poke_bp(JMP32);
	  /* implies IPI-sync, kprobe really is enabled */

unregister:
	__disarm_kprobe()
	  unoptimize_kprobe()
	    text_poke_bp(INT3 + tail);
	    /* implies IPI-sync, so tail is guaranteed visible */
          arch_disarm_kprobe()
            text_poke(old);
	    /* guarantees nothing, old will maybe become visible */

	synchronize_rcu()

        free-stuff

Now the problem is that on register, the synchronize_rcu_tasks() does
not imply sufficient to guarantee all CPUs have already observed INT3
(although in practice this is exceedingly unlikely not to have
happened) (similar to how MEMBARRIER_CMD_PRIVATE_EXPEDITED does not
imply MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).

Worse, even if it did, we'd have to do 2 synchronize calls to provide
the guarantee we're looking for, the first to ensure INT3 is visible,
the second to guarantee nobody is then still using the instruction
bytes after INT3.

Similar on unregister; the synchronize_rcu() between
__unregister_kprobe_top() and __unregister_kprobe_bottom() does not
guarantee all CPUs are free of the INT3 (and observe the old text).

Therefore, sprinkle some IPI-sync love around. This guarantees that
all CPUs agree on the text and RCU once again provides the required
guaranteed.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.162172862@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
ab09e95ca0 x86/kprobes: Convert to text-patching.h
Convert kprobes to the new text-poke naming.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.103959370@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
67c1d4a280 x86/ftrace: Use text_gen_insn()
Replace the ftrace_code_union with the generic text_gen_insn() helper,
which does exactly this.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.932808000@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
254d2c0451 x86/alternative: Add text_opcode_size()
Introduce a common helper to map *_INSN_OPCODE to *_INSN_SIZE.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.875666061@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
c12af4407f x86/mm: Remove set_kernel_text_r[ow]()
With the last and only user of these functions gone (ftrace) remove
them as well to avoid ever growing new users.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.819095320@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
768ae4406a x86/ftrace: Use text_poke()
Move ftrace over to using the generic x86 text_poke functions; this
avoids having a second/different copy of that code around.

This also avoids ftrace violating the (new) W^X rule and avoids
fragmenting the kernel text page-tables, due to no longer having to
toggle them RW.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.761255803@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
63f62addb8 x86/alternatives: Add and use text_gen_insn() helper
Provide a simple helper function to create common instruction
encodings.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.703538332@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
18cbc8bed0 x86/alternatives, jump_label: Provide better text_poke() batching interface
Adding another text_poke_bp_batch() user made me realize the interface
is all sorts of wrong. The text poke vector should be internal to the
implementation.

This then results in a trivial interface:

  text_poke_queue()  - which has the 'normal' text_poke_bp() interface
  text_poke_finish() - which takes no arguments and flushes any
                       pending text_poke()s.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.646280715@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Peter Zijlstra
8f4a4160c6 x86/alternatives: Update int3_emulate_push() comment
Update the comment now that we've merged x86_32 support.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.588386013@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:24 +01:00
Linus Torvalds
6e9f879684 ACPI updates for 5.5-rc1
- Update the ACPICA code in the kernel to upstream revision 20191018
    including:
 
    * Fixes for Clang warnings (Bob Moore).
 
    * Fix for possible overflow in get_tick_count() (Bob Moore).
 
    * Introduction of acpi_unload_table() (Bob Moore).
 
    * Debugger and utilities updates (Erik Schmauss).
 
    * Fix for unloading tables loaded via configfs (Nikolaus Voss).
 
  - Add support for EFI specific purpose memory to optionally allow
    either application-exclusive or core-kernel-mm managed access to
    differentiated memory (Dan Williams).
 
  - Fix and clean up processing of the HMAT table (Brice Goglin,
    Qian Cai, Tao Xu).
 
  - Update the ACPI EC driver to make it work on systems with
    hardware-reduced ACPI (Daniel Drake).
 
  - Always build in support for the Generic Event Device (GED) to
    allow one kernel binary to work both on systems with full
    hardware ACPI and hardware-reduced ACPI (Arjan van de Ven).
 
  - Fix the table unload mechanism to unregister platform devices
    created when the given table was loaded (Andy Shevchenko).
 
  - Rework the lid blacklist handling in the button driver and add
    more lid quirks to it (Hans de Goede).
 
  - Improve ACPI-based device enumeration for some platforms based
    on Intel BayTrail SoCs (Hans de Goede).
 
  - Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC
    and prevent handlers from being registered for unhandled PMIC
    OpRegions (Hans de Goede).
 
  - Unify ACPI _HID/_UID matching (Andy Shevchenko).
 
  - Clean up documentation and comments (Cao jin, James Pack, Kacper
    Piwiński).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl3dHNkSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/NkP/2y6DWjslA6UW4gjZwaRBcjYoyWExMtQ
 Z86goiRJtP+/NqOwm09wHFcV6FdZ4kitUno3UgMCDZJjrURapg1D0rxb1lSYtMzs
 mGr2FBZlVsJ9erOVSzKj1x2afVhdgl0Rl0fxPzoKgCFt8tCJar6cXy4CVEQKdeLs
 eUui2ksXMIEODGhpN/tr/fJqY4O4jlLmPY6gKWfFpSTsv6lnZmzcCxLf5EvUU7JW
 O91/jXdWz4Vl6IdP32sce6dGDjkvwnY105c7HeBf5EQWUe9RHFuSex982qhCD8U+
 iE+JzlhoYpUb03EktJSXbL++IKUHvoUpTanbhka6unMhazC86x0hDf7ruUtYo2Bk
 V8347CFeQ1x2O5IabfJNnUfKaMYhYmOXIoFHJTLKFO5mcCJmP8KOOyDAYilC1psb
 RJpl1fDoAhk7NqhMttyBqfxiotP0kMoKuqtAAl8Y0hTF0DwR9IfKntuTtp1yTGds
 R4dpJrizUDzw1/o4fCWbc3dFZQR3NFGpL/EAyfPzqjGaeaBBkLoNYstqkal5XHwT
 CILmQg2WHoNuQLXZ4NFFDrM2k2G+VUAjQdkYcb/MCOFbw+aTVPu1wyQq37RLtbMo
 9UwGeeT6SXW3iA1nyMoM+YvitjmxS7gHPPPl+b9G6kBubAzBPp91Ra0Mj9dPIGRB
 Evv5nzOIh8Hi
 =7Cqr
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA code in the kernel to upstream revision
  20191018, add support for EFI specific purpose memory, update the ACPI
  EC driver to make it work on systems with hardware-reduced ACPI,
  improve ACPI-based device enumeration for some platforms, rework the
  lid blacklist handling in the button driver and add more lid quirks to
  it, unify ACPI _HID/_UID matching, fix assorted issues and clean up
  the code and documentation.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20191018
     including:
      * Fixes for Clang warnings (Bob Moore)
      * Fix for possible overflow in get_tick_count() (Bob Moore)
      * Introduction of acpi_unload_table() (Bob Moore)
      * Debugger and utilities updates (Erik Schmauss)
      * Fix for unloading tables loaded via configfs (Nikolaus Voss)

   - Add support for EFI specific purpose memory to optionally allow
     either application-exclusive or core-kernel-mm managed access to
     differentiated memory (Dan Williams)

   - Fix and clean up processing of the HMAT table (Brice Goglin, Qian
     Cai, Tao Xu)

   - Update the ACPI EC driver to make it work on systems with
     hardware-reduced ACPI (Daniel Drake)

   - Always build in support for the Generic Event Device (GED) to allow
     one kernel binary to work both on systems with full hardware ACPI
     and hardware-reduced ACPI (Arjan van de Ven)

   - Fix the table unload mechanism to unregister platform devices
     created when the given table was loaded (Andy Shevchenko)

   - Rework the lid blacklist handling in the button driver and add more
     lid quirks to it (Hans de Goede)

   - Improve ACPI-based device enumeration for some platforms based on
     Intel BayTrail SoCs (Hans de Goede)

   - Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC and
     prevent handlers from being registered for unhandled PMIC OpRegions
     (Hans de Goede)

   - Unify ACPI _HID/_UID matching (Andy Shevchenko)

   - Clean up documentation and comments (Cao jin, James Pack, Kacper
     Piwiński)"

* tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
  ACPI: OSI: Shoot duplicate word
  ACPI: HMAT: use %u instead of %d to print u32 values
  ACPI: NUMA: HMAT: fix a section mismatch
  ACPI: HMAT: don't mix pxm and nid when setting memory target processor_pxm
  ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device
  ACPI: NUMA: HMAT: Register HMAT at device_initcall level
  device-dax: Add a driver for "hmem" devices
  dax: Fix alloc_dax_region() compile warning
  lib: Uplevel the pmem "region" ida to a global allocator
  x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP
  arm/efi: EFI soft reservation to memblock
  x86/efi: EFI soft reservation to E820 enumeration
  efi: Common enable/disable infrastructure for EFI soft reservation
  x86/efi: Push EFI_MEMMAP check into leaf routines
  efi: Enumerate EFI_MEMORY_SP
  ACPI: NUMA: Establish a new drivers/acpi/numa/ directory
  ACPICA: Update version to 20191018
  ACPICA: debugger: remove leading whitespaces when converting a string to a buffer
  ACPICA: acpiexec: initialize all simple types and field units from user input
  ACPICA: debugger: add field unit support for acpi_db_get_next_token
  ...
2019-11-26 19:25:25 -08:00
Linus Torvalds
c2da5bdc66 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 merge fix from Ingo Molnar:
 "I missed one other semantic conflict that can result in build failures
  on certain stripped down x86 32-bit configs, for example 32-bit
  'allnoconfig' where CONFIG_X86_IOPL_IOPERM gets turned off"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/iopl: Make 'struct tss_struct' constant size again
2019-11-26 17:12:12 -08:00
Linus Torvalds
168829ad09 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - A comprehensive rewrite of the robust/PI futex code's exit handling
     to fix various exit races. (Thomas Gleixner et al)

   - Rework the generic REFCOUNT_FULL implementation using
     atomic_fetch_* operations so that the performance impact of the
     cmpxchg() loops is mitigated for common refcount operations.

     With these performance improvements the generic implementation of
     refcount_t should be good enough for everybody - and this got
     confirmed by performance testing, so remove ARCH_HAS_REFCOUNT and
     REFCOUNT_FULL entirely, leaving the generic implementation enabled
     unconditionally. (Will Deacon)

   - Other misc changes, fixes, cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  lkdtm: Remove references to CONFIG_REFCOUNT_FULL
  locking/refcount: Remove unused 'refcount_error_report()' function
  locking/refcount: Consolidate implementations of refcount_t
  locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
  locking/refcount: Move saturation warnings out of line
  locking/refcount: Improve performance of generic REFCOUNT_FULL code
  locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the <linux/refcount.h> header
  locking/refcount: Remove unused refcount_*_checked() variants
  locking/refcount: Ensure integer operands are treated as signed
  locking/refcount: Define constants for saturation and max refcount values
  futex: Prevent exit livelock
  futex: Provide distinct return value when owner is exiting
  futex: Add mutex around futex exit
  futex: Provide state handling for exec() as well
  futex: Sanitize exit state handling
  futex: Mark the begin of futex exit explicitly
  futex: Set task::futex_state to DEAD right after handling futex exit
  futex: Split futex_mm_release() for exit/exec
  exit/exec: Seperate mm_release()
  futex: Replace PF_EXITPIDONE with a state
  ...
2019-11-26 16:02:40 -08:00
Linus Torvalds
3f59dbcace Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main kernel side changes in this cycle were:

   - Various Intel-PT updates and optimizations (Alexander Shishkin)

   - Prohibit kprobes on Xen/KVM emulate prefixes (Masami Hiramatsu)

   - Add support for LSM and SELinux checks to control access to the
     perf syscall (Joel Fernandes)

   - Misc other changes, optimizations, fixes and cleanups - see the
     shortlog for details.

  There were numerous tooling changes as well - 254 non-merge commits.
  Here are the main changes - too many to list in detail:

   - Enhancements to core tooling infrastructure, perf.data, libperf,
     libtraceevent, event parsing, vendor events, Intel PT, callchains,
     BPF support and instruction decoding.

   - There were updates to the following tools:

        perf annotate
        perf diff
        perf inject
        perf kvm
        perf list
        perf maps
        perf parse
        perf probe
        perf record
        perf report
        perf script
        perf stat
        perf test
        perf trace

   - And a lot of other changes: please see the shortlog and Git log for
     more details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (279 commits)
  perf parse: Fix potential memory leak when handling tracepoint errors
  perf probe: Fix spelling mistake "addrees" -> "address"
  libtraceevent: Fix memory leakage in copy_filter_type
  libtraceevent: Fix header installation
  perf intel-bts: Does not support AUX area sampling
  perf intel-pt: Add support for decoding AUX area samples
  perf intel-pt: Add support for recording AUX area samples
  perf pmu: When using default config, record which bits of config were changed by the user
  perf auxtrace: Add support for queuing AUX area samples
  perf session: Add facility to peek at all events
  perf auxtrace: Add support for dumping AUX area samples
  perf inject: Cut AUX area samples
  perf record: Add aux-sample-size config term
  perf record: Add support for AUX area sampling
  perf auxtrace: Add support for AUX area sample recording
  perf auxtrace: Move perf_evsel__find_pmu()
  perf record: Add a function to test for kernel support for AUX area sampling
  perf tools: Add kernel AUX area sampling definitions
  perf/core: Make the mlock accounting simple again
  perf report: Jump to symbol source view from total cycles view
  ...
2019-11-26 15:04:47 -08:00
Andy Lutomirski
7d8d8cfdee x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
The old x86_32 doublefault_fn() was old and crufty, and it did not
even try to recover.  do_double_fault() is much nicer.  Rewrite the
32-bit double fault code to sanitize CPU state and call
do_double_fault().  This is mostly an exercise i386 archaeology.

With this patch applied, 32-bit double faults get a real stack trace,
just like 64-bit double faults.

[ mingo: merged the patch to a later kernel base. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26 22:00:04 +01:00
Andy Lutomirski
dc4e0021b0 x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
There are three problems with the current layout of the doublefault
stack and TSS.  First, the TSS is only cacheline-aligned, which is
not enough -- if the hardware portion of the TSS (struct x86_hw_tss)
crosses a page boundary, horrible things happen [0].  Second, the
stack and TSS are global, so simultaneous double faults on different
CPUs will cause massive corruption.  Third, the whole mechanism
won't work if user CR3 is loaded, resulting in a triple fault [1].

Let the doublefault stack and TSS share a page (which prevents the
TSS from spanning a page boundary), make it percpu, and move it into
cpu_entry_area.  Teach the stack dump code about the doublefault
stack.

[0] Real hardware will read past the end of the page onto the next
    *physical* page if a task switch happens.  Virtual machines may
    have any number of bugs, and I would consider it reasonable for
    a VM to summarily kill the guest if it tries to task-switch to
    a page-spanning TSS.

[1] Real hardware triple faults.  At least some VMs seem to hang.
    I'm not sure what's going on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26 21:53:34 +01:00
Andy Lutomirski
93efbde2c3 x86/traps: Disentangle the 32-bit and 64-bit doublefault code
The 64-bit doublefault handler is much nicer than the 32-bit one.
As a first step toward unifying them, make the 64-bit handler
self-contained.  This should have no effect no functional effect
except in the odd case of x86_64 with CONFIG_DOUBLEFAULT=n in which
case it will change the logging a bit.

This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit
kernels.  It didn't do anything useful -- CONFIG_DOUBLEFAULT=n
didn't actually disable doublefault handling on x86_64.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26 21:53:34 +01:00
Ingo Molnar
0bcd776272 x86/iopl: Make 'struct tss_struct' constant size again
After the following commit:

  05b042a194: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")

'struct cpu_entry_area' has to be Kconfig invariant, so that we always
have a matching CPU_ENTRY_AREA_PAGES size.

This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct:

  111e7b15cf: ("x86/ioperm: Extend IOPL config to control ioperm() as well")

Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of
cpu_entry_area by two pages, triggering the assert:

  ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE

Simplify the Kconfig dependencies and make cpu_entry_area constant
size on 32-bit kernels again.

Fixes: 05b042a194: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26 21:49:04 +01:00
Linus Torvalds
ab851d49f6 Merge branch 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 iopl updates from Ingo Molnar:
 "This implements a nice simplification of the iopl and ioperm code that
  Thomas Gleixner discovered: we can implement the IO privilege features
  of the iopl system call by using the IO permission bitmap in
  permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space
  if they change the interrupt flag.

  This implements that feature, with testing facilities and related
  cleanups"

[ "Simplification" may be an over-statement. The main goal is to avoid
  the cli/sti of iopl by effectively implementing the IO port access
  parts of iopl in terms of ioperm.

  This may end up not workign well in case people actually depend on
  cli/sti being available, or if there are mixed uses of iopl and
  ioperm. We will see..       - Linus ]

* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86/ioperm: Fix use of deprecated config option
  x86/entry/32: Clarify register saving in __switch_to_asm()
  selftests/x86/iopl: Extend test to cover IOPL emulation
  x86/ioperm: Extend IOPL config to control ioperm() as well
  x86/iopl: Remove legacy IOPL option
  x86/iopl: Restrict iopl() permission scope
  x86/iopl: Fixup misleading comment
  selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
  x86/ioperm: Share I/O bitmap if identical
  x86/ioperm: Remove bitmap if all permissions dropped
  x86/ioperm: Move TSS bitmap update to exit to user work
  x86/ioperm: Add bitmap sequence number
  x86/ioperm: Move iobitmap data into a struct
  x86/tss: Move I/O bitmap data into a seperate struct
  x86/io: Speedup schedule out of I/O bitmap user
  x86/ioperm: Avoid bitmap allocation if no permissions are set
  x86/ioperm: Simplify first ioperm() invocation logic
  x86/iopl: Cleanup include maze
  x86/tss: Fix and move VMX BUILD_BUG_ON()
  x86/cpu: Unify cpu_init()
  ...
2019-11-26 11:12:02 -08:00
Linus Torvalds
1d87200446 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cross-arch changes to move the linker sections for NOTES and
     EXCEPTION_TABLE into the RO_DATA area, where they belong on most
     architectures. (Kees Cook)

   - Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
     trap jumps into the middle of those padding areas instead of
     sliding execution. (Kees Cook)

   - A thorough cleanup of symbol definitions within x86 assembler code.
     The rather randomly named macros got streamlined around a
     (hopefully) straightforward naming scheme:

        SYM_START(name, linkage, align...)
        SYM_END(name, sym_type)

        SYM_FUNC_START(name)
        SYM_FUNC_END(name)

        SYM_CODE_START(name)
        SYM_CODE_END(name)

        SYM_DATA_START(name)
        SYM_DATA_END(name)

     etc - with about three times of these basic primitives with some
     label, local symbol or attribute variant, expressed via postfixes.

     No change in functionality intended. (Jiri Slaby)

   - Misc other changes, cleanups and smaller fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  x86/entry/64: Remove pointless jump in paranoid_exit
  x86/entry/32: Remove unused resume_userspace label
  x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
  m68k: Convert missed RODATA to RO_DATA
  x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
  x86/mm: Report actual image regions in /proc/iomem
  x86/mm: Report which part of kernel image is freed
  x86/mm: Remove redundant address-of operators on addresses
  xtensa: Move EXCEPTION_TABLE to RO_DATA segment
  powerpc: Move EXCEPTION_TABLE to RO_DATA segment
  parisc: Move EXCEPTION_TABLE to RO_DATA segment
  microblaze: Move EXCEPTION_TABLE to RO_DATA segment
  ia64: Move EXCEPTION_TABLE to RO_DATA segment
  h8300: Move EXCEPTION_TABLE to RO_DATA segment
  c6x: Move EXCEPTION_TABLE to RO_DATA segment
  arm64: Move EXCEPTION_TABLE to RO_DATA segment
  alpha: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Actually use _etext for the end of the text segment
  vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
  ...
2019-11-26 10:42:40 -08:00
Linus Torvalds
5c4a1c090d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "These are the fixes left over from the v5.4 cycle:

   - Various low level 32-bit entry code fixes and improvements by Andy
     Lutomirski, Peter Zijlstra and Thomas Gleixner.

   - Fix 32-bit Xen PV breakage, by Jan Beulich"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
  x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
  selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel
  selftests/x86/mov_ss_trap: Fix the SYSENTER test
  x86/entry/32: Fix NMI vs ESPFIX
  x86/entry/32: Unwind the ESPFIX stack earlier on exception entry
  x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL
  x86/entry/32: Use %ss segment where required
  x86/entry/32: Fix IRET exception
  x86/cpu_entry_area: Add guard page for entry stack on 32bit
  x86/pti/32: Size initial_page_table correctly
  x86/doublefault/32: Fix stack canaries in the double fault handler
  x86/xen/32: Simplify ring check in xen_iret_crit_fixup()
  x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout
  x86/stackframe/32: Repair 32-bit Xen PV
2019-11-26 10:12:28 -08:00
Linus Torvalds
da42761df5 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "UV platform updates (with a 'hubless' variant) and Jailhouse updates
  for better UART support"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/jailhouse: Only enable platform UARTs if available
  x86/jailhouse: Improve setup data version comparison
  x86/platform/uv: Account for UV Hubless in is_uvX_hub Ops
  x86/platform/uv: Check EFI Boot to set reboot type
  x86/platform/uv: Decode UVsystab Info
  x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files
  x86/platform/uv: Setup UV functions for Hubless UV Systems
  x86/platform/uv: Add return code to UV BIOS Init function
  x86/platform/uv: Return UV Hubless System Type
  x86/platform/uv: Save OEM_ID from ACPI MADT probe
2019-11-26 09:52:37 -08:00
Linus Torvalds
1c134b198d Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - A PAT series from Davidlohr Bueso, which simplifies the memtype
     rbtree by using the interval tree helpers. (There's more cleanups
     in this area queued up, but they didn't make the merge window.)

   - Also flip over CONFIG_X86_5LEVEL to default-y. This might draw in a
     few more testers, as all the major distros are going to have
     5-level paging enabled by default in their next iterations.

   - Misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/pat: Rename pat_rbtree.c to pat_interval.c
  x86/mm/pat: Drop the rbt_ prefix from external memtype calls
  x86/mm/pat: Do not pass 'rb_root' down the memtype tree helper functions
  x86/mm/pat: Convert the PAT tree to a generic interval tree
  x86/mm: Clean up the pmd_read_atomic() comments
  x86/mm: Fix function name typo in pmd_read_atomic() comment
  x86/cpu: Clean up intel_tlb_table[]
  x86/mm: Enable 5-level paging support by default
2019-11-26 09:50:14 -08:00
Linus Torvalds
24ee25a6da Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kdump updates from Ingo Molnar:
 "This solves a kdump artifact where encrypted memory contents are
  dumped, instead of unencrypted ones.

  The solution also happens to simplify the kdump code, to everyone's
  delight"

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/crash: Align function arguments on opening braces
  x86/kdump: Remove the backup region handling
  x86/kdump: Always reserve the low 1M when the crashkernel option is specified
  x86/crash: Add a forward declaration of struct kimage
2019-11-26 09:48:19 -08:00
Linus Torvalds
64d6a12094 Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hyperv updates from Ingo Molnar:
 "Misc updates to the hyperv guest code:

   - Rework clockevents initialization to better support hibernation

   - Allow guests to enable InvariantTSC

   - Micro-optimize send_ipi_one"

* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyperv: Initialize clockevents earlier in CPU onlining
  x86/hyperv: Allow guests to enable InvariantTSC
  x86/hyperv: Micro-optimize send_ipi_one()
2019-11-26 09:43:34 -08:00
Linus Torvalds
cd4771f770 Merge branch 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 syscall entry updates from Ingo Molnar:
 "These changes relate to the preparatory cleanup of syscall function
  type signatures - to fix indirect call mismatches with Control-Flow
  Integrity (CFI) checking.

  No change in behavior intended"

* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Use the correct function type for native_set_fixmap()
  syscalls/x86: Fix function types in COND_SYSCALL
  syscalls/x86: Use the correct function type for sys_ni_syscall
  syscalls/x86: Use COMPAT_SYSCALL_DEFINE0 for IA32 (rt_)sigreturn
  syscalls/x86: Wire up COMPAT_SYSCALL_DEFINE0
  syscalls/x86: Use the correct function type in SYSCALL_DEFINE0
2019-11-26 09:25:36 -08:00
Linus Torvalds
a25bbc2644 Merge branches 'x86-cpu-for-linus' and 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu and fpu updates from Ingo Molnar:

 - math-emu fixes

 - CPUID updates

 - sanity-check RDRAND output to see whether the CPU at least pretends
   to produce random data

 - various unaligned-access across cachelines fixes in preparation of
   hardware level split-lock detection

 - fix MAXSMP constraints to not allow !CPUMASK_OFFSTACK kernels with
   larger than 512 NR_CPUS

 - misc FPU related cleanups

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Align the x86_capability array to size of unsigned long
  x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long
  x86/umip: Make the comments vendor-agnostic
  x86/Kconfig: Rename UMIP config parameter
  x86/Kconfig: Enforce limit of 512 CPUs with MAXSMP and no CPUMASK_OFFSTACK
  x86/cpufeatures: Add feature bit RDPRU on AMD
  x86/math-emu: Limit MATH_EMULATION to 486SX compatibles
  x86/math-emu: Check __copy_from_user() result
  x86/rdrand: Sanity-check RDRAND output

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Use XFEATURE_FP/SSE enum values instead of hardcoded numbers
  x86/fpu: Shrink space allocated for xstate_comp_offsets
  x86/fpu: Update stale variable name in comment
2019-11-26 08:58:08 -08:00
Linus Torvalds
85fbf15bc9 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes were:

   - Extend the boot protocol to allow future extensions without hitting
     the setup_header size limit.

   - Add quirk to devicetree systems to disable the RTC unless it's
     listed as a supported device.

   - Fix ld.lld linker pedantry"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Introduce setup_indirect
  x86/boot: Introduce kernel_info.setup_type_max
  x86/boot: Introduce kernel_info
  x86/init: Allow DT configured systems to disable RTC at boot time
  x86/realmode: Explicitly set entry point via ENTRY in linker script
2019-11-26 08:40:20 -08:00
Linus Torvalds
fd2615908d Merge branches 'core-objtool-for-linus', 'x86-cleanups-for-linus' and 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 objtool, cleanup, and apic updates from Ingo Molnar:
 "Objtool:

   - Fix a gawk 5.0 incompatibility in gen-insn-attr-x86.awk. Most
     distros are still on gawk 4.2.x.

  Cleanup:

   - Misc cleanups, plus the removal of obsolete code such as Calgary
     IOMMU support, which code hasn't seen any real testing in a long
     time and there's no known users left.

  apic:

   - Two changes: a cleanup and a fix for an (old) race for oneshot
     threaded IRQ handlers"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Fix awk regexp warnings

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove unused asm/rio.h
  x86: Fix typos in comments
  x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h>
  x86/pci: Remove pci_64.h
  x86: Remove the calgary IOMMU driver
  x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments
  x86/kdump: Remove the unused crash_copy_backup_region()
  x86/nmi: Remove stale EDAC include leftover

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Rename misnamed functions
  x86/ioapic: Prevent inconsistent state when moving an interrupt
2019-11-26 08:21:54 -08:00
Linus Torvalds
386403a115 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Another merge window, another pull full of stuff:

   1) Support alternative names for network devices, from Jiri Pirko.

   2) Introduce per-netns netdev notifiers, also from Jiri Pirko.

   3) Support MSG_PEEK in vsock/virtio, from Matias Ezequiel Vara
      Larsen.

   4) Allow compiling out the TLS TOE code, from Jakub Kicinski.

   5) Add several new tracepoints to the kTLS code, also from Jakub.

   6) Support set channels ethtool callback in ena driver, from Sameeh
      Jubran.

   7) New SCTP events SCTP_ADDR_ADDED, SCTP_ADDR_REMOVED,
      SCTP_ADDR_MADE_PRIM, and SCTP_SEND_FAILED_EVENT. From Xin Long.

   8) Add XDP support to mvneta driver, from Lorenzo Bianconi.

   9) Lots of netfilter hw offload fixes, cleanups and enhancements,
      from Pablo Neira Ayuso.

  10) PTP support for aquantia chips, from Egor Pomozov.

  11) Add UDP segmentation offload support to igb, ixgbe, and i40e. From
      Josh Hunt.

  12) Add smart nagle to tipc, from Jon Maloy.

  13) Support L2 field rewrite by TC offloads in bnxt_en, from Venkat
      Duvvuru.

  14) Add a flow mask cache to OVS, from Tonghao Zhang.

  15) Add XDP support to ice driver, from Maciej Fijalkowski.

  16) Add AF_XDP support to ice driver, from Krzysztof Kazimierczak.

  17) Support UDP GSO offload in atlantic driver, from Igor Russkikh.

  18) Support it in stmmac driver too, from Jose Abreu.

  19) Support TIPC encryption and auth, from Tuong Lien.

  20) Introduce BPF trampolines, from Alexei Starovoitov.

  21) Make page_pool API more numa friendly, from Saeed Mahameed.

  22) Introduce route hints to ipv4 and ipv6, from Paolo Abeni.

  23) Add UDP segmentation offload to cxgb4, Rahul Lakkireddy"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1857 commits)
  libbpf: Fix usage of u32 in userspace code
  mm: Implement no-MMU variant of vmalloc_user_node_flags
  slip: Fix use-after-free Read in slip_open
  net: dsa: sja1105: fix sja1105_parse_rgmii_delays()
  macvlan: schedule bc_work even if error
  enetc: add support Credit Based Shaper(CBS) for hardware offload
  net: phy: add helpers phy_(un)lock_mdio_bus
  mdio_bus: don't use managed reset-controller
  ax88179_178a: add ethtool_op_get_ts_info()
  mlxsw: spectrum_router: Fix use of uninitialized adjacency index
  mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels
  bpf: Simplify __bpf_arch_text_poke poke type handling
  bpf: Introduce BPF_TRACE_x helper for the tracing tests
  bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT
  bpf, testing: Add various tail call test cases
  bpf, x86: Emit patchable direct jump as tail call
  bpf: Constant map key tracking for prog array pokes
  bpf: Add poke dependency tracking for prog array maps
  bpf: Add initial poke descriptor table for jit images
  bpf: Move owner type, jited info into array auxiliary data
  ...
2019-11-25 20:02:57 -08:00
Linus Torvalds
752272f16d ARM:
- Data abort report and injection
 - Steal time support
 - GICv4 performance improvements
 - vgic ITS emulation fixes
 - Simplify FWB handling
 - Enable halt polling counters
 - Make the emulated timer PREEMPT_RT compliant
 
 s390:
 - Small fixes and cleanups
 - selftest improvements
 - yield improvements
 
 PPC:
 - Add capability to tell userspace whether we can single-step the guest.
 - Improve the allocation of XIVE virtual processor IDs
 - Rewrite interrupt synthesis code to deliver interrupts in virtual
   mode when appropriate.
 - Minor cleanups and improvements.
 
 x86:
 - XSAVES support for AMD
 - more accurate report of nested guest TSC to the nested hypervisor
 - retpoline optimizations
 - support for nested 5-level page tables
 - PMU virtualization optimizations, and improved support for nested
   PMU virtualization
 - correct latching of INITs for nested virtualization
 - IOAPIC optimization
 - TSX_CTRL virtualization for more TAA happiness
 - improved allocation and flushing of SEV ASIDs
 - many bugfixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJd27PMAAoJEL/70l94x66DspsH+gPc6YWtKJFJH58Zj8NrNh6y
 t0FwDFcvUa51+m4jaY4L5Y8+zqu1dZFnPPhFGqNWpxrjCEvE/glQJv3BiUX06Seh
 aYUHNymGoYCTJOHaaGhV+NlgQaDuZOCOkIsOLAPehyFd1KojwB+FRC0xmO6aROPw
 9yQgYrKuK1UUn5HwxBNrMS4+Xv+2iKv/9sTnq1G4W2qX2NZQg84LVPg1zIdkCh3D
 3GOvoCBEk3ivQqjmdE7rP/InPr0XvW0b6TFhchIk8J6jEIQFHsmOUefiTvTxsIHV
 OKAZwvyeYPrYHA/aDZpaBmY2aR0ydfKDUQcviNIJoF1vOktGs0hvl3VbsmG8QCg=
 =OSI1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - data abort report and injection
   - steal time support
   - GICv4 performance improvements
   - vgic ITS emulation fixes
   - simplify FWB handling
   - enable halt polling counters
   - make the emulated timer PREEMPT_RT compliant

  s390:
   - small fixes and cleanups
   - selftest improvements
   - yield improvements

  PPC:
   - add capability to tell userspace whether we can single-step the
     guest
   - improve the allocation of XIVE virtual processor IDs
   - rewrite interrupt synthesis code to deliver interrupts in virtual
     mode when appropriate.
   - minor cleanups and improvements.

  x86:
   - XSAVES support for AMD
   - more accurate report of nested guest TSC to the nested hypervisor
   - retpoline optimizations
   - support for nested 5-level page tables
   - PMU virtualization optimizations, and improved support for nested
     PMU virtualization
   - correct latching of INITs for nested virtualization
   - IOAPIC optimization
   - TSX_CTRL virtualization for more TAA happiness
   - improved allocation and flushing of SEV ASIDs
   - many bugfixes and cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
  KVM: x86: Grab KVM's srcu lock when setting nested state
  KVM: x86: Open code shared_msr_update() in its only caller
  KVM: Fix jump label out_free_* in kvm_init()
  KVM: x86: Remove a spurious export of a static function
  KVM: x86: create mmu/ subdirectory
  KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
  KVM: x86: remove set but not used variable 'called'
  KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
  KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
  KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
  KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
  KVM: x86: do not modify masked bits of shared MSRs
  KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
  KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
  KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
  KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
  KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
  KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
  KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
  ...
2019-11-25 18:02:36 -08:00
Linus Torvalds
3f3c8be973 xen: fixes for xen
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXdtnVQAKCRCAXGG7T9hj
 vg1hAQDqG1DKZvR6BtlvETFMz7ZlrXVkpm6C74Wy4bLiO5KSSAEAneFbrDwFVa0c
 d05Z6wemjlyEd7u3gkVQBKfHkbWBRQQ=
 =aDIL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - a small series to remove the build constraint of Xen x86 MCE handling
   to 64-bit only

 - a bunch of minor cleanups

* tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix Kconfig indentation
  xen/mcelog: also allow building for 32-bit kernels
  xen/mcelog: add PPIN to record when available
  xen/mcelog: drop __MC_MSR_MCGCAP
  xen/gntdev: Use select for DMA_SHARED_BUFFER
  xen: mm: make xen_mm_init static
  xen: mm: include <xen/xen-ops.h> for missing declarations
2019-11-25 17:45:31 -08:00
Linus Torvalds
4ba380f616 arm64 updates for 5.5:
- On ARMv8 CPUs without hardware updates of the access flag, avoid
   failing cow_user_page() on PFN mappings if the pte is old. The patches
   introduce an arch_faults_on_old_pte() macro, defined as false on x86.
   When true, cow_user_page() makes the pte young before attempting
   __copy_from_user_inatomic().
 
 - Covert the synchronous exception handling paths in
   arch/arm64/kernel/entry.S to C.
 
 - FTRACE_WITH_REGS support for arm64.
 
 - ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
 
 - Several kselftest cases specific to arm64, together with a MAINTAINERS
   update for these files (moved to the ARM64 PORT entry).
 
 - Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
   instructions under certain conditions.
 
 - Workaround for Cortex-A57 and A72 errata where the CPU may
   speculatively execute an AT instruction and associate a VMID with the
   wrong guest page tables (corrupting the TLB).
 
 - Perf updates for arm64: additional PMU topologies on HiSilicon
   platforms, support for CCN-512 interconnect, AXI ID filtering in the
   IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
 
 - GICv3 optimisation to avoid a heavy barrier when accessing the
   ICC_PMR_EL1 register.
 
 - ELF HWCAP documentation updates and clean-up.
 
 - SMC calling convention conduit code clean-up.
 
 - KASLR diagnostics printed during boot
 
 - NVIDIA Carmel CPU added to the KPTI whitelist
 
 - Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale
   macro, simplify calculation in __create_pgd_mapping(), typos.
 
 - Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
   endinanness to help with allmodconfig.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl3YJswACgkQa9axLQDI
 XvFwYg//aTGhNLew3ADgW2TYal7LyqetRROixPBrzqHLu2A8No1+QxHMaKxpZVyf
 pt25tABuLtPHql3qBzE0ltmfbLVsPj/3hULo404EJb9HLRfUnVGn7gcPkc+p4YAr
 IYkYPXJbk6OlJ84vI+4vXmDEF12bWCqamC9qZ+h99qTpMjFXFO17DSJ7xQ8Xic3A
 HHgCh4uA7gpTVOhLxaS6KIw+AZNYwvQxLXch2+wj6agbGX79uw9BeMhqVXdkPq8B
 RTDJpOdS970WOT4cHWOkmXwsqqGRqgsgyu+bRUJ0U72+0y6MX0qSHIUnVYGmNc5q
 Dtox4rryYLvkv/hbpkvjgVhv98q3J1mXt/CalChWB5dG4YwhJKN2jMiYuoAvB3WS
 6dR7Dfupgai9gq1uoKgBayS2O6iFLSa4g58vt3EqUBqmM7W7viGFPdLbuVio4ycn
 CNF2xZ8MZR6Wrh1JfggO7Hc11EJdSqESYfHO6V/pYB4pdpnqJLDoriYHXU7RsZrc
 HvnrIvQWKMwNbqBvpNbWvK5mpBMMX2pEienA3wOqKNH7MbepVsG+npOZTVTtl9tN
 FL0ePb/mKJu/2+gW8ntiqYn7EzjKprRmknOiT2FjWWo0PxgJ8lumefuhGZZbaOWt
 /aTAeD7qKd/UXLKGHF/9v3q4GEYUdCFOXP94szWVPyLv+D9h8L8=
 =TPL9
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Apart from the arm64-specific bits (core arch and perf, new arm64
  selftests), it touches the generic cow_user_page() (reviewed by
  Kirill) together with a macro for x86 to preserve the existing
  behaviour on this architecture.

  Summary:

   - On ARMv8 CPUs without hardware updates of the access flag, avoid
     failing cow_user_page() on PFN mappings if the pte is old. The
     patches introduce an arch_faults_on_old_pte() macro, defined as
     false on x86. When true, cow_user_page() makes the pte young before
     attempting __copy_from_user_inatomic().

   - Covert the synchronous exception handling paths in
     arch/arm64/kernel/entry.S to C.

   - FTRACE_WITH_REGS support for arm64.

   - ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4

   - Several kselftest cases specific to arm64, together with a
     MAINTAINERS update for these files (moved to the ARM64 PORT entry).

   - Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
     instructions under certain conditions.

   - Workaround for Cortex-A57 and A72 errata where the CPU may
     speculatively execute an AT instruction and associate a VMID with
     the wrong guest page tables (corrupting the TLB).

   - Perf updates for arm64: additional PMU topologies on HiSilicon
     platforms, support for CCN-512 interconnect, AXI ID filtering in
     the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.

   - GICv3 optimisation to avoid a heavy barrier when accessing the
     ICC_PMR_EL1 register.

   - ELF HWCAP documentation updates and clean-up.

   - SMC calling convention conduit code clean-up.

   - KASLR diagnostics printed during boot

   - NVIDIA Carmel CPU added to the KPTI whitelist

   - Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
     stale macro, simplify calculation in __create_pgd_mapping(), typos.

   - Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
     endinanness to help with allmodconfig"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: Kconfig: add a choice for endianness
  kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
  arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
  MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
  arm64: kaslr: Check command line before looking for a seed
  arm64: kaslr: Announce KASLR status on boot
  kselftest: arm64: fake_sigreturn_misaligned_sp
  kselftest: arm64: fake_sigreturn_bad_size
  kselftest: arm64: fake_sigreturn_duplicated_fpsimd
  kselftest: arm64: fake_sigreturn_missing_fpsimd
  kselftest: arm64: fake_sigreturn_bad_size_for_magic0
  kselftest: arm64: fake_sigreturn_bad_magic
  kselftest: arm64: add helper get_current_context
  kselftest: arm64: extend test_init functionalities
  kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
  kselftest: arm64: mangle_pstate_invalid_daif_bits
  kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
  kselftest: arm64: extend toplevel skeleton Makefile
  drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
  arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
  ...
2019-11-25 15:39:19 -08:00
Arnd Bergmann
af3784689e y2038: ipc: fix x32 ABI breakage
The correct type on x32 is 64-bit wide, same as for the other struct
members around it, so use  __kernel_long_t in place of the original
__kernel_time_t here, corresponding to the rest of the structure.

Fixes: caf5e32d4e ("y2038: ipc: remove __kernel_time_t reference from headers")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-25 21:30:12 +01:00
Will Deacon
fb041bb7c0 locking/refcount: Consolidate implementations of refcount_t
The generic implementation of refcount_t should be good enough for
everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely,
leaving the generic implementation enabled unconditionally.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:32 +01:00
Ingo Molnar
ceb9e77324 Merge branch 'x86/core' into perf/core, to resolve conflicts and to pick up completed topic tree
Conflicts:
	tools/perf/check-headers.sh

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:09:27 +01:00
Ingo Molnar
f01ec4fca8 Merge branch 'x86/build' into x86/asm, to pick up completed topic branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:05:09 +01:00
Ingo Molnar
05b042a194 x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
When two recent commits that increased the size of the 'struct cpu_entry_area'
were merged in -tip, the 32-bit defconfig build started failing on the following
build time assert:

  ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_189’ declared with attribute error: BUILD_BUG_ON failed: CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE
  arch/x86/mm/cpu_entry_area.c:189:2: note: in expansion of macro ‘BUILD_BUG_ON’
  In function ‘setup_cpu_entry_area_ptes’,

Which corresponds to the following build time assert:

	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

The purpose of this assert is to sanity check the fixed-value definition of
CPU_ENTRY_AREA_PAGES arch/x86/include/asm/pgtable_32_types.h:

	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 41)

The '41' is supposed to match sizeof(struct cpu_entry_area)/PAGE_SIZE, which value
we didn't want to define in such a low level header, because it would cause
dependency hell.

Every time the size of cpu_entry_area is changed, we have to adjust CPU_ENTRY_AREA_PAGES
accordingly - and this assert is checking that constraint.

But the assert is both imprecise and buggy, primarily because it doesn't
include the single readonly IDT page that is mapped at CPU_ENTRY_AREA_BASE
(which begins at a PMD boundary).

This bug was hidden by the fact that by accident CPU_ENTRY_AREA_PAGES is defined
too large upstream (v5.4-rc8):

	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 40)

While 'struct cpu_entry_area' is 155648 bytes, or 38 pages. So we had two extra
pages, which hid the bug.

The following commit (not yet upstream) increased the size to 40 pages:

  x86/iopl: ("Restrict iopl() permission scope")

... but increased CPU_ENTRY_AREA_PAGES only 41 - i.e. shortening the gap
to just 1 extra page.

Then another not-yet-upstream commit changed the size again:

  880a98c339: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Which increased the cpu_entry_area size from 38 to 39 pages, but
didn't change CPU_ENTRY_AREA_PAGES (kept it at 40). This worked
fine, because we still had a page left from the accidental 'reserve'.

But when these two commits were merged into the same tree, the
combined size of cpu_entry_area grew from 38 to 40 pages, while
CPU_ENTRY_AREA_PAGES finally caught up to 40 as well.

Which is fine in terms of functionality, but the assert broke:

	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

because CPU_ENTRY_AREA_MAP_SIZE is the total size of the area,
which is 1 page larger due to the IDT page.

To fix all this, change the assert to two precise asserts:

	BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
	BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE);

This takes the IDT page into account, and also connects the size-based
define of CPU_ENTRY_AREA_TOTAL_SIZE with the address-subtraction based
define of CPU_ENTRY_AREA_MAP_SIZE.

Also clean up some of the names which made it rather confusing:

 - 'CPU_ENTRY_AREA_TOT_SIZE' wasn't actually the 'total' size of
   the cpu-entry-area, but the per-cpu array size, so rename this
   to CPU_ENTRY_AREA_ARRAY_SIZE.

 - Introduce CPU_ENTRY_AREA_TOTAL_SIZE that _is_ the total mapping
   size, with the IDT included.

 - Add comments where '+1' denotes the IDT mapping - it wasn't
   obvious and took me about 3 hours to decode...

Finally, because this particular commit is actually applied after
this patch:

  880a98c339: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Fix the CPU_ENTRY_AREA_PAGES value from 40 pages to the correct 39 pages.

All future commits that change cpu_entry_area will have to adjust
this value precisely.

As a side note, we should probably attempt to remove CPU_ENTRY_AREA_PAGES
and derive its value directly from the structure, without causing
header hell - but that is an adventure for another day! :-)

Fixes: 880a98c339: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 08:53:33 +01:00
Himadri Pandya
fa36dcdf8b x86: hv: Add function to allocate zeroed page for Hyper-V
Hyper-V assumes page size to be 4K. While this assumption holds true on
x86 architecture, it might not  be true for ARM64 architecture. Hence
define hyper-v specific function to allocate a zeroed page which can
have a different implementation on ARM64 architecture to handle the
conflict between hyper-v's assumed page size and actual guest page size.

Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-21 20:10:45 -05:00
Thomas Gleixner
880a98c339 x86/cpu_entry_area: Add guard page for entry stack on 32bit
The entry stack in the cpu entry area is protected against overflow by the
readonly GDT on 64-bit, but on 32-bit the GDT needs to be writeable and
therefore does not trigger a fault on stack overflow.

Add a guard page.

Fixes: c482feefe1 ("x86/entry/64: Make cpu_entry_area.tss read-only")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
2019-11-21 19:37:43 +01:00
Paolo Bonzini
46f4f0aabc Merge branch 'kvm-tsx-ctrl' into HEAD
Conflicts:
	arch/x86/kvm/vmx/vmx.c
2019-11-21 12:03:40 +01:00
Paolo Bonzini
edef5c36b0 KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
Because KVM always emulates CPUID, the CPUID clear bit
(bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually"
by the hypervisor when performing said emulation.

Right now neither kvm-intel.ko nor kvm-amd.ko implement
MSR_IA32_TSX_CTRL but this will change in the next patch.

Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21 09:59:31 +01:00
David S. Miller
ee5a489fd9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-11-20

The following pull-request contains BPF updates for your *net-next* tree.

We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).

There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca748:

<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca748

<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca748

<<<<<<< HEAD
        if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
        /* kmalloc()'ed memory can't be mmap()'ed */
        if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca748

The main changes are:

1) Addition of BPF trampoline which works as a bridge between kernel functions,
   BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
   BPF programs for tracing with practically zero overhead to call into BPF (as
   opposed to k[ret]probes) and ii) attachment of the former to networking related
   programs to see input/output of networking programs (covering xdpdump use case),
   from Alexei Starovoitov.

2) BPF array map mmap support and use in libbpf for global data maps; also a big
   batch of libbpf improvements, among others, support for reading bitfields in a
   relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.

3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
   the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.

4) Add BPF audit support and emit messages upon successful prog load and unload in
   order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.

5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
   (XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.

6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
   call named bpf_get_link_xdp_info() for retrieving the full set of prog
   IDs attached to XDP, from Toke Høiland-Jørgensen.

7) Add BTF support for array of int, array of struct and multidimensional arrays
   and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.

8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.

9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
   xdping to be run as standalone, from Jiri Benc.

10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.

11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.

12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
    samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20 18:11:23 -08:00
Jan Beulich
81ff2c37f9 x86/stackframe/32: Repair 32-bit Xen PV
Once again RPL checks have been introduced which don't account for a 32-bit
kernel living in ring 1 when running in a PV Xen domain. The case in
FIXUP_FRAME has been preventing boot.

Adjust BUG_IF_WRONG_CR3 as well to guard against future uses of the macro
on a code path reachable when running in PV mode under Xen; I have to admit
that I stopped at a certain point trying to figure out whether there are
present ones.

Fixes: 3c88c692c2 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stable Team <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/0fad341f-b7f5-f859-d55d-f0084ee7087e@suse.com
2019-11-19 21:58:28 +01:00
Ingo Molnar
8e1d58ae0c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/kcsan
Pull the KCSAN subsystem from Paul E. McKenney:

   "This pull request contains base kernel concurrency sanitizer
    (KCSAN) enablement for x86, courtesy of Marco Elver.  KCSAN is a
    sampling watchpoint-based data-race detector, and is documented in
    Documentation/dev-tools/kcsan.rst.  KCSAN was announced in September,
    and much feedback has since been incorporated:

      http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com

    The data races located thus far have resulted in a number of fixes:

      https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan

    Additional information may be found here:

      https://lore.kernel.org/lkml/20191114180303.66955-1-elver@google.com/
   "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-19 19:56:28 +01:00
Ingo Molnar
9f4813b531 Linux 5.4-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3RzgkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN18H/0JZbfIpy8/4Irol
 0va7Aj2fBi1a5oxfqYsMKN0u3GKbN3OV9tQ+7w1eBNGvL72TGadgVTzTY+Im7A9U
 UjboAc7jDPCG+YhIwXFufMiIAq5jDIj6h0LDas7ALsMfsnI/RhTwgNtLTAkyI3dH
 YV/6ljFULwueJHCxzmrYbd1x39PScj3kCNL2pOe6On7rXMKOemY/nbbYYISxY30E
 GMgKApSS+li7VuSqgrKoq5Qaox26LyR2wrXB1ij4pqEJ9xgbnKRLdHuvXZnE+/5p
 46EMirt+yeSkltW3d2/9MoCHaA76ESzWMMDijLx7tPgoTc3RB3/3ZLsm3rYVH+cR
 cRlNNSk=
 =0+Cg
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc8' into WIP.x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-19 09:00:45 +01:00
Thomas Gleixner
b41d62201b x86: Remove unused asm/rio.h
The removed calgary IOMMU driver was the only user of this header file.

Reported-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
2019-11-18 10:52:11 +01:00
Marco Elver
40d04110f8 x86, kcsan: Enable KCSAN for x86
This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-11-16 07:23:16 -08:00
Thomas Gleixner
111e7b15cf x86/ioperm: Extend IOPL config to control ioperm() as well
If iopl() is disabled, then providing ioperm() does not make much sense.

Rename the config option and disable/enable both syscalls with it. Guard
the code with #ifdefs where appropriate.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16 11:24:06 +01:00
Thomas Gleixner
a24ca99768 x86/iopl: Remove legacy IOPL option
The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
cruft dealing with the (e)flags based IOPL mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts)
Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16 11:24:05 +01:00
Thomas Gleixner
c8137ace56 x86/iopl: Restrict iopl() permission scope
The access to the full I/O port range can be also provided by the TSS I/O
bitmap, but that would require to copy 8k of data on scheduling in the
task. As shown with the sched out optimization TSS.io_bitmap_base can be
used to switch the incoming task to a preallocated I/O bitmap which has all
bits zero, i.e. allows access to all I/O ports.

Implementing this allows to provide an iopl() emulation mode which restricts
the IOPL level 3 permissions to I/O port access but removes the STI/CLI
permission which is coming with the hardware IOPL mechansim.

Provide a config option to switch IOPL to emulation mode, make it the
default and while at it also provide an option to disable IOPL completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16 11:24:05 +01:00
Thomas Gleixner
4804e382c1 x86/ioperm: Share I/O bitmap if identical
The I/O bitmap is duplicated on fork. That's wasting memory and slows down
fork. There is no point to do so. As long as the bitmap is not modified it
can be shared between threads and processes.

Add a refcount and just share it on fork. If a task modifies the bitmap
then it has to do the duplication if and only if it is shared.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16 11:24:04 +01:00
Thomas Gleixner
ea5f1cd7ab x86/ioperm: Remove bitmap if all permissions dropped
If ioperm() results in a bitmap with all bits set (no permissions to any
I/O port), then handling that bitmap on context switch and exit to user
mode is pointless. Drop it.

Move the bitmap exit handling to the ioport code and reuse it for both the
thread exit path and dropping it. This allows to reuse this code for the
upcoming iopl() emulation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16 11:24:03 +01:00
Thomas Gleixner
22fe5b0439 x86/ioperm: Move TSS bitmap update to exit to user work
There is no point to update the TSS bitmap for tasks which use I/O bitmaps
on every context switch. It's enough to update it right before exiting to
user space.

That reduces the context switch bitmap handling to invalidating the io
bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP
set. The invaldiation is done on purpose when a task with an IO bitmap
switches out to prevent any possible leakage of an activated IO bitmap.

It also removes the requirement to update the tasks bitmap atomically in
ioperm().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16 11:24:03 +01:00
Thomas Gleixner
060aa16fdb x86/ioperm: Add bitmap sequence number
Add a globally unique sequence number which is incremented when ioperm() is
changing the I/O bitmap of a task. Store the new sequence number in the
io_bitmap structure and compare it with the sequence number of the I/O
bitmap which was last loaded on a CPU. Only update the bitmap if the
sequence is different.

That should further reduce the overhead of I/O bitmap scheduling when there
are only a few I/O bitmap users on the system.

The 64bit sequence counter is sufficient. A wraparound of the sequence
counter assuming an ioperm() call every nanosecond would require about 584
years of uptime.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16 11:24:02 +01:00
Thomas Gleixner
577d5cd7e5 x86/ioperm: Move iobitmap data into a struct
No point in having all the data in thread_struct, especially as upcoming
changes add more.

Make the bitmap in the new struct accessible as array of longs and as array
of characters via a union, so both the bitmap functions and the update
logic can avoid type casts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16 11:24:02 +01:00
Thomas Gleixner
f5848e5fd2 x86/tss: Move I/O bitmap data into a seperate struct
Move the non hardware portion of I/O bitmap data into a seperate struct for
readability sake.

Originally-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16 11:24:01 +01:00
Thomas Gleixner
ecc7e37d4d x86/io: Speedup schedule out of I/O bitmap user
There is no requirement to update the TSS I/O bitmap when a thread using it is
scheduled out and the incoming thread does not use it.

For the permission check based on the TSS I/O bitmap the CPU calculates the memory
location of the I/O bitmap by the address of the TSS and the io_bitmap_base member
of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the
offset to an address outside of the TSS limit.

If an I/O instruction is issued from user space the TSS limit causes #GP to be
raised in the same was as valid I/O bitmap with all bits set to 1 would do.

This removes the extra work when an I/O bitmap using task is scheduled out
and puts the burden on the rare I/O bitmap users when they are scheduled
in.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-16 11:24:01 +01:00
Thomas Gleixner
2fff071d28 x86/process: Unify copy_thread_tls()
While looking at the TSS io bitmap it turned out that any change in that
area would require identical changes to copy_thread_tls(). The 32 and 64
bit variants share sufficient code to consolidate them into a common
function to avoid duplication of upcoming modifications.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
2019-11-16 11:23:59 +01:00
Peter Zijlstra
c3d6324f84 x86/alternatives: Teach text_poke_bp() to emulate instructions
In preparation for static_call and variable size jump_label support,
teach text_poke_bp() to emulate instructions, namely:

  JMP32, JMP8, CALL, NOP2, NOP_ATOMIC5, INT3

The current text_poke_bp() takes a @handler argument which is used as
a jump target when the temporary INT3 is hit by a different CPU.

When patching CALL instructions, this doesn't work because we'd miss
the PUSH of the return address. Instead, teach poke_int3_handler() to
emulate an instruction, typically the instruction we're patching in.

This fits almost all text_poke_bp() users, except
arch_unoptimize_kprobe() which restores random text, and for that site
we have to build an explicit emulate instruction.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132457.529086974@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8c7eebc10687af45ac8e40ad1bac0cf7893dba9f)
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-15 14:07:01 -08:00
Fenghua Yu
db8c33f8b5 x86/cpu: Align the x86_capability array to size of unsigned long
The x86_capability array in cpuinfo_x86 is of type u32 and thus is
naturally aligned to 4 bytes. But, set_bit() and clear_bit() require the
array to be aligned to size of unsigned long (i.e. 8 bytes on 64-bit
systems).

The array pointer is handed into atomic bit operations. If the access is
not aligned to unsigned long then the atomic bit operations can end up
crossing a cache line boundary, which causes the CPU to do a full bus lock
as it can't lock both cache lines at once. The bus lock operation is heavy
weight and can cause severe performance degradation.

The upcoming #AC split lock detection mechanism will issue warnings for
this kind of access.

Force the alignment of the array to unsigned long. This avoids the massive
code changes which would be required when converting the array data type to
unsigned long.

[ tglx: Rewrote changelog so it contains information WHY this is required ]

Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-4-tony.luck@intel.com
2019-11-15 20:20:33 +01:00
Arnd Bergmann
caf5e32d4e y2038: ipc: remove __kernel_time_t reference from headers
There are two structures based on time_t that conflict between libc and
kernel: timeval and timespec. Both are now renamed to __kernel_old_timeval
and __kernel_old_timespec.

For time_t, the old typedef is still __kernel_time_t. There is nothing
wrong with that name, but it would be nice to not use that going forward
as this type is used almost only in deprecated interfaces because of
the y2038 overflow.

In the IPC headers (msgbuf.h, sembuf.h, shmbuf.h), __kernel_time_t is only
used for the 64-bit variants, which are not deprecated.

Change these to a plain 'long', which is the same type as __kernel_time_t
on all 64-bit architectures anyway, to reduce the number of users of the
old type.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:28 +01:00
Nitesh Narayan Lal
7ee30bc132 KVM: x86: deliver KVM IOAPIC scan request to target vCPUs
In IOAPIC fixed delivery mode instead of flushing the scan
requests to all vCPUs, we should only send the requests to
vCPUs specified within the destination field.

This patch introduces kvm_get_dest_vcpus_mask() API which
retrieves an array of target vCPUs by using
kvm_apic_map_get_dest_lapic() and then based on the
vcpus_idx, it sets the bit in a bitmap. However, if the above
fails kvm_get_dest_vcpus_mask() finds the target vCPUs by
traversing all available vCPUs. Followed by setting the
bits in the bitmap.

If we had different vCPUs in the previous request for the
same redirection table entry then bits corresponding to
these vCPUs are also set. This to done to keep
ioapic_handled_vectors synchronized.

This bitmap is then eventually passed on to
kvm_make_vcpus_request_mask() to generate a masked request
only for the target vCPUs.

This would enable us to reduce the latency overhead on isolated
vCPUs caused by the IPI to process due to KVM_REQ_IOAPIC_SCAN.

Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15 11:44:22 +01:00
Like Xu
b35e5548b4 KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC
Currently, a host perf_event is created for a vPMC functionality emulation.
It’s unpredictable to determine if a disabled perf_event will be reused.
If they are disabled and are not reused for a considerable period of time,
those obsolete perf_events would increase host context switch overhead that
could have been avoided.

If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
sched time slice, and its independent enable bit of the vPMC isn't set,
we can predict that the guest has finished the use of this vPMC, and then
do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events
in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in.

This lazy mechanism delays the event release time to the beginning of the
next scheduled time slice if vPMC's MSRs aren't changed during this time
slice. If guest comes back to use this vPMC in next time slice, a new perf
event would be re-created via perf_event_create_kernel_counter() as usual.

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15 11:44:10 +01:00
Like Xu
a6da0d77e9 KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter
The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
a heavyweight and high-frequency operation, especially when host disables
the watchdog (maximum 21000000 ns) which leads to an unacceptable latency
of the guest NMI handler. It limits the use of vPMUs in the guest.

When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
and release its existing perf_event (if any) every time EVEN in most cases
almost the same requested perf_event will be created and configured again.

For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its current config AND a new sample period based
on pmc->counter is accepted by host perf interface, the current event could
be reused safely as a new created one does. Otherwise, do release the
undesirable perf_event and reprogram a new one as usual.

It's light-weight to call pmc_pause_counter (disable, read and reset event)
and pmc_resume_counter (recalibrate period and re-enable event) as guest
expects instead of release-and-create again on any condition. Compared to
use the filterable event->attr or hw.config, a new 'u64 current_config'
field is added to save the last original programed config for each vPMC.

Based on this implementation, the number of calls to pmc_reprogram_counter
is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
In the usage of multiplexing perf sampling mode, the average latency of the
guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up).
If host disables watchdog, the minimum latecy of guest NMI handler could be
speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average.

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15 11:44:09 +01:00
Christoph Hellwig
b52b0c4fc9 x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h>
pci.h is not a UAPI header, so the __KERNEL__ ifdef is rather pointless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191113071836.21041-4-hch@lst.de
2019-11-15 10:37:14 +01:00
Christoph Hellwig
948fdcf942 x86/pci: Remove pci_64.h
This file only contains external declarations for two non-existing
function pointers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191113071836.21041-3-hch@lst.de
2019-11-15 10:36:59 +01:00
Christoph Hellwig
90dc392fc4 x86: Remove the calgary IOMMU driver
The calgary IOMMU was only used on high-end IBM systems in the early
x86_64 age and has no known users left.  Remove it to avoid having to
touch it for pending changes to the DMA API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191113071836.21041-2-hch@lst.de
2019-11-15 10:36:59 +01:00
Thomas Gleixner
ac94be498f Merge branch 'linus' into x86/hyperv
Pick up upstream fixes to avoid conflicts.
2019-11-15 10:30:50 +01:00
Lianbo Jiang
7c321eb2b8 x86/kdump: Remove the backup region handling
When the crashkernel kernel command line option is specified, the low
1M memory will always be reserved now. Therefore, it's not necessary to
create a backup region anymore and also no need to copy the contents of
the first 640k to it.

Remove all the code related to handling that backup region.

 [ bp: Massage commit message. ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: Dave Young <dyoung@redhat.com>
Cc: d.hatayama@fujitsu.com
Cc: dhowells@redhat.com
Cc: ebiederm@xmission.com
Cc: horms@verge.net.au
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: kexec@lists.infradead.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191108090027.11082-3-lijiang@redhat.com
2019-11-14 18:24:43 +01:00
Lianbo Jiang
6f599d8423 x86/kdump: Always reserve the low 1M when the crashkernel option is specified
On x86, purgatory() copies the first 640K of memory to a backup region
because the kernel needs those first 640K for the real mode trampoline
during boot, among others.

However, when SME is enabled, the kernel cannot properly copy the old
memory to the backup area but reads only its encrypted contents. The
result is that the crash tool gets invalid pointers when parsing vmcore:

  crash> kmem -s|grep -i invalid
  kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
  kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
  crash>

So reserve the remaining low 1M memory when the crashkernel option is
specified (after reserving real mode memory) so that allocated memory
does not fall into the low 1M area and thus the copying of the contents
of the first 640k to a backup region in purgatory() can be avoided
altogether.

This way, it does not need to be included in crash dumps or used for
anything except the trampolines that must live in the low 1M.

 [ bp: Heavily rewrite commit message, flip check logic in
   crash_reserve_low_1M().]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: Dave Young <dyoung@redhat.com>
Cc: d.hatayama@fujitsu.com
Cc: dhowells@redhat.com
Cc: ebiederm@xmission.com
Cc: horms@verge.net.au
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: kexec@lists.infradead.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191108090027.11082-2-lijiang@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204793
2019-11-14 13:54:33 +01:00
Lianbo Jiang
112eee5d06 x86/crash: Add a forward declaration of struct kimage
Add a forward declaration of struct kimage to the crash.h header because
future changes will invoke a crash-specific function from the realmode
init path and the compiler will complain otherwise like this:

  In file included from arch/x86/realmode/init.c:11:
  ./arch/x86/include/asm/crash.h:5:32: warning: ‘struct kimage’ declared inside\
   parameter list will not be visible outside of this definition or declaration
      5 | int crash_load_segments(struct kimage *image);
        |                                ^~~~~~
  ./arch/x86/include/asm/crash.h:6:37: warning: ‘struct kimage’ declared inside\
   parameter list will not be visible outside of this definition or declaration
      6 | int crash_copy_backup_region(struct kimage *image);
        |                                     ^~~~~~
  ./arch/x86/include/asm/crash.h:7:39: warning: ‘struct kimage’ declared inside\
   parameter list will not be visible outside of this definition or declaration
      7 | int crash_setup_memmap_entries(struct kimage *image,
        |

 [ bp: Rewrite the commit message. ]

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: d.hatayama@fujitsu.com
Cc: dhowells@redhat.com
Cc: dyoung@redhat.com
Cc: ebiederm@xmission.com
Cc: horms@verge.net.au
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: kexec@lists.infradead.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191108090027.11082-4-lijiang@redhat.com
Link: https://lkml.kernel.org/r/201910310233.EJRtTMWP%25lkp@intel.com
2019-11-14 13:46:31 +01:00
Jan Beulich
4e3f77d841 xen/mcelog: add PPIN to record when available
This is to augment commit 3f5a7896a5 ("x86/mce: Include the PPIN in MCE
records when available").

I'm also adding "synd" and "ipid" fields to struct xen_mce, in an
attempt to keep field offsets in sync with struct mce. These two fields
won't get populated for now, though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-11-14 10:01:57 +01:00
Josh Poimboeuf
77ac117b3a ftrace/x86: Tell objtool to ignore nondeterministic ftrace stack layout
Objtool complains about the new ftrace direct trampoline code:

  arch/x86/kernel/ftrace_64.o: warning: objtool: ftrace_regs_caller()+0x190: stack state mismatch: cfa1=7+16 cfa2=7+24

Typically, code has a deterministic stack layout, such that at a given
instruction address, the stack frame size is always the same.

That's not the case for the new ftrace_regs_caller() code after it
adjusts the stack for the direct case.  Just plead ignorance and assume
it's always the non-direct path.  Note this creates a tiny window for
ORC to get confused.

Link: http://lkml.kernel.org/r/20191108225100.ea3bhsbdf6oerj6g@treble

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-13 09:36:50 -05:00
Steven Rostedt (VMware)
562955fe6a ftrace/x86: Add register_ftrace_direct() for custom trampolines
Enable x86 to allow for register_ftrace_direct(), where a custom trampoline
may be called directly from an ftrace mcount/fentry location.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-13 09:36:49 -05:00
Joerg Roedel
9b3a713fee Merge branches 'iommu/fixes', 'arm/qcom', 'arm/renesas', 'arm/rockchip', 'arm/mediatek', 'arm/tegra', 'arm/smmu', 'x86/amd', 'x86/vt-d', 'virtio' and 'core' into next 2019-11-12 17:11:25 +01:00
Daniel Kiper
b3c72fc9a7 x86/boot: Introduce setup_indirect
The setup_data is a bit awkward to use for extremely large data objects,
both because the setup_data header has to be adjacent to the data object
and because it has a 32-bit length field. However, it is important that
intermediate stages of the boot process have a way to identify which
chunks of memory are occupied by kernel data. Thus introduce an uniform
way to specify such indirect data as setup_indirect struct and
SETUP_INDIRECT type.

And finally bump setup_header version in arch/x86/boot/header.S.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-4-daniel.kiper@oracle.com
2019-11-12 16:21:15 +01:00
Daniel Kiper
00cd1c154d x86/boot: Introduce kernel_info.setup_type_max
This field contains maximal allowed type for setup_data.

Do not bump setup_header version in arch/x86/boot/header.S because it
will be followed by additional changes coming into the Linux/x86 boot
protocol.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-3-daniel.kiper@oracle.com
2019-11-12 16:16:54 +01:00
Daniel Kiper
2c33c27fd6 x86/boot: Introduce kernel_info
The relationships between the headers are analogous to the various data
sections:

  setup_header = .data
  boot_params/setup_data = .bss

What is missing from the above list? That's right:

  kernel_info = .rodata

We have been (ab)using .data for things that could go into .rodata or .bss for
a long time, for lack of alternatives and -- especially early on -- inertia.
Also, the BIOS stub is responsible for creating boot_params, so it isn't
available to a BIOS-based loader (setup_data is, though).

setup_header is permanently limited to 144 bytes due to the reach of the
2-byte jump field, which doubles as a length field for the structure, combined
with the size of the "hole" in struct boot_params that a protected-mode loader
or the BIOS stub has to copy it into. It is currently 119 bytes long, which
leaves us with 25 very precious bytes. This isn't something that can be fixed
without revising the boot protocol entirely, breaking backwards compatibility.

boot_params proper is limited to 4096 bytes, but can be arbitrarily extended
by adding setup_data entries. It cannot be used to communicate properties of
the kernel image, because it is .bss and has no image-provided content.

kernel_info solves this by providing an extensible place for information about
the kernel image. It is readonly, because the kernel cannot rely on a
bootloader copying its contents anywhere, but that is OK; if it becomes
necessary it can still contain data items that an enabled bootloader would be
expected to copy into a setup_data chunk.

Do not bump setup_header version in arch/x86/boot/header.S because it
will be followed by additional changes coming into the Linux/x86 boot
protocol.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-2-daniel.kiper@oracle.com
2019-11-12 16:10:34 +01:00
Andrea Parri
dce7cd6275 x86/hyperv: Allow guests to enable InvariantTSC
If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
configuration version.  Bit 15 corresponds to the
AccessTscInvariantControls privilege.  If this privilege bit is set,
guests can access the HvSyntheticInvariantTscControl MSR: guests can
set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
After setting the synthetic MSR, CPUID will enumerate support for
InvariantTSC.

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lkml.kernel.org/r/20191003155200.22022-1-parri.andrea@gmail.com
2019-11-12 11:44:21 +01:00
Vitaly Kuznetsov
b264f57fde x86/hyperv: Micro-optimize send_ipi_one()
When sending an IPI to a single CPU there is no need to deal with cpumasks.

With 2 CPU guest on WS2019 a minor (like 3%, 8043 -> 7761 CPU cycles)
improvement with smp_call_function_single() loop benchmark can be seeb. The
optimization, however, is tiny and straitforward. Also, send_ipi_one() is
important for PV spinlock kick.

Switching to the regular APIC IPI send for CPU > 64 case does not make
sense as it is twice as expesive (12650 CPU cycles for __send_ipi_mask_ex()
call, 26000 for orig_apic.send_IPI(cpu, vector)).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Link: https://lkml.kernel.org/r/20191027151938.7296-1-vkuznets@redhat.com
2019-11-12 11:44:20 +01:00
Christoph Hellwig
d092a87073 arch: rely on asm-generic/io.h for default ioremap_* definitions
Various architectures that use asm-generic/io.h still defined their
own default versions of ioremap_nocache, ioremap_wt and ioremap_wc
that point back to plain ioremap directly or indirectly.  Remove these
definitions and rely on asm-generic/io.h instead.  For this to work
the backup ioremap_* defintions needs to be changed to purely cpp
macros instea of inlines to cover for architectures like openrisc
that only define ioremap after including <asm-generic/io.h>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
2019-11-11 21:18:19 +01:00
Christoph Hellwig
c0d94aa54b x86: Clean up ioremap()
Use ioremap() as the main implemented function, and defines
ioremap_nocache() as a deprecated alias of ioremap() in
preparation of removing ioremap_nocache() entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-11 17:19:49 +01:00
Yian Chen
f036c7fa0a iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved
VT-d RMRR (Reserved Memory Region Reporting) regions are reserved
for device use only and should not be part of allocable memory pool of OS.

BIOS e820_table reports complete memory map to OS, including OS usable
memory ranges and BIOS reserved memory ranges etc.

x86 BIOS may not be trusted to include RMRR regions as reserved type
of memory in its e820 memory map, hence validate every RMRR entry
with the e820 memory map to make sure the RMRR regions will not be
used by OS for any other purposes.

ia64 EFI is working fine so implement RMRR validation as a dummy function

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-11-11 16:06:07 +01:00
Nicolas Saenz Julienne
e380a0394c x86/PCI: sta2x11: use default DMA address translation
The devices found behind this PCIe chip have unusual DMA mapping
constraints as there is an AMBA interconnect placed in between them and
the different PCI endpoints. The offset between physical memory
addresses and AMBA's view is provided by reading a PCI config register,
which is saved and used whenever DMA mapping is needed.

It turns out that this DMA setup can be represented by properly setting
'dma_pfn_offset', 'dma_bus_mask' and 'dma_mask' during the PCI device
enable fixup. And ultimately allows us to get rid of this device's
custom DMA functions.

Aside from the code deletion and DMA setup, sta2x11_pdev_to_mapping() is
moved to avoid warnings whenever CONFIG_PM is not enabled.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-11-11 10:52:19 +01:00
Dan Williams
199c847176 x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP
Given that EFI_MEMORY_SP is platform BIOS policy decision for marking
memory ranges as "reserved for a specific purpose" there will inevitably
be scenarios where the BIOS omits the attribute in situations where it
is desired. Unlike other attributes if the OS wants to reserve this
memory from the kernel the reservation needs to happen early in init. So
early, in fact, that it needs to happen before e820__memblock_setup()
which is a pre-requisite for efi_fake_memmap() that wants to allocate
memory for the updated table.

Introduce an x86 specific efi_fake_memmap_early() that can search for
attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
accordingly.

The KASLR code that scans the command line looking for user-directed
memory reservations also needs to be updated to consider
"efi_fake_mem=nn@ss:0x40000" requests.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07 15:44:23 +01:00
Dan Williams
262b45ae3a x86/efi: EFI soft reservation to E820 enumeration
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
interpretation of the EFI Memory Types as "reserved for a specific
purpose".

The proposed Linux behavior for specific purpose memory is that it is
reserved for direct-access (device-dax) by default and not available for
any kernel usage, not even as an OOM fallback.  Later, through udev
scripts or another init mechanism, these device-dax claimed ranges can
be reconfigured and hot-added to the available System-RAM with a unique
node identifier. This device-dax management scheme implements "soft" in
the "soft reserved" designation by allowing some or all of the
reservation to be recovered as typical memory. This policy can be
disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
efi=nosoftreserve.

This patch introduces 2 new concepts at once given the entanglement
between early boot enumeration relative to memory that can optionally be
reserved from the kernel page allocator by default. The new concepts
are:

- E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
  attribute on EFI_CONVENTIONAL memory, update the E820 map with this
  new type. Only perform this classification if the
  CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
  typical ram.

- IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
  a device driver to search iomem resources for application specific
  memory. Teach the iomem code to identify such ranges as "Soft Reserved".

Note that the comment for do_add_efi_memmap() needed refreshing since it
seemed to imply that the efi map might overflow the e820 table, but that
is not an issue as of commit 7b6e4ba3cb "x86/boot/e820: Clean up the
E820_X_MAX definition" that removed the 128 entry limit for
e820__range_add().

A follow-on change integrates parsing of the ACPI HMAT to identify the
node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
now, just identify and reserve memory of this type.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07 15:44:14 +01:00
Dan Williams
6950e31b35 x86/efi: Push EFI_MEMMAP check into leaf routines
In preparation for adding another EFI_MEMMAP dependent call that needs
to occur before e820__memblock_setup() fixup the existing efi calls to
check for EFI_MEMMAP internally. This ends up being cleaner than the
alternative of checking EFI_MEMMAP multiple times in setup_arch().

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-07 15:44:04 +01:00
Babu Moger
b971880fe7 x86/Kconfig: Rename UMIP config parameter
AMD 2nd generation EPYC processors support the UMIP (User-Mode
Instruction Prevention) feature. So, rename X86_INTEL_UMIP to
generic X86_UMIP and modify the text to cover both Intel and AMD.

 [ bp: take of the disabled-features.h copy in tools/ too. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/157298912544.17462.2018334793891409521.stgit@naples-babu.amd.com
2019-11-07 11:07:29 +01:00
Daniel Axtens
81d2c6f819 kasan: support instrumented bitops combined with generic bitops
Currently bitops-instrumented.h assumes that the architecture provides
atomic, non-atomic and locking bitops (e.g. both set_bit and __set_bit).
This is true on x86 and s390, but is not always true: there is a
generic bitops/non-atomic.h header that provides generic non-atomic
operations, and also a generic bitops/lock.h for locking operations.

powerpc uses the generic non-atomic version, so it does not have it's
own e.g. __set_bit that could be renamed arch___set_bit.

Split up bitops-instrumented.h to mirror the atomic/non-atomic/lock
split. This allows arches to only include the headers where they
have arch-specific versions to rename. Update x86 and s390.

(The generic operations are automatically instrumented because they're
written in C, not asm.)

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190820024941.12640-1-dja@axtens.net
2019-11-07 13:15:39 +11:00
Junaid Shahid
1aa9b9572b kvm: x86: mmu: Recovery of shattered NX large pages
The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
not longer being used for execution.  This removes the performance penalty
for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 20:26:00 +01:00
Kees Cook
5494c3a6a0 x86/mm: Report which part of kernel image is freed
The memory freeing report wasn't very useful for figuring out which
parts of the kernel image were being freed. Add the details for clearer
reporting in dmesg.

Before:

  Freeing unused kernel image memory: 1348K
  Write protecting the kernel read-only data: 20480k
  Freeing unused kernel image memory: 2040K
  Freeing unused kernel image memory: 172K

After:

  Freeing unused kernel image (initmem) memory: 1348K
  Write protecting the kernel read-only data: 20480k
  Freeing unused kernel image (text/rodata gap) memory: 2040K
  Freeing unused kernel image (rodata/data gap) memory: 172K

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-28-keescook@chromium.org
2019-11-04 18:50:33 +01:00
Kees Cook
b907693883 x86/vmlinux: Actually use _etext for the end of the text segment
Various calculations are using the end of the exception table (which
does not need to be executable) as the end of the text segment. Instead,
in preparation for moving the exception table into RO_DATA, move _etext
after the exception table and update the calculations.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-16-keescook@chromium.org
2019-11-04 17:54:16 +01:00
Paolo Bonzini
b8e8c8303f kvm: mmu: ITLB_MULTIHIT mitigation
With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check resulting in a CPU lockup.

Unfortunately when EPT page tables use huge pages, it is possible for a
malicious guest to cause this situation.

Add a knob to mark huge pages as non-executable. When the nx_huge_pages
parameter is enabled (and we are using EPT), all huge pages are marked as
NX. If the guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.

This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen.  With nested EPT, again the nested guest can cause problems shadow
and direct EPT is treated in the same way.

[ tglx: Fixup default to auto and massage wording a bit ]

Originally-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 12:22:02 +01:00
Vineela Tummalapalli
db4d30fbb7 x86/bugs: Add ITLB_MULTIHIT bug infrastructure
Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:

   https://bugzilla.kernel.org/show_bug.cgi?id=205195

There are other processors affected for which the erratum does not fully
disclose the impact.

This issue affects both bare-metal x86 page tables and EPT.

It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 12:22:01 +01:00
Pawan Gupta
1b42f01741 x86/speculation/taa: Add mitigation for TSX Async Abort
TSX Async Abort (TAA) is a side channel vulnerability to the internal
buffers in some Intel processors similar to Microachitectural Data
Sampling (MDS). In this case, certain loads may speculatively pass
invalid data to dependent operations when an asynchronous abort
condition is pending in a TSX transaction.

This includes loads with no fault or assist condition. Such loads may
speculatively expose stale data from the uarch data structures as in
MDS. Scope of exposure is within the same-thread and cross-thread. This
issue affects all current processors that support TSX, but do not have
ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.

On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
using VERW or L1D_FLUSH, there is no additional mitigation needed for
TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
disabling the Transactional Synchronization Extensions (TSX) feature.

A new MSR IA32_TSX_CTRL in future and current processors after a
microcode update can be used to control the TSX feature. There are two
bits in that MSR:

* TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
Transactional Memory (RTM).

* TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
disabled with updated microcode but still enumerated as present by
CPUID(EAX=7).EBX{bit4}.

The second mitigation approach is similar to MDS which is clearing the
affected CPU buffers on return to user space and when entering a guest.
Relevant microcode update is required for the mitigation to work.  More
details on this approach can be found here:

  https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html

The TSX feature can be controlled by the "tsx" command line parameter.
If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
deployed. The effective mitigation state can be read from sysfs.

 [ bp:
   - massage + comments cleanup
   - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
   - remove partial TAA mitigation in update_mds_branch_idle() - Josh.
   - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
 ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:58 +01:00
Pawan Gupta
c2955f270a x86/msr: Add the IA32_TSX_CTRL MSR
Transactional Synchronization Extensions (TSX) may be used on certain
processors as part of a speculative side channel attack.  A microcode
update for existing processors that are vulnerable to this attack will
add a new MSR - IA32_TSX_CTRL to allow the system administrator the
option to disable TSX as one of the possible mitigations.

The CPUs which get this new MSR after a microcode upgrade are the ones
which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those
CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all
CPU buffers takes care of the TAA case as well.

  [ Note that future processors that are not vulnerable will also
    support the IA32_TSX_CTRL MSR. ]

Add defines for the new IA32_TSX_CTRL MSR and its bits.

TSX has two sub-features:

1. Restricted Transactional Memory (RTM) is an explicitly-used feature
   where new instructions begin and end TSX transactions.
2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of
   "old" style locks are used by software.

Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the
IA32_TSX_CTRL MSR.

There are two control bits in IA32_TSX_CTRL MSR:

  Bit 0: When set, it disables the Restricted Transactional Memory (RTM)
         sub-feature of TSX (will force all transactions to abort on the
	 XBEGIN instruction).

  Bit 1: When set, it disables the enumeration of the RTM and HLE feature
         (i.e. it will make CPUID(EAX=7).EBX{bit4} and
	  CPUID(EAX=7).EBX{bit11} read as 0).

The other TSX sub-feature, Hardware Lock Elision (HLE), is
unconditionally disabled by the new microcode but still enumerated
as present by CPUID(EAX=7).EBX{bit4}, unless disabled by
IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:58 +01:00
Linus Torvalds
153a971ff5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Two fixes for the VMWare guest support:

   - Unbreak VMWare platform detection which got wreckaged by converting
     an integer constant to a string constant.

   - Fix the clang build of the VMWAre hypercall by explicitely
     specifying the ouput register for INL instead of using the short
     form"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
  x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
2019-10-27 07:14:40 -04:00
Jim Mattson
671ddc700f KVM: nVMX: Don't leak L1 MMIO regions to L2
If the "virtualize APIC accesses" VM-execution control is set in the
VMCS, the APIC virtualization hardware is triggered when a page walk
in VMX non-root mode terminates at a PTE wherein the address of the 4k
page frame matches the APIC-access address specified in the VMCS. On
hardware, the APIC-access address may be any valid 4k-aligned physical
address.

KVM's nVMX implementation enforces the additional constraint that the
APIC-access address specified in the vmcs12 must be backed by
a "struct page" in L1. If not, L0 will simply clear the "virtualize
APIC accesses" VM-execution control in the vmcs02.

The problem with this approach is that the L1 guest has arranged the
vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
VM-execution control is clear in the vmcs12--so that the L2 guest
physical address(es)--or L2 guest linear address(es)--that reference
the L2 APIC map to the APIC-access address specified in the
vmcs12. Without the "virtualize APIC accesses" VM-execution control in
the vmcs02, the APIC accesses in the L2 guest will directly access the
APIC-access page in L1.

When there is no mapping whatsoever for the APIC-access address in L1,
the L2 VM just loses the intended APIC virtualization. However, when
the APIC-access address is mapped to an MMIO region in L1, the L2
guest gets direct access to the L1 MMIO device. For example, if the
APIC-access address specified in the vmcs12 is 0xfee00000, then L2
gets direct access to L1's APIC.

Since this vmcs12 configuration is something that KVM cannot
faithfully emulate, the appropriate response is to exit to userspace
with KVM_INTERNAL_ERROR_EMULATION.

Fixes: fe3ef05c75 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
Reported-by: Dan Cross <dcross@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 19:04:40 +02:00
Aaron Lewis
7204160eb7 KVM: x86: Introduce vcpu->arch.xsaves_enabled
Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to
vcpu->arch.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 15:42:48 +02:00
Lianbo Jiang
e095cb7a0f x86/kdump: Remove the unused crash_copy_backup_region()
The crash_copy_backup_region() function is unused so remove it.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: dhowells@redhat.com
Cc: dyoung@redhat.com
Cc: ebiederm@xmission.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: kexec@lists.infradead.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lkml.kernel.org/r/20191017094347.20327-3-lijiang@redhat.com
2019-10-22 13:59:49 +02:00
Like Xu
4be946728f KVM: x86/vPMU: Declare kvm_pmu->reprogram_pmi field using DECLARE_BITMAP
Replace the explicit declaration of "u64 reprogram_pmi" with the generic
macro DECLARE_BITMAP for all possible appropriate number of bits.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:39:32 +02:00
Suthikulpanit, Suravee
2cf9af0b56 kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameter
Generally, APICv for all vcpus in the VM are enable/disable in the same
manner. So, get_enable_apicv() should represent APICv status of the VM
instead of each VCPU.

Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter
instead of struct kvm_vcpu.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:34:17 +02:00
Sean Christopherson
34059c2570 KVM: x86: Fold decache_cr3() into cache_reg()
Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
cache_reg() callback and drop the dedicated decache_cr3().  The name
decache_cr3() is somewhat confusing as the caching behavior of CR3
follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
has nothing in common with the caching behavior of CR0/CR4 (whose
decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).

This would effectivel adds a BUG() if KVM attempts to cache CR3 on SVM.
Change it to a WARN_ON_ONCE() -- if the cache never requires filling,
the value is already in the right place -- and opportunistically add one
in VMX to provide an equivalent check.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:34:16 +02:00
Sean Christopherson
f8845541e9 KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'
Now that indexing into arch.regs is either protected by WARN_ON_ONCE or
done with hardcoded enums, combine all definitions for registers that
are tracked by regs_avail and regs_dirty into 'enum kvm_reg'.  Having a
single enum type will simplify additional cleanup related to regs_avail
and regs_dirty.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22 13:34:14 +02:00
Thomas Hellstrom
6fee2a0be0 x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT
definition, but expects it to be an integer. However, when it was moved
to the new vmware.h include file, it was changed to be a string to better
fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the
platform detection VMWARE_PORT functionality.

Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB
definitions to be integers, and use __stringify() for their stringified
form when needed.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b4dd4f6e36 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22 00:51:44 +02:00
Thomas Hellstrom
db633a4e0e x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
LLVM's assembler doesn't accept the short form INL instruction:

  inl (%%dx)

but instead insists on the output register to be explicitly specified.

This was previously fixed for the VMWARE_PORT macro. Fix it also for
the VMWARE_HYPERCALL macro.

Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: clang-built-linux@googlegroups.com
Fixes: b4dd4f6e36 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas_os@shipmail.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22 00:51:44 +02:00
Jia He
f2c4e5970c x86/mm: implement arch_faults_on_old_pte() stub on x86
arch_faults_on_old_pte is a helper to indicate that it might cause page
fault when accessing old pte. But on x86, there is feature to setting
pte access flag by hardware. Hence implement an overriding stub which
always returns false.

Signed-off-by: Jia He <justin.he@arm.com>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-18 11:11:25 +01:00
Jiri Slaby
b4edca1501 x86/asm: Remove the last GLOBAL user and remove the macro
Convert the remaining 32bit users and remove the GLOBAL macro finally.
In particular, this means to use SYM_ENTRY for the singlestepping hack
region.

Exclude the global definition of GLOBAL from x86 too.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-20-jslaby@suse.cz
2019-10-18 11:29:50 +02:00
Jiri Slaby
ffedeeb780 linkage: Introduce new macros for assembler symbols
Introduce new C macros for annotations of functions and data in
assembly. There is a long-standing mess in macros like ENTRY, END,
ENDPROC and similar. They are used in different manners and sometimes
incorrectly.

So introduce macros with clear use to annotate assembly as follows:

a) Support macros for the ones below
   SYM_T_FUNC -- type used by assembler to mark functions
   SYM_T_OBJECT -- type used by assembler to mark data
   SYM_T_NONE -- type used by assembler to mark entries of unknown type

   They are defined as STT_FUNC, STT_OBJECT, and STT_NOTYPE
   respectively. According to the gas manual, this is the most portable
   way. I am not sure about other assemblers, so this can be switched
   back to %function and %object if this turns into a problem.
   Architectures can also override them by something like ", @function"
   if they need.

   SYM_A_ALIGN, SYM_A_NONE -- align the symbol?
   SYM_L_GLOBAL, SYM_L_WEAK, SYM_L_LOCAL -- linkage of symbols

b) Mostly internal annotations, used by the ones below
   SYM_ENTRY -- use only if you have to (for non-paired symbols)
   SYM_START -- use only if you have to (for paired symbols)
   SYM_END -- use only if you have to (for paired symbols)

c) Annotations for code
   SYM_INNER_LABEL_ALIGN -- only for labels in the middle of code
   SYM_INNER_LABEL -- only for labels in the middle of code

   SYM_FUNC_START_LOCAL_ALIAS -- use where there are two local names for
	one function
   SYM_FUNC_START_ALIAS -- use where there are two global names for one
	function
   SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function

   SYM_FUNC_START -- use for global functions
   SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment
   SYM_FUNC_START_LOCAL -- use for local functions
   SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o
	alignment
   SYM_FUNC_START_WEAK -- use for weak functions
   SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment
   SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START,
	SYM_FUNC_START_WEAK, ...

   For functions with special (non-C) calling conventions:
   SYM_CODE_START -- use for non-C (special) functions
   SYM_CODE_START_NOALIGN -- use for non-C (special) functions, w/o
	alignment
   SYM_CODE_START_LOCAL -- use for local non-C (special) functions
   SYM_CODE_START_LOCAL_NOALIGN -- use for local non-C (special)
	functions, w/o alignment
   SYM_CODE_END -- the end of SYM_CODE_START_LOCAL or SYM_CODE_START

d) For data
   SYM_DATA_START -- global data symbol
   SYM_DATA_START_LOCAL -- local data symbol
   SYM_DATA_END -- the end of the SYM_DATA_START symbol
   SYM_DATA_END_LABEL -- the labeled end of SYM_DATA_START symbol
   SYM_DATA -- start+end wrapper around simple global data
   SYM_DATA_LOCAL -- start+end wrapper around simple local data

==========

The macros allow to pair starts and ends of functions and mark functions
correctly in the output ELF objects.

All users of the old macros in x86 are converted to use these in further
patches.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191011115108.12392-2-jslaby@suse.cz
2019-10-18 09:48:11 +02:00
Masami Hiramatsu
4d65adfcd1 x86: xen: insn: Decode Xen and KVM emulate-prefix signature
Decode Xen and KVM's emulate-prefix signature by x86 insn decoder.
It is called "prefix" but actually not x86 instruction prefix, so
this adds insn.emulate_prefix_size field instead of reusing
insn.prefixes.

If x86 decoder finds a special sequence of instructions of
XEN_EMULATE_PREFIX and 'ud2a; .ascii "kvm"', it just counts the
length, set insn.emulate_prefix_size and fold it with the next
instruction. In other words, the signature and the next instruction
is treated as a single instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/156777564986.25081.4964537658500952557.stgit@devnote2
2019-10-17 21:31:57 +02:00
Masami Hiramatsu
b3dc0695fa x86: xen: kvm: Gather the definition of emulate prefixes
Gather the emulate prefixes, which forcibly make the following
instruction emulated on virtualization, in one place.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/156777563917.25081.7286628561790289995.stgit@devnote2
2019-10-17 21:31:57 +02:00
Masami Hiramatsu
f7919fd943 x86/asm: Allow to pass macros to __ASM_FORM()
Use __stringify() at __ASM_FORM() so that user can pass
code including macros to __ASM_FORM().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/156777562873.25081.2288083344657460959.stgit@devnote2
2019-10-17 21:31:57 +02:00
Linus Torvalds
fcb45a2848 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
  guest support fix when built under Clang, and new CPU model number
  definitions"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Comet Lake to the Intel CPU models header
  lib/string: Make memzero_explicit() inline instead of external
  x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
  x86/asm: Fix MWAITX C-state hint value
2019-10-12 14:46:14 -07:00
Linus Torvalds
e9ec3588a9 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 license tag fixlets from Ingo Molnar:
 "Fix a couple of SPDX tags in x86 headers to follow the canonical
  pattern"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Use the correct SPDX License Identifier in headers
2019-10-12 14:37:55 -07:00
Sami Tolvanen
f53e2cd0b8 x86/mm: Use the correct function type for native_set_fixmap()
We call native_set_fixmap indirectly through the function pointer
struct pv_mmu_ops::set_fixmap, which expects the first parameter to be
'unsigned' instead of 'enum fixed_addresses'. This patch changes the
function type for native_set_fixmap to match the pointer, which fixes
indirect call mismatches with Control-Flow Integrity (CFI) checking.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190913211402.193018-1-samitolvanen@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-11 12:52:32 +02:00
Sami Tolvanen
6e4847640c syscalls/x86: Fix function types in COND_SYSCALL
Define a weak function in COND_SYSCALL instead of a weak alias to
sys_ni_syscall(), which has an incompatible type. This fixes indirect
call mismatches with Control-Flow Integrity (CFI) checking.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191008224049.115427-6-samitolvanen@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-11 12:49:19 +02:00
Andy Lutomirski
cf3b83e19d syscalls/x86: Wire up COMPAT_SYSCALL_DEFINE0
x86 has special handling for COMPAT_SYSCALL_DEFINEx, but there was
no override for COMPAT_SYSCALL_DEFINE0.  Wire it up so that we can
use it for rt_sigreturn.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191008224049.115427-3-samitolvanen@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-11 12:49:18 +02:00
Sami Tolvanen
8661d769ab syscalls/x86: Use the correct function type in SYSCALL_DEFINE0
Although a syscall defined using SYSCALL_DEFINE0 doesn't accept
parameters, use the correct function type to avoid type mismatches
with Control-Flow Integrity (CFI) checking.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191008224049.115427-2-samitolvanen@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-11 12:49:18 +02:00
Ralf Ramsauer
7a56b81c47 x86/jailhouse: Only enable platform UARTs if available
ACPI tables aren't available if Linux runs as guest of the hypervisor
Jailhouse. This makes the 8250 driver probe for all platform UARTs as it
assumes that all UARTs are present in case of !ACPI. Jailhouse will stop
execution of Linux guest due to port access violation.

So far, these access violations were solved by tuning the 8250.nr_uarts
cmdline parameter, but this has limitations: Only consecutive platform
UARTs can be mapped to Linux, and only in the sequence 0x3f8, 0x2f8,
0x3e8, 0x2e8.

Beginning from setup_data version 2, Jailhouse will place information of
available platform UARTs in setup_data. This allows for selective
activation of platform UARTs.

Query setup_data version and only activate available UARTS. This
patch comes with backward compatibility, and will still support older
setup_data versions. In case of older setup_data versions, Linux falls
back to the old behaviour.

Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: jailhouse-dev@googlegroups.com
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191010102102.421035-3-ralf.ramsauer@oth-regensburg.de
2019-10-10 15:43:59 +02:00
Ralf Ramsauer
0935e5f752 x86/jailhouse: Improve setup data version comparison
Soon, setup_data will contain information on passed-through platform
UARTs. This requires some preparational work for the sanity check of the
header and the check of the version.

Use the following strategy:

  1. Ensure that the header declares at least enough space for the
     version and the compatible_version as it must hold that fields for
     any version. The location and semantics of header+version fields
     will never change.

  2. Copy over data -- as much as as possible. The length is either
     limited by the header length or the length of setup_data.

  3. Things are now in place -- sanity check if the header length
     complies the actual version.

For future versions of the setup_data, only step 3 requires alignment.

Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: jailhouse-dev@googlegroups.com
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191010102102.421035-2-ralf.ramsauer@oth-regensburg.de
2019-10-10 15:38:30 +02:00
Kan Liang
8d7c6ac3b2 x86/cpu: Add Comet Lake to the Intel CPU models header
Comet Lake is the new 10th Gen Intel processor. Add two new CPU model
numbers to the Intel family list.

The CPU model numbers are not published in the SDM yet but they come
from an authoritative internal source.

 [ bp: Touch up commit message. ]

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: ak@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
2019-10-08 19:01:31 +02:00
Janakarajan Natarajan
454de1e7d9 x86/asm: Fix MWAITX C-state hint value
As per "AMD64 Architecture Programmer's Manual Volume 3: General-Purpose
and System Instructions", MWAITX EAX[7:4]+1 specifies the optional hint
of the optimized C-state. For C0 state, EAX[7:4] should be set to 0xf.

Currently, a value of 0xf is set for EAX[3:0] instead of EAX[7:4]. Fix
this by changing MWAITX_DISABLE_CSTATES from 0xf to 0xf0.

This hasn't had any implications so far because setting reserved bits in
EAX is simply ignored by the CPU.

 [ bp: Fixup comment in delay_mwaitx() and massage. ]

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191007190011.4859-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-08 13:25:24 +02:00
Babu Moger
9d40b85bb4 x86/cpufeatures: Add feature bit RDPRU on AMD
AMD Zen 2 introduces a new RDPRU instruction which is used to give
access to some processor registers that are typically only accessible
when the privilege level is zero.

ECX is used as the implicit register to specify which register to read.
RDPRU places the specified register’s value into EDX:EAX.

For example, the RDPRU instruction can be used to read MPERF and APERF
at CPL > 0.

Add the feature bit so it is visible in /proc/cpuinfo.

Details are available in the AMD64 Architecture Programmer’s Manual:
https://www.amd.com/system/files/TechDocs/24594.pdf

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: ak@linux.intel.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: robert.hu@linux.intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191007204839.5727.10803.stgit@localhost.localdomain
2019-10-08 09:28:37 +02:00
Linus Torvalds
c512c69187 uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to it
In commit 9f79b78ef7 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I made filldir() use unsafe_put_user(), which
improves code generation on x86 enormously.

But because we didn't have a "unsafe_copy_to_user()", the dirent name
copy was also done by hand with unsafe_put_user() in a loop, and it
turns out that a lot of other architectures didn't like that, because
unlike x86, they have various alignment issues.

Most non-x86 architectures trap and fix it up, and some (like xtensa)
will just fail unaligned put_user() accesses unconditionally.  Which
makes that "copy using put_user() in a loop" not work for them at all.

I could make that code do explicit alignment etc, but the architectures
that don't like unaligned accesses also don't really use the fancy
"user_access_begin/end()" model, so they might just use the regular old
__copy_to_user() interface.

So this commit takes that looping implementation, turns it into the x86
version of "unsafe_copy_to_user()", and makes other architectures
implement the unsafe copy version as __copy_to_user() (the same way they
do for the other unsafe_xyz() accessor functions).

Note that it only does this for the copying _to_ user space, and we
still don't have a unsafe version of copy_from_user().

That's partly because we have no current users of it, but also partly
because the copy_from_user() case is slightly different and cannot
efficiently be implemented in terms of a unsafe_get_user() loop (because
gcc can't do asm goto with outputs).

It would be trivial to do this using "rep movsb", which would work
really nicely on newer x86 cores, but really badly on some older ones.

Al Viro is looking at cleaning up all our user copy routines to make
this all a non-issue, but for now we have this simple-but-stupid version
for x86 that works fine for the dirent name copy case because those
names are short strings and we simply don't need anything fancier.

Fixes: 9f79b78ef7 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07 12:56:48 -07:00
Mike Travis
4fb7d08707 x86/platform/uv: Account for UV Hubless in is_uvX_hub Ops
The references in the is_uvX_hub() function uses the hub_info pointer
which will be NULL when the system is hubless.  This change avoids
that NULL dereference.  It is also an optimization in performance.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145840.294981941@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:11 +02:00
Mike Travis
8785968bce x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files
Indicate to UV user utilities that UV hubless support is available on
this system via the existing /proc infterface.  The current interface is
maintained with the addition of new /proc leaves ("hubbed", "hubless",
and "oemid") that contain the specific type of UV arch this one is.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145840.055590900@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:10 +02:00
Mike Travis
9743cb68f7 x86/platform/uv: Add return code to UV BIOS Init function
Add a return code to the UV BIOS init function that indicates the
successful initialization of the kernel/BIOS callback interface.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145839.895739629@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:10 +02:00
Mike Travis
0959f8256a x86/platform/uv: Return UV Hubless System Type
Return the type of UV hubless system for UV specific code that depends
on that.  Add a function to convert UV system type to bit pattern needed
for is_uv_hubless().

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145839.814880843@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:10 +02:00
Linus Torvalds
b145b0eb20 ARM and x86 bugfixes of all kinds. The most visible one is that migrating
a nested hypervisor has always been busted on Broadwell and newer processors,
 and that has finally been fixed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdlzTRAAoJEL/70l94x66DElcH/Rvhn5VQE/n2J+tKEXAICxQu
 FqcTBJ5x2mp04aFe7xD3kWoKRJmz2lmHdw2ahFd4sqqLfGEFF/KW24ADI33vzLx/
 UmT78O0Je3PX77TRnEXy+napbJny0iT6ikTAQKPbyQ151JlqlbPvatpDXXLPWQHv
 jj6nKHCvMBrhV3kgaXO3cTFl8swX1hvR9lo9PcA2gRNt+HMN0heUmpfKughPoOes
 JH+UNjsEr7MYlXYlIIc9o71EYH+kgPObwlLejy0ture+dvvZEJUJjZJE8H/XG5f2
 ryXG9favaCOTAvaGf0R5Es+47A3crqUr6gHS0N28QKPn7x4hehIkKpA9dXQnWIw=
 =1/LN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM and x86 bugfixes of all kinds.

  The most visible one is that migrating a nested hypervisor has always
  been busted on Broadwell and newer processors, and that has finally
  been fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: x86: omit "impossible" pmu MSRs from MSR list
  KVM: nVMX: Fix consistency check on injected exception error code
  KVM: x86: omit absent pmu MSRs from MSR list
  selftests: kvm: Fix libkvm build error
  kvm: vmx: Limit guest PMCs to those supported on the host
  kvm: x86, powerpc: do not allow clearing largepages debugfs entry
  KVM: selftests: x86: clarify what is reported on KVM_GET_MSRS failure
  KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF
  selftests: kvm: add test for dirty logging inside nested guests
  KVM: x86: fix nested guest live migration with PML
  KVM: x86: assign two bits to track SPTE kinds
  KVM: x86: Expose XSAVEERPTR to the guest
  kvm: x86: Enumerate support for CLZERO instruction
  kvm: x86: Use AMD CPUID semantics for AMD vCPUs
  kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH
  KVM: X86: Fix userspace set invalid CR4
  kvm: x86: Fix a spurious -E2BIG in __do_cpuid_func
  KVM: LAPIC: Loosen filter for adaptive tuning of lapic_timer_advance_ns
  KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATH
  arm64: KVM: Kill hyp_alternate_select()
  ...
2019-10-04 11:17:51 -07:00
Arnd Bergmann
87d6021b81 x86/math-emu: Limit MATH_EMULATION to 486SX compatibles
The FPU emulation code is old and fragile in places, try to limit its
use to builds for CPUs that actually use it. As far as I can tell,
this is only true for i486sx compatibles, including the Cyrix 486SLC,
AMD Am486SX and ÉLAN SC410, UMC U5S amd DM&P VortexSX86, all of which
were relatively short-lived and got replaced with i486DX compatible
processors soon after introduction, though some of the embedded versions
remained available much longer.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bill Metzenthen <billm@melbpc.org.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191001142344.1274185-2-arnd@arndb.de
2019-10-03 10:51:17 +02:00
Nishad Kamdar
6184488a19 x86: Use the correct SPDX License Identifier in headers
Correct the SPDX License Identifier format in a couple of headers.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/697848ff866ade29e78e872525d7a3067642fd37.1555427420.git.nishadkamdar@gmail.com
2019-10-01 20:31:35 +02:00
Linus Torvalds
aefcf2f4b5 Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current->comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
2019-09-28 08:14:15 -07:00
Linus Torvalds
8bbe0dec38 x86 KVM changes:
* The usual accuracy improvements for nested virtualization
 * The usual round of code cleanups from Sean
 * Added back optimizations that were prematurely removed in 5.2
   (the bare minimum needed to fix the regression was in 5.3-rc8,
   here comes the rest)
 * Support for UMWAIT/UMONITOR/TPAUSE
 * Direct L2->L0 TLB flushing when L0 is Hyper-V and L1 is KVM
 * Tell Windows guests if SMT is disabled on the host
 * More accurate detection of vmexit cost
 * Revert a pvqspinlock pessimization
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdjfaKAAoJEL/70l94x66D8MAH/2thJnM47tYtMTFA4GBFugeH
 mAx8OApWFBo8apOip+8ElFLPQ8FQdZCzr9ti8H4JkuzKxgsxCs1iqEg5pHEKxSTi
 K9kLOZwoFtwgy3XmxC0PIZ9lT2Wx74ruh1HF+QG/YsjKH636UPv2VpmulsTNbm62
 2ryzOb3TlGT/cjf+gv9l6IYIxZa2Ff19PF4i//H8u4YRBj358/jr99CK01iE0M9r
 4NhEKiQZywzREWtKxymGOM6HEbwbWcIa+loYjj2htq8epep6f9Y1zQ0Jcn5+nPA0
 cn1T2gGJAJ0OUahKLwNbz8pzrFDkW+eoQgqCBJZ4RT9Uf8WCESfl14p+/vRkAMg=
 =tk5S
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "x86 KVM changes:

   - The usual accuracy improvements for nested virtualization

   - The usual round of code cleanups from Sean

   - Added back optimizations that were prematurely removed in 5.2 (the
     bare minimum needed to fix the regression was in 5.3-rc8, here
     comes the rest)

   - Support for UMWAIT/UMONITOR/TPAUSE

   - Direct L2->L0 TLB flushing when L0 is Hyper-V and L1 is KVM

   - Tell Windows guests if SMT is disabled on the host

   - More accurate detection of vmexit cost

   - Revert a pvqspinlock pessimization"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (56 commits)
  KVM: nVMX: cleanup and fix host 64-bit mode checks
  KVM: vmx: fix build warnings in hv_enable_direct_tlbflush() on i386
  KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot()
  KVM: x86: Drop ____kvm_handle_fault_on_reboot()
  KVM: VMX: Add error handling to VMREAD helper
  KVM: VMX: Optimize VMX instruction error and fault handling
  KVM: x86: Check kvm_rebooting in kvm_spurious_fault()
  KVM: selftests: fix ucall on x86
  Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  kvm: nvmx: limit atomic switch MSRs
  kvm: svm: Intercept RDPRU
  kvm: x86: Add "significant index" flag to a few CPUID leaves
  KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero
  KVM: x86/mmu: Explicitly track only a single invalid mmu generation
  KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
  KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints""
  ...
2019-09-27 12:44:26 -07:00
Paolo Bonzini
6eeb4ef049 KVM: x86: assign two bits to track SPTE kinds
Currently, we are overloading SPTE_SPECIAL_MASK to mean both
"A/D bits unavailable" and MMIO, where the difference between the
two is determined by mio_mask and mmio_value.

However, the next patch will need two bits to distinguish
availability of A/D bits from write protection.  So, while at
it give MMIO its own bit pattern, and move the two bits from
bit 62 to bits 52..53 since Intel is allocating EPT page table
bits from the top.

Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-27 13:13:24 +02:00
Sean Christopherson
f209a26dd5 KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot()
Remove the kvm_rebooting check from VMX/SVM instruction exception fixup
now that kvm_spurious_fault() conditions its BUG() on !kvm_rebooting.
Because the 'cleanup_insn' functionally is also gone, deferring to
kvm_spurious_fault() means __kvm_handle_fault_on_reboot() can eliminate
its .fixup code entirely and have its exception table entry branch
directly to the call to kvm_spurious_fault().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-25 15:30:19 +02:00
Sean Christopherson
98cd382d50 KVM: x86: Drop ____kvm_handle_fault_on_reboot()
Remove the variation of __kvm_handle_fault_on_reboot() that accepts a
post-fault cleanup instruction now that its sole user (VMREAD) uses
a different method for handling faults.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-25 15:30:14 +02:00
Sean Christopherson
4b526de50e KVM: x86: Check kvm_rebooting in kvm_spurious_fault()
Explicitly check kvm_rebooting in kvm_spurious_fault() prior to invoking
BUG(), as opposed to assuming the caller has already done so.  Letting
kvm_spurious_fault() be called "directly" will allow VMX to better
optimize its low level assembly flows.

As a happy side effect, kvm_spurious_fault() no longer needs to be
marked as a dead end since it doesn't unconditionally BUG().

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-25 15:23:33 +02:00
Ingo Molnar
44e09568cf x86/mm: Clean up the pmd_read_atomic() comments
Fix spelling, consistent parenthesis and grammar - and also clarify
the language where needed.

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-25 10:13:27 +02:00
Wei Yang
a2f7a0bfca x86/mm: Fix function name typo in pmd_read_atomic() comment
The function involved should be pte_offset_map_lock() and we never have
function pmd_offset_map_lock defined.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190925014453.20236-1-richardw.yang@linux.intel.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-25 08:40:19 +02:00
Mike Rapoport
782de70c42 mm: consolidate pgtable_cache_init() and pgd_cache_init()
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
cache for page table allocations on several architectures that do not use
PAGE_SIZE tables for one or more levels of the page table hierarchy.

Most architectures do not implement these functions and use __weak default
NOP implementation of pgd_cache_init().  Since there is no such default
for pgtable_cache_init(), its empty stub is duplicated among most
architectures.

Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
drop empty stubs of pgtable_cache_init().

Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>		[arm64]
Acked-by: Thomas Gleixner <tglx@linutronix.de>	[x86]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:09 -07:00
Nicholas Piggin
13224794cb mm: remove quicklist page table caches
Patch series "mm: remove quicklist page table caches".

A while ago Nicholas proposed to remove quicklist page table caches [1].

I've rebased his patch on the curren upstream and switched ia64 and sh to
use generic versions of PTE allocation.

[1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com

This patch (of 3):

Remove page table allocator "quicklists".  These have been around for a
long time, but have not got much traction in the last decade and are only
used on ia64 and sh architectures.

The numbers in the initial commit look interesting but probably don't
apply anymore.  If anybody wants to resurrect this it's in the git
history, but it's unhelpful to have this code and divergent allocator
behaviour for minor archs.

Also it might be better to instead make more general improvements to page
allocator if this is still so slow.

Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:09 -07:00
Marc Orr
f0b5105af6 kvm: nvmx: limit atomic switch MSRs
Allowing an unlimited number of MSRs to be specified via the VMX
load/store MSR lists (e.g., vm-entry MSR load list) is bad for two
reasons. First, a guest can specify an unreasonable number of MSRs,
forcing KVM to process all of them in software. Second, the SDM bounds
the number of MSRs allowed to be packed into the atomic switch MSR lists.
Quoting the "Miscellaneous Data" section in the "VMX Capability
Reporting Facility" appendix:

"Bits 27:25 is used to compute the recommended maximum number of MSRs
that should appear in the VM-exit MSR-store list, the VM-exit MSR-load
list, or the VM-entry MSR-load list. Specifically, if the value bits
27:25 of IA32_VMX_MISC is N, then 512 * (N + 1) is the recommended
maximum number of MSRs to be included in each list. If the limit is
exceeded, undefined processor behavior may result (including a machine
check during the VMX transition)."

Because KVM needs to protect itself and can't model "undefined processor
behavior", arbitrarily force a VM-entry to fail due to MSR loading when
the MSR load list is too large. Similarly, trigger an abort during a VM
exit that encounters an MSR load list or MSR store list that is too large.

The MSR list size is intentionally not pre-checked so as to maintain
compatibility with hardware inasmuch as possible.

Test these new checks with the kvm-unit-test "x86: nvmx: test max atomic
switch MSRs".

Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 16:32:15 +02:00
Jim Mattson
0cb8410b90 kvm: svm: Intercept RDPRU
The RDPRU instruction gives the guest read access to the IA32_APERF
MSR and the IA32_MPERF MSR. According to volume 3 of the APM, "When
virtualization is enabled, this instruction can be intercepted by the
Hypervisor. The intercept bit is at VMCB byte offset 10h, bit 14."
Since we don't enumerate the instruction in KVM_SUPPORTED_CPUID,
intercept it and synthesize #UD.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Drew Schmitt <dasch@google.com>
Reviewed-by: Jacob Xu <jacobhxu@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 16:15:36 +02:00
Sean Christopherson
ca333add69 KVM: x86/mmu: Explicitly track only a single invalid mmu generation
Toggle mmu_valid_gen between '0' and '1' instead of blindly incrementing
the generation.  Because slots_lock is held for the entire duration of
zapping obsolete pages, it's impossible for there to be multiple invalid
generations associated with shadow pages at any given time.

Toggling between the two generations (valid vs. invalid) allows changing
mmu_valid_gen from an unsigned long to a u8, which reduces the size of
struct kvm_mmu_page from 160 to 152 bytes on 64-bit KVM, i.e. reduces
KVM's memory footprint by 8 bytes per shadow page.

Set sp->mmu_valid_gen before it is added to active_mmu_pages.
Functionally this has no effect as kvm_mmu_alloc_page() has a single
caller that sets sp->mmu_valid_gen soon thereafter, but visually it is
jarring to see a shadow page being added to the list without its
mmu_valid_gen first being set.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:36:00 +02:00
Sean Christopherson
31741eb11a KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first""
Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.

Paraphrashing the original changelog:

  Introduce a per-VM list to track obsolete shadow pages, i.e. pages
  which have been deleted from the mmu cache but haven't yet been freed.
  When page reclaiming is needed, zap/free the deleted pages first.

This reverts commit 52d5dedc79.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:35:47 +02:00
Tao Xu
bf653b78f9 KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit
As the latest Intel 64 and IA-32 Architectures Software Developer's
Manual, UMWAIT and TPAUSE instructions cause a VM exit if the
RDTSC exiting and enable user wait and pause VM-execution
controls are both 1.

Because KVM never enable RDTSC exiting, the vm-exit for UMWAIT and TPAUSE
should never happen. Considering EXIT_REASON_XSAVES and
EXIT_REASON_XRSTORS is also unexpected VM-exit for KVM. Introduce a common
exit helper handle_unexpected_vmexit() to handle these unexpected VM-exit.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:34:51 +02:00
Tao Xu
e69e72faa3 KVM: x86: Add support for user wait instructions
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.
This patch adds support for user wait instructions in KVM. Availability
of the user wait instructions is indicated by the presence of the CPUID
feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may
be executed at any privilege level, and use 32bit IA32_UMWAIT_CONTROL MSR
to set the maximum time.

The behavior of user wait instructions in VMX non-root operation is
determined first by the setting of the "enable user wait and pause"
secondary processor-based VM-execution control bit 26.
	If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause
an invalid-opcode exception (#UD).
	If the VM-execution control is 1, treatment is based on the
setting of the “RDTSC exiting†VM-execution control. Because KVM never
enables RDTSC exiting, if the instruction causes a delay, the amount of
time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay. If
IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in
EDX:EAX minus the value that RDTSC would return; if
IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum
of that difference and AND(IA32_UMWAIT_CONTROL,FFFFFFFCH).

Because umwait and tpause can put a (psysical) CPU into a power saving
state, by default we dont't expose it to kvm and enable it only when
guest CPUID has it.

Detailed information about user wait instructions can be found in the
latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:34:20 +02:00
Sean Christopherson
41577ab8bd KVM: x86: Add comments to document various emulation types
Document the intended usage of each emulation type as each exists to
handle an edge case of one kind or another and can be easily
misinterpreted at first glance.

Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:34:14 +02:00
Sean Christopherson
60fc3d02d5 KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.

Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATION_FAIL is only referenced in callers, remove the
emulation_result enums entirely.  Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively.  Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:34:00 +02:00
Sean Christopherson
b400060620 KVM: x86: Add explicit flag for forced emulation on #UD
Add an explicit emulation type for forced #UD emulation and use it to
detect that KVM should unconditionally inject a #UD instead of falling
into its standard emulation failure handling.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:30:54 +02:00
Sean Christopherson
42cbf06872 KVM: x86: Move #GP injection for VMware into x86_emulate_instruction()
Immediately inject a #GP when VMware emulation fails and return
EMULATE_DONE instead of propagating EMULATE_FAIL up the stack.  This
helps pave the way for removing EMULATE_FAIL altogether.

Rename EMULTYPE_VMWARE to EMULTYPE_VMWARE_GP to document that the x86
emulator is called to handle VMware #GP interception, e.g. why a #GP
is injected on emulation failure for EMULTYPE_VMWARE_GP.

Drop EMULTYPE_NO_UD_ON_FAIL as a standalone type.  The "no #UD on fail"
is used only in the VMWare case and is obsoleted by having the emulator
itself reinject #GP.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 14:30:47 +02:00
Vitaly Kuznetsov
b2d8b167e1 KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
guarantee that two virtual processors won't end up running on sibling SMT
threads without knowing about it. This is done as an optimization as in
this case there is nothing the guest can do to protect itself against MDS
and issuing additional flush requests is just pointless. On bare metal the
topology is known, however, when Hyper-V is running nested (e.g. on top of
KVM) it needs an additional piece of information: a confirmation that the
exposed topology (wrt vCPU placement on different SMT threads) is
trustworthy.

NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
TLFS as follows: "Indicates that a virtual processor will never share a
physical core with another virtual processor, except for virtual processors
that are reported as sibling SMT threads." From KVM we can give such
guarantee in two cases:
- SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
 as it can become re-enabled during the lifetime of the guest).
- vCPUs are properly pinned so the scheduler won't put them on sibling
SMT threads (when they're not reported as such).

This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
first case. The second case is outside of KVM's domain of responsibility
(as vCPU pinning is actually done by someone who manages KVM's userspace -
e.g. libvirt pinning QEMU threads).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 13:37:30 +02:00
Vitaly Kuznetsov
6f6a657c99 KVM/Hyper-V/VMX: Add direct tlb flush support
Hyper-V provides direct tlb flush function which helps
L1 Hypervisor to handle Hyper-V tlb flush request from
L2 guest. Add the function support for VMX.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 13:37:14 +02:00
Tianyu Lan
344c6c8047 KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH
Hyper-V direct tlb flush function should be enabled for
guest that only uses Hyper-V hypercall. User space
hypervisor(e.g, Qemu) can disable KVM identification in
CPUID and just exposes Hyper-V identification to make
sure the precondition. Add new KVM capability KVM_CAP_
HYPERV_DIRECT_TLBFLUSH for user space to enable Hyper-V
direct tlb function and this function is default to be
disabled in KVM.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 13:37:13 +02:00
Tianyu Lan
7a83247e01 x86/Hyper-V: Fix definition of struct hv_vp_assist_page
The struct hv_vp_assist_page was defined incorrectly.
The "vtl_control" should be u64[3], "nested_enlightenments
_control" should be a u64 and there are 7 reserved bytes
following "enlighten_vmentry". Fix the definition.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24 13:37:11 +02:00
Linus Torvalds
227c3e9eb5 Make use of gcc 9's "asm inline()" (Rasmus Villemoes):
gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
     crude heuristic that gcc uses to estimate the size of the code
     represented by an asm() statement. From the gcc docs
 
       If you use 'asm inline' instead of just 'asm', then for inlining
       purposes the size of the asm is taken as the minimum size, ignoring
       how many instructions GCC thinks it is.
 
     For compatibility with older compilers, we obviously want a
 
       #if [understands asm inline]
       #define asm_inline asm inline
       #else
       #define asm_inline asm
       #endif
 
     But since we #define the identifier inline to attach some attributes,
     we have to use an alternate spelling of that keyword. gcc provides
     both __inline__ and __inline, and we currently #define both to inline,
     so they all have the same semantics. We have to free up one of
     __inline__ and __inline, and the latter is by far the easiest.
 
     The two x86 changes cause smaller code gen differences than I'd
     expect, but I think we do want the asm_inline thing available sooner
     or later, so this is just to get the ball rolling.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl2D9iAACgkQGXyLc2ht
 IW31DxAA1ajlhPor7NccuhJrz3hXMkPfEBEaTEYp98a5XYGARqrRwMEDzB7j9O/t
 6BZ/ZD8/IehuoNjNP3E5IOnDfvP+a94WL25/hjpTN4aBuZKWJz0X7As8TJJ+bwQc
 v2Hyo+yqzcSEhCI7Mc34uo+TuPaFYEoKHvg+hhSXi4h7c5eqtGKCaB2286iEkk/6
 bAo7n6ogYN64wXjbVXePmpQWgVJG2tsz/blG0hHMISW5UTzWkK/hYZkSf6jdFGSN
 aft1l9EMGx5skAQwFfnDOgf805/TE4jliD2nvaZzT0f/UtYkGjx7C77dcFlPY9Mf
 9R1M01rCQ0KfxR5BqHPN/DbTrd+GzBvKswjrTIB+TwopEfy8yVY2uRIq1lP+aobD
 V3HOtdicgw3wB/fF40pjqoCp3ByawKLlzRpQuqdSmqHs0kS6ForbfLClg8riru6a
 VnJqOXkLlZXU/VysdxuCPbAqN/Rvf9YV3H25x8BOJJWsXa0RdotSbyPIpyPcUo5K
 NycxR/vrLJ8hQqEdOY9iKJP+IMGmAfOJD8fppG/Xsnav8c+NtHPAQlW1c8ee/x2g
 Dbm2AWPCZnrSrFiQ1mDUdc921d/sLTUC/nOHzR8FiZEJGmYVmYSSw3PekFjQW1aV
 TIqvghk15XLB7Ye8JE/eKmoNIBtOsftXpYBzI3716tX46bnPu+4=
 =+BVw
 -----END PGP SIGNATURE-----

Merge tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux

Pull asm inline support from Miguel Ojeda:
 "Make use of gcc 9's "asm inline()" (Rasmus Villemoes):

  gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
  crude heuristic that gcc uses to estimate the size of the code
  represented by an asm() statement. From the gcc docs

      If you use 'asm inline' instead of just 'asm', then for inlining
      purposes the size of the asm is taken as the minimum size, ignoring
      how many instructions GCC thinks it is.

  For compatibility with older compilers, we obviously want a

      #if [understands asm inline]
      #define asm_inline asm inline
      #else
      #define asm_inline asm
      #endif

  But since we #define the identifier inline to attach some attributes,
  we have to use an alternate spelling of that keyword. gcc provides
  both __inline__ and __inline, and we currently #define both to inline,
  so they all have the same semantics.

  We have to free up one of __inline__ and __inline, and the latter is
  by far the easiest.

  The two x86 changes cause smaller code gen differences than I'd
  expect, but I think we do want the asm_inline thing available sooner
  or later, so this is just to get the ball rolling"

* tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux:
  x86: bug.h: use asm_inline in _BUG_FLAGS definitions
  x86: alternative.h: use asm_inline for all alternative variants
  compiler-types.h: add asm_inline definition
  compiler_types.h: don't #define __inline
  lib/zstd/mem.h: replace __inline by inline
  staging: rtl8723bs: replace __inline by inline
2019-09-21 09:47:19 -07:00
Linus Torvalds
45824fc0da powerpc updates for 5.4
- Initial support for running on a system with an Ultravisor, which is software
    that runs below the hypervisor and protects guests against some attacks by
    the hypervisor.
 
  - Support for building the kernel to run as a "Secure Virtual Machine", ie. as
    a guest capable of running on a system with an Ultravisor.
 
  - Some changes to our DMA code on bare metal, to allow devices with medium
    sized DMA masks (> 32 && < 59 bits) to use more than 2GB of DMA space.
 
  - Support for firmware assisted crash dumps on bare metal (powernv).
 
  - Two series fixing bugs in and refactoring our PCI EEH code.
 
  - A large series refactoring our exception entry code to use gas macros, both
    to make it more readable and also enable some future optimisations.
 
 As well as many cleanups and other minor features & fixups.
 
 Thanks to:
   Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh
   Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Balbir Singh, Benjamin
   Herrenschmidt, Cédric Le Goater, Christophe JAILLET, Christophe Leroy,
   Christopher M. Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens,
   David Gibson, David Hildenbrand, Desnes A. Nunes do Rosario, Ganesh Goudar,
   Gautham R. Shenoy, Greg Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari
   Bathini, Joakim Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras,
   Lianbo Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
   Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan Chancellor,
   Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Qian Cai, Ram
   Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm, Sam Bobroff, Santosh Sivaraj,
   Segher Boessenkool, Sukadev Bhattiprolu, Thiago Bauermann, Thiago Jung
   Bauermann, Thomas Gleixner, Tom Lendacky, Vasant Hegde.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl2EtEcTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPfsD/9uXyBXn3anI/H08+mk74k5gCsmMQpn
 D442CD/ByogZcccp23yBTlhawtCE03hcHnCLygn0Xgd8a4YvHts/RGHUe3fPHqlG
 bEyZ7jsLVz5ebNZQP7r4eGs2pSzCajwJy2N9HJ/C1ojf15rrfRxoVJtnyhE2wXpm
 DL+6o2K+nUCB3gTQ1Inr3DnWzoGOOUfNTOea2u+J+yfHwGRqOBYpevwqiwy5eelK
 aRjUJCqMTvrzra49MeFwjo0Nt3/Y8UNcwA+JlGdeR8bRuWhFrYmyBRiZEKPaujNO
 5EAfghBBlB0KQCqvF/tRM/c0OftHqK59AMobP9T7u9oOaBXeF/FpZX/iXjzNDPsN
 j9Oo2tKLTu/YVEXqBFuREGP+znANr1Wo4CFyOG8SbvYz0HFjR6XbtRJsS+0e8GWl
 kqX5/ZhYz3lBnKSNe9jgWOrh/J0KCSFigBTEWJT3xsn4YE8x8kK2l9KPqAIldWEP
 sKb2UjGS7v0NKq+NvShH88Q9AeQUEIjTcg/9aDDQDe6FaRQ7KiF8bUxSdwSPi+Fn
 j0lnF6i+1ATWZKuCr85veVi7C5qoe/+MqalnmP7MxULyzgXLLxUgN0SzEYO6QofK
 LQK/VaH2XVr5+M5YAb7K4/NX5gbM3s1bKrCiUy4EyHNvgG7gricYdbz6HgAjKpR7
 oP0rHfgmVYvF1g==
 =WlW+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "This is a bit late, partly due to me travelling, and partly due to a
  power outage knocking out some of my test systems *while* I was
  travelling.

   - Initial support for running on a system with an Ultravisor, which
     is software that runs below the hypervisor and protects guests
     against some attacks by the hypervisor.

   - Support for building the kernel to run as a "Secure Virtual
     Machine", ie. as a guest capable of running on a system with an
     Ultravisor.

   - Some changes to our DMA code on bare metal, to allow devices with
     medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
     DMA space.

   - Support for firmware assisted crash dumps on bare metal (powernv).

   - Two series fixing bugs in and refactoring our PCI EEH code.

   - A large series refactoring our exception entry code to use gas
     macros, both to make it more readable and also enable some future
     optimisations.

  As well as many cleanups and other minor features & fixups.

  Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
  Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
  Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
  JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
  Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
  Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
  Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
  Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
  Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
  Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
  Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
  O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
  Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
  Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
  Lendacky, Vasant Hegde"

* tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
  powerpc/mm/mce: Keep irqs disabled during lockless page table walk
  powerpc: Use ftrace_graph_ret_addr() when unwinding
  powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
  ftrace: Look up the address of return_to_handler() using helpers
  powerpc: dump kernel log before carrying out fadump or kdump
  docs: powerpc: Add missing documentation reference
  powerpc/xmon: Fix output of XIVE IPI
  powerpc/xmon: Improve output of XIVE interrupts
  powerpc/mm/radix: remove useless kernel messages
  powerpc/fadump: support holes in kernel boot memory area
  powerpc/fadump: remove RMA_START and RMA_END macros
  powerpc/fadump: update documentation about option to release opalcore
  powerpc/fadump: consider f/w load area
  powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
  powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
  powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
  powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
  powerpc/fadump: improve how crashed kernel's memory is reserved
  powerpc/fadump: consider reserved ranges while releasing memory
  powerpc/fadump: make crash memory ranges array allocation generic
  ...
2019-09-20 11:48:06 -07:00
Linus Torvalds
671df18953 dma-mapping updates for 5.4:
- add dma-mapping and block layer helpers to take care of IOMMU
    merging for mmc plus subsequent fixups (Yoshihiro Shimoda)
  - rework handling of the pgprot bits for remapping (me)
  - take care of the dma direct infrastructure for swiotlb-xen (me)
  - improve the dma noncoherent remapping infrastructure (me)
  - better defaults for ->mmap, ->get_sgtable and ->get_required_mask (me)
  - cleanup mmaping of coherent DMA allocations (me)
  - various misc cleanups (Andy Shevchenko, me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl2CSucLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPfrhAAgXZA/EdFPvkkCoDrmgtf3XkudX9gajeCd9g4NZy6
 ZBQElTVvm4S0sQj7IXgALnMumDMbbTibW5SQLX5GwQDe+XXBpZ8ajpAnJAXc8a5T
 qaFQ4SInr4CgBZf9nZKDkbSBZ1Tu3AQm1c0QI8riRCkrVTuX4L06xpCef4Yh4mgO
 rwWEjIioYpQiKZMmu98riXh3ZNfFG3mVJRhKt8B6XJbBgnUnjDOPYGgaUwp6CU20
 tFBKL2GaaV0vdLJ5wYhIGXT4DJ8tp9T5n3IYGZv1Ux889RaZEHlCrMxzelYeDbCT
 KhZbhcSECGnddsh73t/UX7/KhytuqnfKa9n+Xo6AWuA47xO4c36quOOcTk9M0vE5
 TfGDmewgL6WIv4lzokpRn5EkfDhyL33j8eYJrJ8e0ldcOhSQIFk4ciXnf2stWi6O
 JrlzzzSid+zXxu48iTfoPdnMr7psTpiMvvRvKfEeMp2FX9Fg6EdMzJYLTEl+COHB
 0WwNacZmY3P01+b5EZXEgqKEZevIIdmPKbyM9rPtTjz8BjBwkABHTpN3fWbVBf7/
 Ax6OPYyW40xp1fnJuzn89m3pdOxn88FpDdOaeLz892Zd+Qpnro1ayulnFspVtqGM
 mGbzA9whILvXNRpWBSQrvr2IjqMRjbBxX3BVACl3MMpOChgkpp5iANNfSDjCftSF
 Zu8=
 =/wGv
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.4' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - add dma-mapping and block layer helpers to take care of IOMMU merging
   for mmc plus subsequent fixups (Yoshihiro Shimoda)

 - rework handling of the pgprot bits for remapping (me)

 - take care of the dma direct infrastructure for swiotlb-xen (me)

 - improve the dma noncoherent remapping infrastructure (me)

 - better defaults for ->mmap, ->get_sgtable and ->get_required_mask
   (me)

 - cleanup mmaping of coherent DMA allocations (me)

 - various misc cleanups (Andy Shevchenko, me)

* tag 'dma-mapping-5.4' of git://git.infradead.org/users/hch/dma-mapping: (41 commits)
  mmc: renesas_sdhi_internal_dmac: Add MMC_CAP2_MERGE_CAPABLE
  mmc: queue: Fix bigger segments usage
  arm64: use asm-generic/dma-mapping.h
  swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page
  swiotlb-xen: simplify cache maintainance
  swiotlb-xen: use the same foreign page check everywhere
  swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable
  xen: remove the exports for xen_{create,destroy}_contiguous_region
  xen/arm: remove xen_dma_ops
  xen/arm: simplify dma_cache_maint
  xen/arm: use dev_is_dma_coherent
  xen/arm: consolidate page-coherent.h
  xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintainance
  arm: remove wrappers for the generic dma remap helpers
  dma-mapping: introduce a dma_common_find_pages helper
  dma-mapping: always use VM_DMA_COHERENT for generic DMA remap
  vmalloc: lift the arm flag for coherent mappings to common code
  dma-mapping: provide a better default ->get_required_mask
  dma-mapping: remove the dma_declare_coherent_memory export
  remoteproc: don't allow modular build
  ...
2019-09-19 13:27:23 -07:00
Linus Torvalds
8b53c76533 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add the ability to abort a skcipher walk.

  Algorithms:
   - Fix XTS to actually do the stealing.
   - Add library helpers for AES and DES for single-block users.
   - Add library helpers for SHA256.
   - Add new DES key verification helper.
   - Add surrounding bits for ESSIV generator.
   - Add accelerations for aegis128.
   - Add test vectors for lzo-rle.

  Drivers:
   - Add i.MX8MQ support to caam.
   - Add gcm/ccm/cfb/ofb aes support in inside-secure.
   - Add ofb/cfb aes support in media-tek.
   - Add HiSilicon ZIP accelerator support.

  Others:
   - Fix potential race condition in padata.
   - Use unbound workqueues in padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (311 commits)
  crypto: caam - Cast to long first before pointer conversion
  crypto: ccree - enable CTS support in AES-XTS
  crypto: inside-secure - Probe transform record cache RAM sizes
  crypto: inside-secure - Base RD fetchcount on actual RD FIFO size
  crypto: inside-secure - Base CD fetchcount on actual CD FIFO size
  crypto: inside-secure - Enable extended algorithms on newer HW
  crypto: inside-secure: Corrected configuration of EIP96_TOKEN_CTRL
  crypto: inside-secure - Add EIP97/EIP197 and endianness detection
  padata: remove cpu_index from the parallel_queue
  padata: unbind parallel jobs from specific CPUs
  padata: use separate workqueues for parallel and serial work
  padata, pcrypt: take CPU hotplug lock internally in padata_alloc_possible
  crypto: pcrypt - remove padata cpumask notifier
  padata: make padata_do_parallel find alternate callback CPU
  workqueue: require CPU hotplug read exclusion for apply_workqueue_attrs
  workqueue: unconfine alloc/apply/free_workqueue_attrs()
  padata: allocate workqueue internally
  arm64: dts: imx8mq: Add CAAM node
  random: Use wait_event_freezable() in add_hwgenerator_randomness()
  crypto: ux500 - Fix COMPILE_TEST warnings
  ...
2019-09-18 12:11:14 -07:00
Linus Torvalds
fe38bd6862 * s390: ioctl hardening, selftests
* ARM: ITS translation cache; support for 512 vCPUs, various cleanups
 and bugfixes
 
 * PPC: various minor fixes and preparation
 
 * x86: bugfixes all over the place (posted interrupts, SVM, emulation
 corner cases, blocked INIT), some IPI optimizations
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdf7fdAAoJEL/70l94x66DJzkIAKDcuWXJB4Qtoto6yUvPiHZm
 LYkY/Dn1zulb/DhzrBoXFey/jZXwl9kxMYkVTefnrAl0fRwFGX+G1UYnQrtAL6Gr
 ifdTYdy3kZhXCnnp99QAantWDswJHo1THwbmHrlmkxS4MdisEaTHwgjaHrDRZ4/d
 FAEwW2isSonP3YJfTtsKFFjL9k2D4iMnwZ/R2B7UOaWvgnerZ1GLmOkilvnzGGEV
 IQ89IIkWlkKd4SKgq8RkDKlfW5JrLrSdTK2Uf0DvAxV+J0EFkEaR+WlLsqumra0z
 Eg3KwNScfQj0DyT0TzurcOxObcQPoMNSFYXLRbUu1+i0CGgm90XpF1IosiuihgU=
 =w6I3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "s390:
   - ioctl hardening
   - selftests

  ARM:
   - ITS translation cache
   - support for 512 vCPUs
   - various cleanups and bugfixes

  PPC:
   - various minor fixes and preparation

  x86:
   - bugfixes all over the place (posted interrupts, SVM, emulation
     corner cases, blocked INIT)
   - some IPI optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (75 commits)
  KVM: X86: Use IPI shorthands in kvm guest when support
  KVM: x86: Fix INIT signal handling in various CPU states
  KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode
  KVM: VMX: Stop the preemption timer during vCPU reset
  KVM: LAPIC: Micro optimize IPI latency
  kvm: Nested KVM MMUs need PAE root too
  KVM: x86: set ctxt->have_exception in x86_decode_insn()
  KVM: x86: always stop emulation on page fault
  KVM: nVMX: trace nested VM-Enter failures detected by H/W
  KVM: nVMX: add tracepoint for failed nested VM-Enter
  x86: KVM: svm: Fix a check in nested_svm_vmrun()
  KVM: x86: Return to userspace with internal error on unexpected exit reason
  KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code
  KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers
  doc: kvm: Fix return description of KVM_SET_MSRS
  KVM: X86: Tune PLE Window tracepoint
  KVM: VMX: Change ple_window type to unsigned int
  KVM: X86: Remove tailing newline for tracepoints
  KVM: X86: Trace vcpu_id for vmexit
  KVM: x86: Manually calculate reserved bits when loading PDPTRS
  ...
2019-09-18 09:49:13 -07:00
Linus Torvalds
77dcfe2b9e Power management updates for 5.4-rc1
- Rework the main suspend-to-idle control flow to avoid repeating
    "noirq" device resume and suspend operations in case of spurious
    wakeups from the ACPI EC and decouple the ACPI EC wakeups support
    from the LPS0 _DSM support (Rafael Wysocki).
 
  - Extend the wakeup sources framework to expose wakeup sources as
    device objects in sysfs (Tri Vo, Stephen Boyd).
 
  - Expose system suspend statistics in sysfs (Kalesh Singh).
 
  - Introduce a new haltpoll cpuidle driver and a new matching
    governor for virtualized guests wanting to do guest-side polling
    in the idle loop (Marcelo Tosatti, Joao Martins, Wanpeng Li,
    Stephen Rothwell).
 
  - Fix the menu and teo cpuidle governors to allow the scheduler tick
    to be stopped if PM QoS is used to limit the CPU idle state exit
    latency in some cases (Rafael Wysocki).
 
  - Increase the resolution of the play_idle() argument to microseconds
    for more fine-grained injection of CPU idle cycles (Daniel Lezcano).
 
  - Switch over some users of cpuidle notifiers to the new QoS-based
    frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
    policy notifier events (Viresh Kumar).
 
  - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).
 
  - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
    (Andrew-sh.Cheng, Fabien Parent).
 
  - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
    Huang).
 
  - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).
 
  - Update the qcom cpufreq driver (among other things, to make it
    easier to extend and to use kryo cpufreq for other nvmem-based
    SoCs) and add qcs404 support to it  (Niklas Cassel, Douglas
    RAILLARD, Sibi Sankar, Sricharan R).
 
  - Fix assorted issues and make assorted minor improvements in the
    cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
    Gustavo Silva, Hariprasad Kelam).
 
  - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
    Bergmann).
 
  - Add new Exynos PPMU events to devfreq events and extend that
    mechanism (Lukasz Luba).
 
  - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).
 
  - Improve devfreq documentation and governor code, fix spelling
    typos in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard
    Crestez, MyungJoo Ham, Gaël PORTAY).
 
  - Add regulators enable and disable to the OPP (operating performance
    points) framework (Kamil Konieczny).
 
  - Update the OPP framework to support multiple opp-suspend properties
    (Anson Huang).
 
  - Fix assorted issues and make assorted minor improvements in the OPP
    code (Niklas Cassel, Viresh Kumar, Yue Hu).
 
  - Clean up the generic power domains (genpd) framework (Ulf Hansson).
 
  - Clean up assorted pieces of power management code and documentation
    (Akinobu Mita, Amit Kucheria, Chuhong Yuan).
 
  - Update the pm-graph tool to version 5.5 including multiple fixes
    and improvements (Todd Brandt).
 
  - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
    Sébastien Szymanski).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl2ArZ4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxgfYQAK80hs43vWQDmp7XKrN4pQe8+qYULAGO
 fBfrFl+NG9y/cnuqnt3NtA8MoyNsMMkMLkpkEDMfSbYqqH5ehEzX5+uGJWiWx8+Y
 oH5KU8MH7Tj/utYaalGzDt0AHfHZDIGC0NCUNQJVtE/4mOANFabwsCwscp4MrD5Q
 WjFN8U4BrsmWgJdZ/U9QIWcDZ0I+1etCF+rZG2yxSv31FMq2Zk/Qm4YyobqCvQFl
 TR9rxl08wqUmIYIz5cDjt/3AKH7NLLDqOTstbCL7cmufM5XPFc1yox69xc89UrIa
 4AMgmDp7SMwFG/gdUPof0WQNmx7qxmiRAPleAOYBOZW/8jPNZk2y+RhM5NeF72m7
 AFqYiuxqatkSb4IsT8fLzH9IUZOdYr8uSmoMQECw+MHdApaKFjFV8Lb/qx5+AwkD
 y7pwys8dZSamAjAf62eUzJDWcEwkNrujIisGrIXrVHb7ISbweskMOmdAYn9p4KgP
 dfRzpJBJ45IaMIdbaVXNpg3rP7Apfs7X1X+/ZhG6f+zHH3zYwr8Y81WPqX8WaZJ4
 qoVCyxiVWzMYjY2/1lzjaAdqWojPWHQ3or3eBaK52DouyG3jY6hCDTLwU7iuqcCX
 jzAtrnqrNIKufvaObEmqcmYlIIOFT7QaJCtGUSRFQLfSon8fsVSR7LLeXoAMUJKT
 JWQenuNaJngK
 =TBDQ
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These include a rework of the main suspend-to-idle code flow (related
  to the handling of spurious wakeups), a switch over of several users
  of cpufreq notifiers to QoS-based limits, a new devfreq driver for
  Tegra20, a new cpuidle driver and governor for virtualized guests, an
  extension of the wakeup sources framework to expose wakeup sources as
  device objects in sysfs, and more.

  Specifics:

   - Rework the main suspend-to-idle control flow to avoid repeating
     "noirq" device resume and suspend operations in case of spurious
     wakeups from the ACPI EC and decouple the ACPI EC wakeups support
     from the LPS0 _DSM support (Rafael Wysocki).

   - Extend the wakeup sources framework to expose wakeup sources as
     device objects in sysfs (Tri Vo, Stephen Boyd).

   - Expose system suspend statistics in sysfs (Kalesh Singh).

   - Introduce a new haltpoll cpuidle driver and a new matching governor
     for virtualized guests wanting to do guest-side polling in the idle
     loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).

   - Fix the menu and teo cpuidle governors to allow the scheduler tick
     to be stopped if PM QoS is used to limit the CPU idle state exit
     latency in some cases (Rafael Wysocki).

   - Increase the resolution of the play_idle() argument to microseconds
     for more fine-grained injection of CPU idle cycles (Daniel
     Lezcano).

   - Switch over some users of cpuidle notifiers to the new QoS-based
     frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
     policy notifier events (Viresh Kumar).

   - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).

   - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
     (Andrew-sh.Cheng, Fabien Parent).

   - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
     Huang).

   - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).

   - Update the qcom cpufreq driver (among other things, to make it
     easier to extend and to use kryo cpufreq for other nvmem-based
     SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
     RAILLARD, Sibi Sankar, Sricharan R).

   - Fix assorted issues and make assorted minor improvements in the
     cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
     Gustavo Silva, Hariprasad Kelam).

   - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
     Bergmann).

   - Add new Exynos PPMU events to devfreq events and extend that
     mechanism (Lukasz Luba).

   - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).

   - Improve devfreq documentation and governor code, fix spelling typos
     in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
     MyungJoo Ham, Gaël PORTAY).

   - Add regulators enable and disable to the OPP (operating performance
     points) framework (Kamil Konieczny).

   - Update the OPP framework to support multiple opp-suspend properties
     (Anson Huang).

   - Fix assorted issues and make assorted minor improvements in the OPP
     code (Niklas Cassel, Viresh Kumar, Yue Hu).

   - Clean up the generic power domains (genpd) framework (Ulf Hansson).

   - Clean up assorted pieces of power management code and documentation
     (Akinobu Mita, Amit Kucheria, Chuhong Yuan).

   - Update the pm-graph tool to version 5.5 including multiple fixes
     and improvements (Todd Brandt).

   - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
     Sébastien Szymanski)"

* tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
  cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
  cpuidle-haltpoll: do not set an owner to allow modunload
  cpuidle-haltpoll: return -ENODEV on modinit failure
  cpuidle-haltpoll: set haltpoll as preferred governor
  cpuidle: allow governor switch on cpuidle_register_driver()
  PM: runtime: Documentation: add runtime_status ABI document
  pm-graph: make setVal unbuffered again for python2 and python3
  powercap: idle_inject: Use higher resolution for idle injection
  cpuidle: play_idle: Increase the resolution to usec
  cpuidle-haltpoll: vcpu hotplug support
  cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
  cpufreq: qcom: Add support for qcs404 on nvmem driver
  cpufreq: qcom: Refactor the driver to make it easier to extend
  cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
  dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
  dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
  Documentation: cpufreq: Update policy notifier documentation
  cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
  PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
  PM / Domains: Simplify genpd_lookup_dev()
  ...
2019-09-17 19:15:14 -07:00
Linus Torvalds
7f2444d38f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Thomas Gleixner:
 "Timers and timekeeping updates:

   - A large overhaul of the posix CPU timer code which is a preparation
     for moving the CPU timer expiry out into task work so it can be
     properly accounted on the task/process.

     An update to the bogus permission checks will come later during the
     merge window as feedback was not complete before heading of for
     travel.

   - Switch the timerqueue code to use cached rbtrees and get rid of the
     homebrewn caching of the leftmost node.

   - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
     single function

   - Implement the separation of hrtimers to be forced to expire in hard
     interrupt context even when PREEMPT_RT is enabled and mark the
     affected timers accordingly.

   - Implement a mechanism for hrtimers and the timer wheel to protect
     RT against priority inversion and live lock issues when a (hr)timer
     which should be canceled is currently executing the callback.
     Instead of infinitely spinning, the task which tries to cancel the
     timer blocks on a per cpu base expiry lock which is held and
     released by the (hr)timer expiry code.

   - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
     resulting in faster access to timekeeping functions.

   - Updates to various clocksource/clockevent drivers and their device
     tree bindings.

   - The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...
2019-09-17 12:35:15 -07:00
Linus Torvalds
c5f12fdb8b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:

 - Cleanup the apic IPI implementation by removing duplicated code and
   consolidating the functions into the APIC core.

 - Implement a safe variant of the IPI broadcast mode. Contrary to
   earlier attempts this uses the core tracking of which CPUs have been
   brought online at least once so that a broadcast does not end up in
   some dead end in BIOS/SMM code when the CPU is still waiting for
   init. Once all CPUs have been brought up once, IPI broadcasting is
   enabled. Before that regular one by one IPIs are issued.

 - Drop the paravirt CR8 related functions as they have no user anymore

 - Initialize the APIC TPR to block interrupt 16-31 as they are reserved
   for CPU exceptions and should never be raised by any well behaving
   device.

 - Emit a warning when vector space exhaustion breaks the admin set
   affinity of an interrupt.

 - Make sure to use the NMI fallback when shutdown via reboot vector IPI
   fails. The original code had conditions which prevent the code path
   to be reached.

 - Annotate various APIC config variables as RO after init.

[ The ipi broadcase change came in earlier through the cpu hotplug
  branch, but I left the explanation in the commit message since it was
  shared between the two different branches    - Linus ]

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  x86/apic/vector: Warn when vector space exhaustion breaks affinity
  x86/apic: Annotate global config variables as "read-only after init"
  x86/apic/x2apic: Implement IPI shorthands support
  x86/apic/flat64: Remove the IPI shorthand decision logic
  x86/apic: Share common IPI helpers
  x86/apic: Remove the shorthand decision logic
  x86/smp: Enhance native_send_call_func_ipi()
  x86/smp: Move smp_function_call implementations into IPI code
  x86/apic: Provide and use helper for send_IPI_allbutself()
  x86/apic: Add static key to Control IPI shorthands
  x86/apic: Move no_ipi_broadcast() out of 32bit
  x86/apic: Add NMI_VECTOR wait to IPI shorthand
  x86/apic: Remove dest argument from __default_send_IPI_shortcut()
  x86/hotplug: Silence APIC and NMI when CPU is dead
  x86/cpu: Move arch_smt_update() to a neutral place
  x86/apic/uv: Make x2apic_extra_bits static
  x86/apic: Consolidate the apic local headers
  x86/apic: Move apic_flat_64 header into apic directory
  x86/apic: Move ipi header into apic directory
  x86/apic: Cleanup the include maze
  ...
2019-09-17 12:04:39 -07:00
Linus Torvalds
258b16ec9a Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 interrupt updates from Thomas Gleixner:
 "A small set of changes to simplify and improve the interrupt handling
  in do_IRQ() by moving the common case into common code and thereby
  cleaning it up"

* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Check for VECTOR_UNUSED directly
  x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code
  x86/irq: Improve definition of VECTOR_SHUTDOWN et al
2019-09-17 11:13:48 -07:00
Rafael J. Wysocki
2cdd5cc703 Merge branch 'pm-cpuidle'
* pm-cpuidle:
  cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
  cpuidle-haltpoll: do not set an owner to allow modunload
  cpuidle-haltpoll: return -ENODEV on modinit failure
  cpuidle-haltpoll: set haltpoll as preferred governor
  cpuidle: allow governor switch on cpuidle_register_driver()
  powercap: idle_inject: Use higher resolution for idle injection
  cpuidle: play_idle: Increase the resolution to usec
  cpuidle-haltpoll: vcpu hotplug support
  cpuidle: teo: Get rid of redundant check in teo_update()
  cpuidle: teo: Allow tick to be stopped if PM QoS is used
  cpuidle: menu: Allow tick to be stopped if PM QoS is used
  cpuidle: header file stubs must be "static inline"
  cpuidle-haltpoll: disable host side polling when kvm virtualized
  cpuidle: add haltpoll governor
  governors: unify last_state_idx
  cpuidle: add poll_limit_ns to cpuidle_device structure
  add cpuidle-haltpoll driver
2019-09-17 09:41:26 +02:00
Linus Torvalds
7ac63f6ba5 Merge branch 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vmware updates from Ingo Molnar:
 "This updates the VMWARE guest driver with support for VMCALL/VMMCALL
  based hypercalls"

* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  input/vmmouse: Update the backdoor call with support for new instructions
  drm/vmwgfx: Update the backdoor call with support for new instructions
  x86/vmware: Add a header file for hypercall definitions
  x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls
2019-09-16 19:40:24 -07:00
Linus Torvalds
e2bddc20b5 Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hyperv updates from Ingo Molnar:
 "Misc updates related to page size abstractions within the HyperV code,
  in preparation for future features"

* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers: hv: vmbus: Replace page definition with Hyper-V specific one
  x86/hyperv: Add functions to allocate/deallocate page for Hyper-V
  x86/hyperv: Create and use Hyper-V page definitions
2019-09-16 19:39:00 -07:00
Linus Torvalds
ac51667b5b Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:

 - Make cpumask_of_node() more robust against invalid node IDs

 - Simplify and speed up load_mm_cr4()

 - Unexport and remove various unused set_memory_*() APIs

 - Misc cleanups

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix cpumask_of_node() error condition
  x86/mm: Remove the unused set_memory_wt() function
  x86/mm: Remove set_pages_x() and set_pages_nx()
  x86/mm: Remove the unused set_memory_array_*() functions
  x86/mm: Unexport set_memory_x() and set_memory_nx()
  x86/fixmap: Cleanup outdated comments
  x86/kconfig: Remove X86_DIRECT_GBPAGES dependency on !DEBUG_PAGEALLOC
  x86/mm: Avoid redundant interrupt disable in load_mm_cr4()
2019-09-16 19:21:34 -07:00
Linus Torvalds
e0d60a1e68 Merge branch 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Ingo Molnar:
 "This contains x32 and compat syscall improvements, the biggest one of
  which splits x32 syscalls into their own table, which allows new
  syscalls to share the x32 and x86-64 number - which turns the
  512-547 special syscall numbers range into a legacy wart that won't be
  extended going forward"

* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/syscalls: Split the x32 syscalls into their own table
  x86/syscalls: Disallow compat entries for all types of 64-bit syscalls
  x86/syscalls: Use the compat versions of rt_sigsuspend() and rt_sigprocmask()
  x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long
2019-09-16 19:06:29 -07:00
Linus Torvalds
22331f8952 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu-feature updates from Ingo Molnar:

 - Rework the Intel model names symbols/macros, which were decades of
   ad-hoc extensions and added random noise. It's now a coherent, easy
   to follow nomenclature.

 - Add new Intel CPU model IDs:
    - "Tiger Lake" desktop and mobile models
    - "Elkhart Lake" model ID
    - and the "Lightning Mountain" variant of Airmont, plus support code

 - Add the new AVX512_VP2INTERSECT instruction to cpufeatures

 - Remove Intel MPX user-visible APIs and the self-tests, because the
   toolchain (gcc) is not supporting it going forward. This is the
   first, lowest-risk phase of MPX removal.

 - Remove X86_FEATURE_MFENCE_RDTSC

 - Various smaller cleanups and fixes

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  x86/cpu: Update init data for new Airmont CPU model
  x86/cpu: Add new Airmont variant to Intel family
  x86/cpu: Add Elkhart Lake to Intel family
  x86/cpu: Add Tiger Lake to Intel family
  x86: Correct misc typos
  x86/intel: Add common OPTDIFFs
  x86/intel: Aggregate microserver naming
  x86/intel: Aggregate big core graphics naming
  x86/intel: Aggregate big core mobile naming
  x86/intel: Aggregate big core client naming
  x86/cpufeature: Explain the macro duplication
  x86/ftrace: Remove mcount() declaration
  x86/PCI: Remove superfluous returns from void functions
  x86/msr-index: Move AMD MSRs where they belong
  x86/cpu: Use constant definitions for CPU models
  lib: Remove redundant ftrace flag removal
  x86/crash: Remove unnecessary comparison
  x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
  x86: Remove X86_FEATURE_MFENCE_RDTSC
  x86/mpx: Remove MPX APIs
  ...
2019-09-16 18:47:53 -07:00
Linus Torvalds
fc6fd1392a Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build cleanup from Ingo Molnar:
 "A single change that removes unnecessary asm-generic wrappers"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Remove unneeded uapi asm-generic wrappers
2019-09-16 18:29:19 -07:00
Linus Torvalds
df4c0b18f2 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:

 - Add UMIP emulation/spoofing for 64-bit processes as well, because of
   Wine based gaming.

 - Clean up symbols/labels in low level asm code

 - Add an assembly optimized mul_u64_u32_div() implementation on x86-64.

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/umip: Add emulation (spoofing) for UMIP covered instructions in 64-bit processes as well
  x86/asm: Make some functions local labels
  x86/asm/suspend: Get rid of bogus_64_magic
  x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64
2019-09-16 18:07:08 -07:00
Linus Torvalds
7e67a85999 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
   Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
   Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

   As perf and the scheduler is getting bigger and more complex,
   document the status quo of current responsibilities and interests,
   and spread the review pain^H^H^H^H fun via an increase in the Cc:
   linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree
   closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
   into a new CONFIG_PREEMPTION category that will allow the eventual
   introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
   to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
   allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count
   applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve CPU isolation housekeeping CPU allocation NUMA locality.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets
   rebuilds the topology, or when it gets deadline-throttled while it's
   being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
   setscheduler() system calls without creating global serialization.
   Add new synchronization between cpuset topology-changing events and
   the deadline acceptance tests in setscheduler(), which were broken
   before.

 - Rework the active_mm state machine to be less confusing and more
   optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see
   the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq->lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
2019-09-16 17:25:49 -07:00
Linus Torvalds
772c1d06bd Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Improved kbprobes robustness

   - Intel PEBS support for PT hardware tracing

   - Other Intel PT improvements: high order pages memory footprint
     reduction and various related cleanups

   - Misc cleanups

  The perf tooling side has been very busy in this cycle, with over 300
  commits. This is an incomplete high-level summary of the many
  improvements done by over 30 developers:

   - Lots of updates to the following tools:

      'perf c2c'
      'perf config'
      'perf record'
      'perf report'
      'perf script'
      'perf test'
      'perf top'
      'perf trace'

   - Updates to libperf and libtraceevent, and a consolidation of the
     proliferation of x86 instruction decoder libraries.

   - Vendor event updates for Intel and PowerPC CPUs,

   - Updates to hardware tracing tooling for ARM and Intel CPUs,

   - ... and lots of other changes and cleanups - see the shortlog and
     Git log for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (322 commits)
  kprobes: Prohibit probing on BUG() and WARN() address
  perf/x86: Make more stuff static
  x86, perf: Fix the dependency of the x86 insn decoder selftest
  objtool: Ignore intentional differences for the x86 insn decoder
  objtool: Update sync-check.sh from perf's check-headers.sh
  perf build: Ignore intentional differences for the x86 insn decoder
  perf intel-pt: Use shared x86 insn decoder
  perf intel-pt: Remove inat.c from build dependency list
  perf: Update .gitignore file
  objtool: Move x86 insn decoder to a common location
  perf metricgroup: Support multiple events for metricgroup
  perf metricgroup: Scale the metric result
  perf pmu: Change convert_scale from static to global
  perf symbols: Move mem_info and branch_info out of symbol.h
  perf auxtrace: Uninline functions that touch perf_session
  perf tools: Remove needless evlist.h include directives
  perf tools: Remove needless evlist.h include directives
  perf tools: Remove needless thread_map.h include directives
  perf tools: Remove needless thread.h include directives
  perf tools: Remove needless map.h include directives
  ...
2019-09-16 17:06:21 -07:00
Linus Torvalds
c7eba51cfd Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - improve rwsem scalability

 - add uninitialized rwsem debugging check

 - reduce lockdep's stacktrace memory usage and add diagnostics

 - misc cleanups, code consolidation and constification

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Fix up mutex_waiter usage
  locking/mutex: Use mutex flags macro instead of hard code
  locking/mutex: Make __mutex_owner static to mutex.c
  locking/qspinlock,x86: Clarify virt_spin_lock_key
  locking/rwsem: Check for operations on an uninitialized rwsem
  locking/rwsem: Make handoff writer optimistically spin on owner
  locking/lockdep: Report more stack trace statistics
  locking/lockdep: Reduce space occupied by stack traces
  stacktrace: Constify 'entries' arguments
  locking/lockdep: Make it clear that what lock_class::key points at is not modified
2019-09-16 16:49:55 -07:00
Linus Torvalds
cc9b499a1f Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:

 - refactor the EFI config table handling across architectures

 - add support for the Dell EMC OEM config table

 - include AER diagnostic output to CPER handling of fatal PCIe errors

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: cper: print AER info of PCIe fatal error
  efi: Export Runtime Configuration Interface table to sysfs
  efi: ia64: move SAL systab handling out of generic EFI code
  efi/x86: move UV_SYSTAB handling into arch/x86
  efi: x86: move efi_is_table_address() into arch/x86
2019-09-16 16:47:38 -07:00
Linus Torvalds
e77fafe9af arm64 updates for 5.4:
- 52-bit virtual addressing in the kernel
 
 - New ABI to allow tagged user pointers to be dereferenced by syscalls
 
 - Early RNG seeding by the bootloader
 
 - Improve robustness of SMP boot
 
 - Fix TLB invalidation in light of recent architectural clarifications
 
 - Support for i.MX8 DDR PMU
 
 - Remove direct LSE instruction patching in favour of static keys
 
 - Function error injection using kprobes
 
 - Support for the PPTT "thread" flag introduced by ACPI 6.3
 
 - Move PSCI idle code into proper cpuidle driver
 
 - Relaxation of implicit I/O memory barriers
 
 - Build with RELR relocations when toolchain supports them
 
 - Numerous cleanups and non-critical fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl1yYREQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNAM3CAChqDFQkryXoHwdeEcaukMRVNxtxOi4pM4g
 5xqkb7PoqRJssIblsuhaXjrSD97yWCgaqCmFe6rKoes++lP4bFcTe22KXPPyPBED
 A+tK4nTuKKcZfVbEanUjI+ihXaHJmKZ/kwAxWsEBYZ4WCOe3voCiJVNO2fHxqg1M
 8TskZ2BoayTbWMXih0eJg2MCy/xApBq4b3nZG4bKI7Z9UpXiKN1NYtDh98ZEBK4V
 d/oNoHsJ2ZvIQsztoBJMsvr09DTCazCijWZiECadm6l41WEPFizngrACiSJLLtYo
 0qu4qxgg9zgFlvBCRQmIYSggTuv35RgXSfcOwChmW5DUjHG+f9GK
 =Ru4B
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Although there isn't tonnes of code in terms of line count, there are
  a fair few headline features which I've noted both in the tag and also
  in the merge commits when I pulled everything together.

  The part I'm most pleased with is that we had 35 contributors this
  time around, which feels like a big jump from the usual small group of
  core arm64 arch developers. Hopefully they all enjoyed it so much that
  they'll continue to contribute, but we'll see.

  It's probably worth highlighting that we've pulled in a branch from
  the risc-v folks which moves our CPU topology code out to where it can
  be shared with others.

  Summary:

   - 52-bit virtual addressing in the kernel

   - New ABI to allow tagged user pointers to be dereferenced by
     syscalls

   - Early RNG seeding by the bootloader

   - Improve robustness of SMP boot

   - Fix TLB invalidation in light of recent architectural
     clarifications

   - Support for i.MX8 DDR PMU

   - Remove direct LSE instruction patching in favour of static keys

   - Function error injection using kprobes

   - Support for the PPTT "thread" flag introduced by ACPI 6.3

   - Move PSCI idle code into proper cpuidle driver

   - Relaxation of implicit I/O memory barriers

   - Build with RELR relocations when toolchain supports them

   - Numerous cleanups and non-critical fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits)
  arm64: remove __iounmap
  arm64: atomics: Use K constraint when toolchain appears to support it
  arm64: atomics: Undefine internal macros after use
  arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
  arm64: asm: Kill 'asm/atomic_arch.h'
  arm64: lse: Remove unused 'alt_lse' assembly macro
  arm64: atomics: Remove atomic_ll_sc compilation unit
  arm64: avoid using hard-coded registers for LSE atomics
  arm64: atomics: avoid out-of-line ll/sc atomics
  arm64: Use correct ll/sc atomic constraints
  jump_label: Don't warn on __exit jump entries
  docs/perf: Add documentation for the i.MX8 DDR PMU
  perf/imx_ddr: Add support for AXI ID filtering
  arm64: kpti: ensure patched kernel text is fetched from PoU
  arm64: fix fixmap copy for 16K pages and 48-bit VA
  perf/smmuv3: Validate groups for global filtering
  perf/smmuv3: Validate group size
  arm64: Relax Documentation/arm64/tagged-pointers.rst
  arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
  arm64: mm: Ignore spurious translation faults taken from the kernel
  ...
2019-09-16 14:31:40 -07:00
Linus Torvalds
52a5525214 IOMMU Updates for Linux v5.4:
Including:
 
 	- Batched unmap support for the IOMMU-API
 
 	- Support for unlocked command queueing in the ARM-SMMU driver
 
 	- Rework the ATS support in the ARM-SMMU driver
 
 	- More refactoring in the ARM-SMMU driver to support hardware
 	  implemention specific quirks and errata
 
 	- Bounce buffering DMA-API implementatation in the Intel VT-d driver
 	  for untrusted devices (like Thunderbolt devices)
 
 	- Fixes for runtime PM support in the OMAP iommu driver
 
 	- MT8183 IOMMU support in the Mediatek IOMMU driver
 
 	- Rework of the way the IOMMU core sets the default domain type for
 	  groups. Changing the default domain type on x86 does not require two
 	  kernel parameters anymore.
 
 	- More smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl1/pdoACgkQK/BELZcB
 GuMvCw/+K1GPyyZbPWAuXcnclraSZTHXS1lV0yilBXXyT2omFRQpRJYZGN/8NTbE
 SqD2FtzTKGuGSy2jA0drd3RcMKK/zZsFYnJShiM3FHLXatZdaFrnkK7vRHuzKlHf
 dvOlH7gHKtjIPPXodUEb0xd/oRAEIVsKjJyq1fBMARPPAluhU7mIFUI/xbGvX17g
 LM00hIxEhVNsSPemU2kTVISNBPVneecNVLlKXySjp0YPW/+sf8R7tTvwlSXX6h3I
 JM6wOU479O8mBvIcpAjfZlanHCHtqLk0ybaPx666DjdgYx6cUBHbDCF0P57XnGJA
 HNeVGtBwGQb8VWgbPLJKrStSOzYudDG8ndctqfKYN7uiPDjYM2/sqXcwQSVXR9vX
 AjT2s0GFEWT/AJhgBSeg9PJilEX1hPtomGKcQhKfR0wRGycixeZJFbwToQqzJrZN
 7XoORbZPH1S5W6sjXsXH3eVPW3EGnKipulJSPGDqFLa2aIUG+PXuu/A+iJS6sADh
 mqzVfcEs3/NYsro40eA/iQc0t99ftJXgpX18KxYprjyL6VWcwC/xeHcT/Zw9abxz
 r7dYDGTR0z6RIew0GOaeZVdZJh/J6yraKCNDS0ARZgol6JPaU7HGHhh6Ohdmmu8L
 Gtsgdxp4NeLEgp4mQiRvvpQ5pPJ/YR+oCOx3v+PPnKRLhMTxymQ=
 =UF08
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - batched unmap support for the IOMMU-API

 - support for unlocked command queueing in the ARM-SMMU driver

 - rework the ATS support in the ARM-SMMU driver

 - more refactoring in the ARM-SMMU driver to support hardware
   implemention specific quirks and errata

 - bounce buffering DMA-API implementatation in the Intel VT-d driver
   for untrusted devices (like Thunderbolt devices)

 - fixes for runtime PM support in the OMAP iommu driver

 - MT8183 IOMMU support in the Mediatek IOMMU driver

 - rework of the way the IOMMU core sets the default domain type for
   groups. Changing the default domain type on x86 does not require two
   kernel parameters anymore.

 - more smaller fixes and cleanups

* tag 'iommu-updates-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (113 commits)
  iommu/vt-d: Declare Broadwell igfx dmar support snafu
  iommu/vt-d: Add Scalable Mode fault information
  iommu/vt-d: Use bounce buffer for untrusted devices
  iommu/vt-d: Add trace events for device dma map/unmap
  iommu/vt-d: Don't switch off swiotlb if bounce page is used
  iommu/vt-d: Check whether device requires bounce buffer
  swiotlb: Split size parameter to map/unmap APIs
  iommu/omap: Mark pm functions __maybe_unused
  iommu/ipmmu-vmsa: Disable cache snoop transactions on R-Car Gen3
  iommu/ipmmu-vmsa: Move IMTTBCR_SL0_TWOBIT_* to restore sort order
  iommu: Don't use sme_active() in generic code
  iommu/arm-smmu-v3: Fix build error without CONFIG_PCI_ATS
  iommu/qcom: Use struct_size() helper
  iommu: Remove wrong default domain comments
  iommu/dma: Fix for dereferencing before null checking
  iommu/mediatek: Clean up struct mtk_smi_iommu
  memory: mtk-smi: Get rid of need_larbid
  iommu/mediatek: Fix VLD_PA_RNG register backup when suspend
  memory: mtk-smi: Add bus_sel for mt8183
  memory: mtk-smi: Invoke pm runtime_callback to enable clocks
  ...
2019-09-16 14:14:40 -07:00
Rasmus Villemoes
32ee8230b2 x86: bug.h: use asm_inline in _BUG_FLAGS definitions
This helps preventing a BUG* or WARN* in some static inline from
preventing that (or one of its callers) being inlined, so should allow
gcc to make better informed inlining decisions.

For example, with gcc 9.2, tcp_fastopen_no_cookie() vanishes from
net/ipv4/tcp_fastopen.o. It does not itself have any BUG or WARN, but
it calls dst_metric() which has a WARN_ON_ONCE - and despite that
WARN_ON_ONCE vanishing since the condition is compile-time false,
dst_metric() is apparently sufficiently "large" that when it gets
inlined into tcp_fastopen_no_cookie(), the latter becomes too large
for inlining.

Overall, if one asks size(1), .text decreases a little and .data
increases by about the same amount (x86-64 defconfig)

$ size vmlinux.{before,after}
   text    data     bss     dec     hex filename
19709726        5202600 1630280 26542606        195020e vmlinux.before
19709330        5203068 1630280 26542678        1950256 vmlinux.after

while bloat-o-meter says

add/remove: 10/28 grow/shrink: 103/51 up/down: 3669/-2854 (815)
...
Total: Before=14783683, After=14784498, chg +0.01%

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-09-15 20:14:15 +02:00
Rasmus Villemoes
40576e5e63 x86: alternative.h: use asm_inline for all alternative variants
Most, if not all, uses of the alternative* family just provide one or
two instructions in .text, but the string literal can be quite large,
causing gcc to overestimate the size of the generated code. That in
turn affects its decisions about inlining of the function containing
the alternative() asm statement.

New enough versions of gcc allow one to overrule the estimated size by
using "asm inline" instead of just "asm". So replace asm by the helper
asm_inline, which for older gccs just expands to asm.

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-09-15 20:14:15 +02:00
Sean Christopherson
002c5f73c5 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
James Harvey reported a livelock that was introduced by commit
d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").

The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule.  The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.

There are three ways to fix the livelock:

- Reverting the revert (commit d012a06ab1) is not a viable option as
  the revert is needed to fix a regression that occurs when the guest has
  one or more assigned devices.  It's unlikely we'll root cause the device
  assignment regression soon enough to fix the regression timely.

- Remove the conditional reschedule from kvm_mmu_zap_all().  However, although
  removing the reschedule would be a smaller code change, it's less safe
  in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
  the wild for flushing memslots since the fast invalidate mechanism was
  introduced by commit 6ca18b6950 ("KVM: x86: use the fast way to
  invalidate all pages"), back in 2013.

- Reintroduce the fast invalidate mechanism and use it when zapping shadow
  pages in response to a memslot being deleted/moved, which is what this
  patch does.

For all intents and purposes, this is a revert of commit ea145aacf4
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950 ("KVM: x86:
use the fast way to invalidate all pages") respectively.

Fixes: d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Willamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-14 09:25:11 +02:00
Liran Alon
4b9852f4f3 KVM: x86: Fix INIT signal handling in various CPU states
Commit cd7764fe9f ("KVM: x86: latch INITs while in system management mode")
changed code to latch INIT while vCPU is in SMM and process latched INIT
when leaving SMM. It left a subtle remark in commit message that similar
treatment should also be done while vCPU is in VMX non-root-mode.

However, INIT signals should actually be latched in various vCPU states:
(*) For both Intel and AMD, INIT signals should be latched while vCPU
is in SMM.
(*) For Intel, INIT should also be latched while vCPU is in VMX
operation and later processed when vCPU leaves VMX operation by
executing VMXOFF.
(*) For AMD, INIT should also be latched while vCPU runs with GIF=0
or in guest-mode with intercept defined on INIT signal.

To fix this:
1) Add kvm_x86_ops->apic_init_signal_blocked() such that each CPU vendor
can define the various CPU states in which INIT signals should be
blocked and modify kvm_apic_accept_events() to use it.
2) Modify vmx_check_nested_events() to check for pending INIT signal
while vCPU in guest-mode. If so, emualte vmexit on
EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour
but is currently left as a TODO comment to implement in the future
because nSVM don't yet implement svm_check_nested_events().

Note: Currently KVM nVMX implementation don't support VMX wait-for-SIPI
activity state as specified in MSR_IA32_VMX_MISC bits 6:8 exposed to
guest (See nested_vmx_setup_ctls_msrs()).
If and when support for this activity state will be implemented,
kvm_check_nested_events() would need to avoid emulating vmexit on
INIT signal in case activity-state is wait-for-SIPI. In addition,
kvm_apic_accept_events() would need to be modified to avoid discarding
SIPI in case VMX activity-state is wait-for-SIPI but instead delay
SIPI processing to vmx_check_nested_events() that would clear
pending APIC events and emulate vmexit on SIPI.

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-11 18:11:45 +02:00
Liran Alon
4a53d99dd0 KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode
According to Intel SDM section 25.2 "Other Causes of VM Exits",
When INIT signal is received on a CPU that is running in VMX
non-root mode it should cause an exit with exit-reason of 3.
(See Intel SDM Appendix C "VMX BASIC EXIT REASONS")

This patch introduce the exit-reason definition.

Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-11 18:07:12 +02:00
Sean Christopherson
380e0055bc KVM: nVMX: trace nested VM-Enter failures detected by H/W
Use the recently added tracepoint for logging nested VM-Enter failures
instead of spamming the kernel log when hardware detects a consistency
check failure.  Take the opportunity to print the name of the error code
instead of dumping the raw hex number, but limit the symbol table to
error codes that can reasonably be encountered by KVM.

Add an equivalent tracepoint in nested_vmx_check_vmentry_hw(), e.g. so
that tracing of "invalid control field" errors isn't suppressed when
nested early checks are enabled.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-11 17:34:17 +02:00
Christoph Hellwig
b4dca15129 swiotlb-xen: simplify cache maintainance
Now that we know we always have the dma-noncoherent.h helpers available
if we are on an architecture with support for non-coherent devices,
we can just call them directly, and remove the calls to the dma-direct
routines, including the fact that we call the dma_direct_map_page
routines but ignore the value returned from it.  Instead we now have
Xen wrappers for the arch_sync_dma_for_{device,cpu} helpers that call
the special Xen versions of those routines for foreign pages.

Note that the new helpers get the physical address passed in addition
to the dma address to avoid another translation for the local cache
maintainance.  The pfn_valid checks remain on the dma address as in
the old code, even if that looks a little funny.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2019-09-11 12:43:27 +02:00
Joerg Roedel
e95adb9add Merge branches 'arm/omap', 'arm/exynos', 'arm/smmu', 'arm/mediatek', 'arm/qcom', 'arm/renesas', 'x86/amd', 'x86/vt-d' and 'core' into next 2019-09-11 12:39:19 +02:00
Sean Christopherson
1edce0a9eb KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code
Move RDMSR and WRMSR emulation into common x86 code to consolidate
nearly identical SVM and VMX code.

Note, consolidating RDMSR introduces an extra indirect call, i.e.
retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a
guest kernel likely has bigger problems if increasing the latency of
RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM
performance.  E.g. the only recurring RDMSR VM-Exits (after booting) on
my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via
arch_cpu_idle_enter().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:18:29 +02:00
Sean Christopherson
f20935d85a KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers
Refactor the top-level MSR accessors to take/return the index and value
directly instead of requiring the caller to dump them into a msr_data
struct.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:18:14 +02:00
Paolo Bonzini
32d1d15c52 KVM/arm updates for 5.4
- New ITS translation cache
 - Allow up to 512 CPUs to be supported with GICv3 (for real this time)
 - Now call kvm_arch_vcpu_blocking early in the blocking sequence
 - Tidy-up device mappings in S2 when DIC is available
 - Clean icache invalidation on VMID rollover
 - General cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl12QlAPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDXxUQAMd+GlOlmTXqEiuKudVApTkl6WIebfh0vkn6
 /1j8yNgJqRtZEY/YqE/XhAaqz1tx88VtzqSrNG4Pmrl9rDHMD9mDuk+w5UvEN2vy
 D5/nEe/wnzyVpuROBlHhsRbCRkT/6dNpnDnydwxCUqQPhfsAHnTNx6IygVzH9BHS
 D/1+KLI1imW8YziSSf6SGlIKJtk0eo5qo/aT6/mhb+e18Dobax3miItZL4mAqFPd
 tCV8fvOLb/phdSmOZuD/3XF9JOodk2ycvF9MW9Rp/FxDx9HULCXPv/3KnoHg9ca5
 QSGz1Chj0C2avaQJ4GbHZnZZjdvL2TmVxMpixocc/VZCqlO3ifRKf91t/rq4cElG
 HxLE9AX6kqW6UK66RHUQiHxjqRG8ynz8xEmlhwd7YhCLmtmJSXLTrmc2ABf64+BT
 RaexRa3h6D19fLBcMN5gpP8I48XaRpfxg6E/jCw5ZEr/8zhzLajFnE89ftgRR04f
 bSXOnj0kAhrBZ6jRTEata1MrFAt58wiaulxTxgMlnj1hHpqA3b+x6woRECAEVOlc
 6JJuzReJSBuCJL/rVtXGF31mXNnqUo+oTcDpQSle/fDtQ/44+xlYj6V/ZeFIRHAz
 nwUw9DHyZ/JMSwPNsqtdzCnLths1rNw34A7VgdVWiqiPYEcGGUnMzkRrXKMYjjJn
 LD4+Rh/e
 =0dD/
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.4

- New ITS translation cache
- Allow up to 512 CPUs to be supported with GICv3 (for real this time)
- Now call kvm_arch_vcpu_blocking early in the blocking sequence
- Tidy-up device mappings in S2 when DIC is available
- Clean icache invalidation on VMID rollover
- General cleanup
2019-09-10 19:09:14 +02:00
Paolo Bonzini
8146856b0a PPC KVM update for 5.4
- Some prep for extending the uses of the rmap array
 - Various minor fixes
 - Commits from the powerpc topic/ppc-kvm branch, which fix a problem
   with interrupts arriving after free_irq, causing host hangs and crashes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJdZwd7AAoJEJ2a6ncsY3GffDQH/2q+c2z56ZO2lzfk4Hy9piWn
 Z9PR9n72Z6TiMyVCl7CtLCyI+lRy3QVZnol14ugQNX4aFJiiwDGRHJF0wNxjeok4
 4DAIqBc60qD2dkp1LwtUM1YsLsr/n3tdrGU1b0VrHGoGTVhJDpbjhJsblXZ1ujGr
 KxQ1Uf4XsW5T7kovHuzj+FFlbB5nbEX5cBIU68maBGZSCl355wCOW35rKVITTIIv
 +VKkO2aNbk6bRmZmOi2v1D65eQa2+TKe/o48TneJv1WhL4h4hDyHdmVeWRNoAI6C
 ve8mwCAVs7IITjCJ1qcGnI8NzVxMlXgwVir7sQ1aslRLZfeRAm5FOIPNEz1ADXs=
 =3oLd
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 5.4

- Some prep for extending the uses of the rmap array
- Various minor fixes
- Commits from the powerpc topic/ppc-kvm branch, which fix a problem
  with interrupts arriving after free_irq, causing host hangs and crashes.
2019-09-10 16:51:17 +02:00
Alexander Graf
fdcf756213 KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes
We can easily route hardware interrupts directly into VM context when
they target the "Fixed" or "LowPriority" delivery modes.

However, on modes such as "SMI" or "Init", we need to go via KVM code
to actually put the vCPU into a different mode of operation, so we can
not post the interrupt

Add code in the VMX and SVM PI logic to explicitly refuse to establish
posted mappings for advanced IRQ deliver modes. This reflects the logic
in __apic_accept_irq() which also only ever passes Fixed and LowPriority
interrupts as posted interrupts into the guest.

This fixes a bug I have with code which configures real hardware to
inject virtual SMIs into my guest.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 16:39:34 +02:00
Rahul Tanwar
855fa1f362 x86/cpu: Add new Airmont variant to Intel family
Add new Airmont variant CPU model to Intel family.

Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Gayatri Kammela <gayatri.kammela@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190905193020.14707-4-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06 07:30:39 +02:00
Gayatri Kammela
0f65605a8d x86/cpu: Add Elkhart Lake to Intel family
Add the model number/CPUID of atom based Elkhart Lake to the Intel
family.

Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190905193020.14707-3-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06 07:30:39 +02:00
Gayatri Kammela
6e1c32c5db x86/cpu: Add Tiger Lake to Intel family
Add the model numbers/CPUIDs of Tiger Lake mobile and desktop to the
Intel family.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190905193020.14707-2-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06 07:30:39 +02:00
Ingo Molnar
9326011edf Merge branch 'x86/cleanups' into x86/cpu, to pick up dependent changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06 07:30:23 +02:00
Linus Torvalds
19e4147a04 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - EFI boot fix for signed kernels

   - an AC flags fix related to UBSAN

   - Hyper-V infinite loop fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyper-v: Fix overflow bug in fill_gva_list()
  x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
  x86/boot: Preserve boot_params.secure_boot from sanitizing
2019-09-05 09:47:32 -07:00
Joao Martins
97d3eb9da8 cpuidle-haltpoll: vcpu hotplug support
When cpus != maxcpus cpuidle-haltpoll will fail to register all vcpus
past the online ones and thus fail to register the idle driver.
This is because cpuidle_add_sysfs() will return with -ENODEV as a
consequence from get_cpu_device() return no device for a non-existing
CPU.

Instead switch to cpuidle_register_driver() and manually register each
of the present cpus through cpuhp_setup_state() callbacks and future
ones that get onlined or offlined. This mimmics similar logic that
intel_idle does.

Fixes: fa86ee90eb ("add cpuidle-haltpoll driver")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-09-03 09:36:36 +02:00
Christoph Hellwig
aeb415fbe9 x86/mm: Remove the unused set_memory_wt() function
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190826075558.8125-5-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-03 09:26:37 +02:00
Christoph Hellwig
185be15143 x86/mm: Remove set_pages_x() and set_pages_nx()
These wrappers don't provide a real benefit over just using
set_memory_x() and set_memory_nx().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190826075558.8125-4-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-03 09:26:37 +02:00
Christoph Hellwig
a919198b97 x86/mm: Remove the unused set_memory_array_*() functions
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190826075558.8125-3-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-03 09:26:37 +02:00
Ingo Molnar
ae1ad26388 Linux 5.3-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl1tSg4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG018IAJGV7SbXggW/iC+e
 cSMlo8kPnuU7dKCUW+ngXnZY1xuDYWPhXMX9+yDYf2NfMYGdDGYZ+GRjSFim816w
 HsNsovnYiyxhkh+wA/DmZPWKdTgYrIxbPRO+MlO5ZfbxWNaLgSjqirz0iBITSv3S
 r2XLmFw8GVACv/GkNGrWBM53wpkJLHzvwaV9hg6dr8HFDipaEn7vEY9/LAN3S3fw
 reVwW6Q4N4+RSofM1eIGgAZsTYbYBDfri94mRQZ3y+Q8EkRGkJ270WKA0OAVFYS7
 KA6nrjvGSYVtmDK3HORjbINQn3bXwIKeMZHl15c+LGM9ePwoHbsN3+smBswRX+R3
 JDQjkhY=
 =DV37
 -----END PGP SIGNATURE-----

Merge tag 'v5.3-rc7' into x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-03 09:23:41 +02:00
Peter Zijlstra
db4e919d9a x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64
On x86_64 we can do a u64 * u64 -> u128 widening multiply followed by
a u128 / u64 -> u64 division to implement a sane version of
mul_u64_u32_div().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-03 08:56:14 +02:00
Peter Zijlstra
9b8bd476e7 x86/uaccess: Don't leak the AC flags into __get_user() argument evaluation
Identical to __put_user(); the __get_user() argument evalution will too
leak UBSAN crud into the __uaccess_begin() / __uaccess_end() region.
While uncommon this was observed to happen for:

  drivers/xen/gntdev.c: if (__get_user(old_status, batch->status[i]))

where UBSAN added array bound checking.

This complements commit:

  6ae865615f ("x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation")

Tested-by Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: broonie@kernel.org
Cc: sfr@canb.auug.org.au
Cc: akpm@linux-foundation.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: mhocko@suse.cz
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20190829082445.GM2369@hirez.programming.kicks-ass.net
2019-09-02 14:22:38 +02:00
Marco Ammon
32b1cbe380 x86: Correct misc typos
Correct spelling typos in comments in different files under arch/x86/.

 [ bp: Merge into a single patch, massage. ]

Signed-off-by: Marco Ammon <marco.ammon@fau.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190902102436.27396-1-marco.ammon@fau.de
2019-09-02 14:02:59 +02:00
John S. Gruber
29d9a0b507 x86/boot: Preserve boot_params.secure_boot from sanitizing
Commit

  a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")

now zeroes the secure boot setting information (enabled/disabled/...)
passed by the boot loader or by the kernel's EFI handover mechanism.

The problem manifests itself with signed kernels using the EFI handoff
protocol with grub and the kernel loses the information whether secure
boot is enabled in the firmware, i.e., the log message "Secure boot
enabled" becomes "Secure boot could not be determined".

efi_main() arch/x86/boot/compressed/eboot.c sets this field early but it
is subsequently zeroed by the above referenced commit.

Include boot_params.secure_boot in the preserve field list.

 [ bp: restructure commit message and massage. ]

Fixes: a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")
Signed-off-by: John S. Gruber <JohnSGruber@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/CAPotdmSPExAuQcy9iAHqX3js_fc4mMLQOTr5RBGvizyCOPcTQQ@mail.gmail.com
2019-09-02 09:17:45 +02:00
Ingo Molnar
e98db89489 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-02 09:12:21 +02:00
Ingo Molnar
77e5517cb5 Merge branch 'linus' into x86/cpu, to resolve conflicts
Conflicts:
	tools/power/x86/turbostat/turbostat.c

Recent turbostat changes conflicted with a pending rename of x86 model names in tip:x86/cpu,
sort it out.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-02 09:10:07 +02:00
Linus Torvalds
5fb181cba0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two fixes for perf x86 hardware implementations:

   - Restrict the period on Nehalem machines to prevent perf from
     hogging the CPU

   - Prevent the AMD IBS driver from overwriting the hardwre controlled
     and pre-seeded reserved bits (0-6) in the count register which
     caused a sample bias for dispatched micro-ops"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
  perf/x86/intel: Restrict period on Nehalem
2019-09-01 11:09:42 -07:00
Jisheng Zhang
2e81562731 ftrace/x86: Remove mcount() declaration
Commit 562e14f722 ("ftrace/x86: Remove mcount support") removed the
support for using mcount, so we could remove the mcount() declaration
to clean up.

Link: http://lkml.kernel.org/r/20190826170150.10f101ba@xhacker.debian

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31 06:51:55 -04:00
Kim Phillips
0f4cd769c4 perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
sample bias, IBS hardware preloads the least significant 7 bits of
current count (IbsOpCurCnt) with random values, such that, after the
interrupt is handled and counting resumes, the next sample taken
will be slightly perturbed.

The current count bitfield is in the IBS execution control h/w register,
alongside the maximum count field.

Currently, the IBS driver writes that register with the maximum count,
leaving zeroes to fill the current count field, thereby overwriting
the random bits the hardware preloaded for itself.

Fix the driver to actually retain and carry those random bits from the
read of the IBS control register, through to its write, instead of
overwriting the lower current count bits with zeroes.

Tested with:

perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>

'perf annotate' output before:

 15.70  65:   addsd     %xmm0,%xmm1
 17.30        add       $0x1,%rax
 15.88        cmp       %rdx,%rax
              je        82
 17.32  72:   test      $0x1,%al
              jne       7c
  7.52        movapd    %xmm1,%xmm0
  5.90        jmp       65
  8.23  7c:   sqrtsd    %xmm1,%xmm0
 12.15        jmp       65

'perf annotate' output after:

 16.63  65:   addsd     %xmm0,%xmm1
 16.82        add       $0x1,%rax
 16.81        cmp       %rdx,%rax
              je        82
 16.69  72:   test      $0x1,%al
              jne       7c
  8.30        movapd    %xmm1,%xmm0
  8.13        jmp       65
  8.24  7c:   sqrtsd    %xmm1,%xmm0
  8.39        jmp       65

Tested on Family 15h and 17h machines.

Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
affect their operation.

It is unknown why commit db98c5faf8 ("perf/x86: Implement 64-bit
counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
field; the number of preloaded random bits has always been 7, AFAICT.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: "Namhyung Kim" <namhyung@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
2019-08-30 14:27:47 +02:00
Thomas Hellstrom
b4dd4f6e36 x86/vmware: Add a header file for hypercall definitions
The new header is intended to be used by drivers using the backdoor.
Follow the KVM example using alternatives self-patching to choose
between vmcall, vmmcall and io instructions.

Also define two new CPU feature flags to indicate hypervisor support
for vmcall- and vmmcall instructions. The new XF86_FEATURE_VMW_VMMCALL
flag is needed because using XF86_FEATURE_VMMCALL might break QEMU/KVM
setups using the vmmouse driver. They rely on XF86_FEATURE_VMMCALL
on AMD to get the kvm_hypercall() right. But they do not yet implement
vmmcall for the VMware hypercall used by the vmmouse driver.

 [ bp: reflow hypercall %edx usage explanation comment. ]

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Doug Covelli <dcovelli@vmware.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-graphics-maintainer@vmware.com
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Robert Hoo <robert.hu@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: <pv-drivers@vmware.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190828080353.12658-3-thomas_os@shipmail.org
2019-08-28 13:32:06 +02:00
Alexander Shishkin
42880f726c perf/x86/intel: Support PEBS output to PT
If PEBS declares ability to output its data to Intel PT stream, use the
aux_output attribute bit to enable PEBS data output to PT. This requires
a PT event to be present and scheduled in the same context. Unlike the
DS area, the kernel does not extract PEBS records from the PT stream to
generate corresponding records in the perf stream, because that would
require real time in-kernel PT decoding, which is not feasible. The PMI,
however, can still be used.

The output setting is per-CPU, so all PEBS events must be either writing
to PT or to the DS area, therefore, in case of conflict, the conflicting
event will fail to schedule, allowing the rotation logic to alternate
between the PEBS->PT and PEBS->DS events.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: kan.liang@linux.intel.com
Link: https://lkml.kernel.org/r/20190806084606.4021-3-alexander.shishkin@linux.intel.com
2019-08-28 11:29:39 +02:00
Peter Zijlstra
a3d8c0d13b x86/intel: Add common OPTDIFFs
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.731530141@infradead.org
2019-08-28 11:29:32 +02:00
Peter Zijlstra
5ebb34edbe x86/intel: Aggregate microserver naming
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
	       -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
2019-08-28 11:29:32 +02:00
Peter Zijlstra
5e741407ea x86/intel: Aggregate big core graphics naming
Currently big core clients with extra graphics on have:

 - _G
 - _GT3E

Make it uniformly: _G

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
2019-08-28 11:29:31 +02:00
Peter Zijlstra
af239c44e3 x86/intel: Aggregate big core mobile naming
Currently big core mobile chips have either:

 - _L
 - _ULT
 - _MOBILE

Make it uniformly: _L.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
2019-08-28 11:29:31 +02:00
Peter Zijlstra
c66f78a6de x86/intel: Aggregate big core client naming
Currently the big core client models either have:

 - no OPTDIFF
 - _CORE
 - _DESKTOP

Make it uniformly: 'no OPTDIFF'.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
2019-08-28 11:29:31 +02:00
Cao Jin
cbb1133b56 x86/cpufeature: Explain the macro duplication
Explain the intent behind the duplication of the

  BUILD_BUG_ON_ZERO(NCAPINTS != n)

check in *_MASK_CHECK and its immediate use in the *MASK_BIT_SET macros
too.

 [ bp: Massage. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Cao Jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <namit@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190828061100.27032-1-caoj.fnst@cn.fujitsu.com
2019-08-28 08:38:39 +02:00
Jisheng Zhang
248d327ed7 x86/ftrace: Remove mcount() declaration
Commit 562e14f722 ("ftrace/x86: Remove mcount support") removed the
support for mcount, but forgot to remove the mcount() declaration.

Clean it up.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r20190826170150.10f101ba@xhacker.debian
2019-08-26 16:51:04 +02:00
Ingo Molnar
b3e30c9884 Linux 5.3-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl1i2wkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGcDQIAJINYON5WdDSFDpp
 htva213hSIxYLix8Dc4cTMk8qT/P2MAj9pPYERuLwIxWZlfbduW6Fxy8bJANZ7k3
 4cJ/IbmA5M5ZIaOJTTL45w8H0CMR/4mdPl5rb5k/Wkh449Cj101gZLlh0FEtR5zG
 uDJecKSuHjH1ikySk6+zmRG5X+lq6wNY8NkuBtfwAwLffFc0ljQHwPUMJ8ojgqt/
 p3ChNgtb/I6U6ExITlyktKdP59bAoHAoBiKKFZWw5yJWgXE2q4Sv9nT4Btkr5KdJ
 9mnWnSaSLwptNCOtU4tKLwFIZP2WoVXGPNxxq4XLoTEuieXCqmikhc9tSSTwk+Tp
 CKHN6wU=
 =JkJ4
 -----END PGP SIGNATURE-----

Merge tag 'v5.3-rc6' into x86/cpu, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26 11:20:55 +02:00
Sean Christopherson
b63f20a778 x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
avoid clobbering flags.

KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of the CALL_NOSPEC is a small blob of code that performs
fast emulation by executing the target instruction with fixed operands.

  adcb_al_dl:
     0x000339f8 <+0>:   adc    %dl,%al
     0x000339fa <+2>:   ret

A major motiviation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of CALL_NOSPEC.  Clobbering flags
results in all sorts of incorrect emulation, e.g. Jcc instructions often
take the wrong path.  Sans the nops...

  asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
     0x0003595a <+58>:  mov    0xc0(%ebx),%eax
     0x00035960 <+64>:  mov    0x60(%ebx),%edx
     0x00035963 <+67>:  mov    0x90(%ebx),%ecx
     0x00035969 <+73>:  push   %edi
     0x0003596a <+74>:  popf
     0x0003596b <+75>:  call   *%esi
     0x000359a0 <+128>: pushf
     0x000359a1 <+129>: pop    %edi
     0x000359a2 <+130>: mov    %eax,0xc0(%ebx)
     0x000359b1 <+145>: mov    %edx,0x60(%ebx)

  ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
     0x000359a8 <+136>: mov    -0x10(%ebp),%eax
     0x000359ab <+139>: and    $0x8d5,%edi
     0x000359b4 <+148>: and    $0xfffff72a,%eax
     0x000359b9 <+153>: or     %eax,%edi
     0x000359bd <+157>: mov    %edi,0x4(%ebx)

For the most part this has gone unnoticed as emulation of guest code
that can trigger fast emulation is effectively limited to MMIO when
running on modern hardware, and MMIO is rarely, if ever, accessed by
instructions that affect or consume flags.

Breakage is almost instantaneous when running with unrestricted guest
disabled, in which case KVM must emulate all instructions when the guest
has invalid state, e.g. when the guest is in Big Real Mode during early
BIOS.

Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
Fixes: 1a29b5b7f3 ("KVM: x86: Make indirect calls in emulator speculation safe")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
2019-08-23 17:38:13 +02:00
Vitaly Kuznetsov
3e2d94535a clocksource/drivers/hyperv: Enable TSC page clocksource on 32bit
There is no particular reason to not enable TSC page clocksource on
32-bit. mul_u64_u64_shr() is available and despite the increased
computational complexity (compared to 64bit) TSC page is still a huge win
compared to MSR-based clocksource.

In-kernel reads:
  MSR based clocksource: 3361 cycles
  TSC page clocksource: 49 cycles

Reads from userspace (utilizing vDSO in case of TSC page):
  MSR based clocksource: 5664 cycles
  TSC page clocksource: 131 cycles

Enabling TSC page on 32bits allows to get rid of CONFIG_HYPERV_TSCPAGE as
it is now not any different from CONFIG_HYPERV_TIMER.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lkml.kernel.org/r/20190822083630.17059-1-vkuznets@redhat.com
2019-08-23 16:59:54 +02:00
Joerg Roedel
c53c47aac4 x86/dma: Get rid of iommu_pass_through
This variable has no users anymore. Remove it and tell the
IOMMU code via its new functions about requested DMA modes.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-08-23 10:11:01 +02:00
Sean Christopherson
871bd03460 KVM: x86: Rename access permissions cache member in struct kvm_vcpu_arch
Rename "access" to "mmio_access" to match the other MMIO cache members
and to make it more obvious that it's tracking the access permissions
for the MMIO cache.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22 10:09:23 +02:00
Vitaly Kuznetsov
02d4160fbd x86: KVM: add xsetbv to the emulator
To avoid hardcoding xsetbv length to '3' we need to support decoding it in
the emulator.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22 10:09:20 +02:00
Vitaly Kuznetsov
f8ea7c6049 x86: kvm: svm: propagate errors from skip_emulated_instruction()
On AMD, kvm_x86_ops->skip_emulated_instruction(vcpu) can, in theory,
fail: in !nrips case we call kvm_emulate_instruction(EMULTYPE_SKIP).
Currently, we only do printk(KERN_DEBUG) when this happens and this
is not ideal. Propagate the error up the stack.

On VMX, skip_emulated_instruction() doesn't fail, we have two call
sites calling it explicitly: handle_exception_nmi() and
handle_task_switch(), we can just ignore the result.

On SVM, we also have two explicit call sites:
svm_queue_exception() and it seems we don't need to do anything there as
we check if RIP was advanced or not. In task_switch_interception(),
however, we are better off not proceeding to kvm_task_switch() in case
skip_emulated_instruction() failed.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22 10:09:19 +02:00
Ard Biesheuvel
8ce5fac2dc crypto: x86/xts - implement support for ciphertext stealing
Align the x86 code with the generic XTS template, which now supports
ciphertext stealing as described by the IEEE XTS-AES spec P1619.

Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-22 14:57:34 +10:00
John Hubbard
7846f58fba x86/boot: Fix boot regression caused by bootparam sanitizing
commit a90118c445 ("x86/boot: Save fields explicitly, zero out everything
else") had two errors:

    * It preserved boot_params.acpi_rsdp_addr, and
    * It failed to preserve boot_params.hdr

Therefore, zero out acpi_rsdp_addr, and preserve hdr.

Fixes: a90118c445 ("x86/boot: Save fields explicitly, zero out everything else")
Reported-by: Neil MacLeod <neil@nmacleod.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neil MacLeod <neil@nmacleod.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190821192513.20126-1-jhubbard@nvidia.com
2019-08-21 22:37:09 +02:00
Josh Boyer
41fa1ee9c6 acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
This option allows userspace to pass the RSDP address to the kernel, which
makes it possible for a user to modify the workings of hardware. Reject
the option when the kernel is locked down. This requires some reworking
of the existing RSDP command line logic, since the early boot code also
makes use of a command-line passed RSDP when locating the SRAT table
before the lockdown code has been initialised. This is achieved by
separating the command line RSDP path in the early boot code from the
generic RSDP path, and then copying the command line RSDP into boot
params in the kernel proper if lockdown is not enabled. If lockdown is
enabled and an RSDP is provided on the command line, this will only be
used when parsing SRAT (which shouldn't permit kernel code execution)
and will be ignored in the rest of the kernel.

(Modified by Matthew Garrett in order to handle the early boot RSDP
environment)

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Dave Young <dyoung@redhat.com>
cc: linux-acpi@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19 21:54:16 -07:00
Heiner Kallweit
d6f83427ff x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code
Both the 64bit and the 32bit handle_irq() implementation check the irq
descriptor pointer with IS_ERR_OR_NULL() and return failure. That can be
done simpler in the common do_IRQ() code.

This reduces the 64bit handle_irq() function to a wrapper around
generic_handle_irq_desc(). Invoke it directly from do_IRQ() to spare the
extra function call.

[ tglx: Got rid of the #ifdef and massaged changelog ]

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/2ec758c7-9aaa-73ab-f083-cc44c86aa741@gmail.com
2019-08-19 23:19:06 +02:00
Heiner Kallweit
e30c44e2e5 x86/irq: Improve definition of VECTOR_SHUTDOWN et al
These values are used with IS_ERR(), so it's more intuitive to define
them like a standard PTR_ERR() of a negative errno.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/146835e8-c086-4e85-7ece-bcba6795e6db@gmail.com
2019-08-19 23:19:06 +02:00
Cao jin
c84b82dd3e x86/fixmap: Cleanup outdated comments
Remove stale comments and fix the not longer valid pagetable entry
reference.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190809114612.2569-1-caoj.fnst@cn.fujitsu.com
2019-08-19 21:50:19 +02:00
Tom Lendacky
c49a0a8013 x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
There have been reports of RDRAND issues after resuming from suspend on
some AMD family 15h and family 16h systems. This issue stems from a BIOS
not performing the proper steps during resume to ensure RDRAND continues
to function properly.

RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
support using CPUID, including the kernel, will believe that RDRAND is
not supported.

Update the CPU initialization to clear the RDRAND CPUID bit for any family
15h and 16h processor that supports RDRAND. If it is known that the family
15h or family 16h system does not have an RDRAND resume issue or that the
system will not be placed in suspend, the "rdrand=force" kernel parameter
can be used to stop the clearing of the RDRAND CPUID bit.

Additionally, update the suspend and resume path to save and restore the
MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
place after resuming from suspend.

Note, that clearing the RDRAND CPUID bit does not prevent a processor
that normally supports the RDRAND instruction from executing it. So any
code that determined the support based on family and model won't #UD.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
2019-08-19 19:42:52 +02:00
Borislav Petkov
342061c53a x86/msr-index: Move AMD MSRs where they belong
... sort them in and fixup comment, while at it.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190819070140.23708-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-19 10:55:44 +02:00
Tony Luck
12ece2d53d x86/cpu: Explain Intel model naming convention
Dave Hansen spelled out the rules in an e-mail:

 https://lkml.kernel.org/r/91eefbe4-e32b-d762-be4d-672ff915db47@intel.com

Copy those right into the <asm/intel-family.h> file to make it easy for
people to find them.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190815224704.GA10025@agluck-desk2.amr.corp.intel.com
2019-08-17 10:06:32 +02:00
John Hubbard
a90118c445 x86/boot: Save fields explicitly, zero out everything else
Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
memset, if the memset goes accross several fields of a struct. This
generated a couple of warnings on x86_64 builds in sanitize_boot_params().

Fix this by explicitly saving the fields in struct boot_params
that are intended to be preserved, and zeroing all the rest.

[ tglx: Tagged for stable as it breaks the warning free build there as well ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com
2019-08-16 14:20:00 +02:00
Linus Torvalds
7f20fd2337 Bugfixes (arm and x86) and cleanups.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdTfRfAAoJEL/70l94x66DcN0IAIwyaU2+kwP0jd2miQuKxgwl
 WU4u7dZCoQC6meWEVmrSJIVMBONRubmZ9iCqT7807YP8YZSQpOth51FMbULUWuy1
 VW1eaRwqidX0EAihDhg2ZbBZ8H6RQ9Fn0aiEEh44dAZZAwGSVnO3PRKvQEJ15xjk
 q+OQ4hrxtoorwLj+myejmq3YenTFTCMMJfYwwvlCl+J1FfrLZi5k3X5Gjk+j8Ixd
 8CL8/6u5Lu6MCgfYVvxvo8/bUPiATBdF1sWJMMALwXTrDiSy4tQRD0NvZP1HM8G1
 hy0XnhgtsS9rWNLtAFOj+r/XhP9V5lOOGX8yBcj0XQQr+DC9MG6MCL+pXXOaMcA=
 =ZZh8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes (arm and x86) and cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests: kvm: Adding config fragments
  KVM: selftests: Update gitignore file for latest changes
  kvm: remove unnecessary PageReserved check
  KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable
  KVM: arm: Don't write junk to CP15 registers on reset
  KVM: arm64: Don't write junk to sysregs on reset
  KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
  x86: kvm: remove useless calls to kvm_para_available
  KVM: no need to check return value of debugfs_create functions
  KVM: remove kvm_arch_has_vcpu_debugfs()
  KVM: Fix leak vCPU's VMCS value into other pCPU
  KVM: Check preempted_in_kernel for involuntary preemption
  KVM: LAPIC: Don't need to wakeup vCPU twice afer timer fire
  arm64: KVM: hyp: debug-sr: Mark expected switch fall-through
  KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC
  KVM: arm: vgic-v3: Mark expected switch fall-through
  arm64: KVM: regmap: Fix unexpected switch fall-through
  KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
2019-08-09 15:46:29 -07:00
Paolo Bonzini
0e1c438c44 KVM/arm fixes for 5.3
- A bunch of switch/case fall-through annotation, fixing one actual bug
 - Fix PMU reset bug
 - Add missing exception class debug strings
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl1Bzw8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDlXYP/ixqJzqpJetTrvpiUpmLjhp4YwjjOxqyeQvo
 bWy/EFz8bSWbTZlwAAstFDVmtGenuwaiOakChvV8GH6USYqRsYdvc/sJu0evQplJ
 JQtOzGhyv1NuM0s9wYBcstAH+YAW+gBK5YFnowreheuidK/1lo3C/EnR2DxCtNal
 gpV3qQt8qfw3ysGlpC/fDjjOYw4lDkFa6CSx9uk3/587fPBqHANRY/i87nJxmhhX
 lGeCJcOrY3cy1HhbedFwxVt4Q/ZbHf0UhTfgwvsBYw7BaWmB1ymoEOoktQcUWoKb
 LL0rBe+OxNQgRnJpn3fMEHiCAmXaI9qE4dohFOl1J3dQvCElcV/jWjkXDD1+KgzW
 S2XZGB6yxet93Fh1x6xv4i6ATJvmZeTIDUXi9KkjcDiycB9YMCDYY2ejTbQv5VUP
 V0DghGGDd3d8sY7dEjxwBakuJ6nqKixSouQaNsWuBTm7tVpEVS8yW+hqWs/IVI5b
 48SDbxaNpKvx7sAyhuWAjCFbZeIm0hd//JN3JoxazF9i9PKuqnZLbNv/ME6hmzj+
 LrETwaAbjsw5Au+ST+OdT2UiauiBm9C6Kg62qagHrKJviuK941+3hjH8aj/e0pYk
 a0DQxumiyofXPQ0pVe8ZfqlPptONz+EKyAsrOm8AjLJ+bBdRUNHLcZKYj7em7YiE
 pANc8/T+
 =kcDj
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.3

- A bunch of switch/case fall-through annotation, fixing one actual bug
- Fix PMU reset bug
- Add missing exception class debug strings
2019-08-09 16:53:39 +02:00
Thiago Jung Bauermann
284e21fab2 x86, s390/mm: Move sme_active() and sme_me_mask to x86-specific header
Now that generic code doesn't reference them, move sme_active() and
sme_me_mask to x86's <asm/mem_encrypt.h>.

Also remove the export for sme_active() since it's only used in files that
won't be built as modules. sme_me_mask on the other hand is used in
arch/x86/kvm/svm.c (via __sme_set() and __psp_pa()) which can be built as a
module so its export needs to stay.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190806044919.10622-5-bauerman@linux.ibm.com
2019-08-09 22:52:08 +10:00
Ard Biesheuvel
ec7e1605d7 efi/x86: move UV_SYSTAB handling into arch/x86
The SGI UV UEFI machines are tightly coupled to the x86 architecture
so there is no need to keep any awareness of its existence in the
generic EFI layer, especially since we already have the infrastructure
to handle arch-specific configuration tables, and were even already
using it to some extent.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-08-08 11:01:48 +03:00
Ard Biesheuvel
e55f31a599 efi: x86: move efi_is_table_address() into arch/x86
The function efi_is_table_address() and the associated array of table
pointers is specific to x86. Since we will be adding some more x86
specific tables, let's move this code out of the generic code first.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2019-08-08 11:01:48 +03:00
Leo Yan
45880f7b7b error-injection: Consolidate override function definition
The function override_function_with_return() is defined separately for
each architecture and every architecture's definition is almost same
with each other.  E.g. x86 and powerpc both define function in its own
asm/error-injection.h header and override_function_with_return() has
the same definition, the only difference is that x86 defines an extra
function just_return_func() but it is specific for x86 and is only used
by x86's override_function_with_return(), so don't need to export this
function.

This patch consolidates override_function_with_return() definition into
asm-generic/error-injection.h header, thus all architectures can use the
common definition.  As result, the architecture specific headers are
removed; the include/linux/error-injection.h header also changes to
include asm-generic/error-injection.h header rather than architecture
header, furthermore, it includes linux/compiler.h for successful
compilation.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07 13:52:43 +01:00
Linus Torvalds
4368c4bc9d Merge branch 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull pti updates from Thomas Gleixner:
 "The performance deterioration departement is not proud at all to
  present yet another set of speculation fences to mitigate the next
  chapter in the 'what could possibly go wrong' story.

  The new vulnerability belongs to the Spectre class and affects GS
  based data accesses and has therefore been dubbed 'Grand Schemozzle'
  for secret communication purposes. It's officially listed as
  CVE-2019-1125.

  Conditional branches in the entry paths which contain a SWAPGS
  instruction (interrupts and exceptions) can be mis-speculated which
  results in speculative accesses with a wrong GS base.

  This can happen on entry from user mode through a mis-speculated
  branch which takes the entry from kernel mode path and therefore does
  not execute the SWAPGS instruction. The following speculative accesses
  are done with user GS base.

  On entry from kernel mode the mis-speculated branch executes the
  SWAPGS instruction in the entry from user mode path which has the same
  effect that the following GS based accesses are done with user GS
  base.

  If there is a disclosure gadget available in these code paths the
  mis-speculated data access can be leaked through the usual side
  channels.

  The entry from user mode issue affects all CPUs which have speculative
  execution. The entry from kernel mode issue affects only Intel CPUs
  which can speculate through SWAPGS. On CPUs from other vendors SWAPGS
  has semantics which prevent that.

  SMAP migitates both problems but only when the CPU is not affected by
  the Meltdown vulnerability.

  The mitigation is to issue LFENCE instructions in the entry from
  kernel mode path for all affected CPUs and on the affected Intel CPUs
  also in the entry from user mode path unless PTI is enabled because
  the CR3 write is serializing.

  The fences are as usual enabled conditionally and can be completely
  disabled on the kernel command line. The Spectre V1 documentation is
  updated accordingly.

  A big "Thank You!" goes to Josh for doing the heavy lifting for this
  round of hardware misfeature 'repair'. Of course also "Thank You!" to
  everybody else who contributed in one way or the other"

* 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add swapgs description to the Spectre v1 documentation
  x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
  x86/entry/64: Use JMP instead of JMPQ
  x86/speculation: Enable Spectre v1 swapgs mitigations
  x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
2019-08-06 11:22:22 -07:00
Peter Zijlstra
24a376d651 locking/qspinlock,x86: Clarify virt_spin_lock_key
Add a few comments to clarify how this is supposed to work.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
2019-08-06 12:49:16 +02:00
Paolo Bonzini
741cbbae07 KVM: remove kvm_arch_has_vcpu_debugfs()
There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
let us actually simplify the code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05 12:55:48 +02:00
Wanpeng Li
17e433b543 KVM: Fix leak vCPU's VMCS value into other pCPU
After commit d73eb57b80 (KVM: Boost vCPUs that are delivering interrupts), a
five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() sustains to be true before finish_swait() is called in
kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
by kvm_vcpu_on_spin() loop greatly increases the probability condition
kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
VMCS.

This patch fixes it by checking conservatively a subset of events.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05 12:55:47 +02:00
Thomas Gleixner
48593975ae x86: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
functionality which today depends on CONFIG_PREEMPT.

Switch the entry code, preempt and kprobes conditionals over to
CONFIG_PREEMPTION.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190726212124.608488448@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31 19:03:35 +02:00
Thomas Gleixner
d2f5d3fa26 x86/vdso/32: Use 32bit syscall fallback
The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.

Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.

The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.

Fixes: 7ac8707479 ("x86/vdso: Switch to generic vDSO implementation")
Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190728131648.879156507@linutronix.de
2019-07-31 00:09:10 +02:00
Marcelo Tosatti
a1c4423b02 cpuidle-haltpoll: disable host side polling when kvm virtualized
When performing guest side polling, it is not necessary to
also perform host side polling.

So disable host side polling, via the new MSR interface,
when loading cpuidle-haltpoll driver.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-30 17:27:37 +02:00
Thomas Gleixner
7a30bdd99f Merge branch master from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Pick up the spectre documentation so the Grand Schemozzle can be added.
2019-07-28 22:22:40 +02:00
Thomas Gleixner
f36cf386e3 x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
Intel provided the following information:

 On all current Atom processors, instructions that use a segment register
 value (e.g. a load or store) will not speculatively execute before the
 last writer of that segment retires. Thus they will not use a
 speculatively written segment value.

That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
entry paths can be excluded from the extra LFENCE if PTI is disabled.

Create a separate bug flag for the through SWAPGS speculation and mark all
out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
are excluded from the whole mitigation mess anyway.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-07-28 21:39:55 +02:00
Linus Torvalds
ad28fd1cb2 SPDX fixes for 5.3-rc2
Here are some small SPDX fixes for 5.3-rc2 for things that came in
 during the 5.3-rc1 merge window that we previously missed.
 
 Only 3 small patches here:
 	- 2 uapi patches to resolve some SPDX tags that were not correct
 	- fix an invalid SPDX tag in the iomap Makefile file
 
 All have been properly reviewed on the public mailing lists.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXT2N9w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylY9wCeJIYfs/eNf3tsjLQXxUBMYAJNqnsAn2IaMiTt
 cv2mck7JZm5KyHpP3f5N
 =RSZa
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX fixes from Greg KH:
 "Here are some small SPDX fixes for 5.3-rc2 for things that came in
  during the 5.3-rc1 merge window that we previously missed.

  Only three small patches here:

   - two uapi patches to resolve some SPDX tags that were not correct

   - fix an invalid SPDX tag in the iomap Makefile file

  All have been properly reviewed on the public mailing lists"

* tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  iomap: fix Invalid License ID
  treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers again
  treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers
2019-07-28 10:00:06 -07:00
Ard Biesheuvel
2c53fd11f7 crypto: x86/aes-ni - switch to generic for fallback and key routines
The AES-NI code contains fallbacks for invocations that occur from a
context where the SIMD unit is unavailable, which really only occurs
when running in softirq context that was entered from a hard IRQ that
was taken while running kernel code that was already using the FPU.

That means performance is not really a consideration, and we can just
use the new library code for this use case, which has a smaller
footprint and is believed to be time invariant. This will allow us to
drop the non-SIMD asm routines in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26 14:55:34 +10:00
Thomas Gleixner
2510d09e9d x86/apic/flat64: Remove the IPI shorthand decision logic
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain
the decision logic for shorthand invocation already and invoke
send_IPI_mask() if the prereqisites are not satisfied.

Remove the now redundant decision logic in the APIC code and the duplicate
helper in probe_64.c.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105221.042964120@linutronix.de
2019-07-25 16:12:02 +02:00
Thomas Gleixner
d0a7166bc7 x86/smp: Move smp_function_call implementations into IPI code
Move it where it belongs. That allows to keep all the shorthand logic in
one place.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.677835995@linutronix.de
2019-07-25 16:12:01 +02:00
Thomas Gleixner
22ca7ee933 x86/apic: Provide and use helper for send_IPI_allbutself()
To support IPI shorthands wrap invocations of apic->send_IPI_allbutself()
in a helper function, so the static key controlling the shorthand mode is
only in one place.

Fixup all callers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.492691679@linutronix.de
2019-07-25 16:12:00 +02:00
Thomas Gleixner
6a1cb5f5c6 x86/apic: Add static key to Control IPI shorthands
The IPI shorthand functionality delivers IPI/NMI broadcasts to all CPUs in
the system. This can have similar side effects as the MCE broadcasting when
CPUs are waiting in the BIOS or are offlined.

The kernel tracks already the state of offlined CPUs whether they have been
brought up at least once so that the CR4 MCE bit is set to make sure that
MCE broadcasts can't brick the machine.

Utilize that information and compare it to the cpu_present_mask. If all
present CPUs have been brought up at least once then the broadcast side
effect is mitigated by disabling regular interrupt/IPI delivery in the APIC
itself and by the cpu offline check at the begin of the NMI handler.

Use a static key to switch between broadcasting via shorthands or sending
the IPI/NMI one by one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.386410643@linutronix.de
2019-07-25 16:12:00 +02:00
Thomas Gleixner
60dcaad573 x86/hotplug: Silence APIC and NMI when CPU is dead
In order to support IPI/NMI broadcasting via the shorthand mechanism side
effects of shorthands need to be mitigated:

 Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs

Neither of those can be handled on unplugged CPUs for obvious reasons.

It would be trivial to just fully disable the APIC via the enable bit in
MSR_APICBASE. But that's not possible because clearing that bit on systems
based on the 3 wire APIC bus would require a hardware reset to bring it
back as the APIC would lose track of bus arbitration. On systems with FSB
delivery APICBASE could be disabled, but it has to be guaranteed that no
interrupt is sent to the APIC while in that state and it's not clear from
the SDM whether it still responds to INIT/SIPI messages.

Therefore stay on the safe side and switch the APIC into soft disabled mode
so it won't deliver any regular vector to the CPU.

NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check
for the CPU being offline on early nmi entry and if so bail.

Note, this cannot use the stop/restart_nmi() magic which is used in the
alternatives code. A dead CPU cannot invoke nmi_enter() or anything else
due to RCU and other reasons.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de
2019-07-25 16:11:59 +02:00
Thomas Gleixner
9c92374b63 x86/cpu: Move arch_smt_update() to a neutral place
arch_smt_update() will be used to control IPI/NMI broadcasting via the
shorthand mechanism. Keeping it in the bugs file and calling the apic
function from there is possible, but not really intuitive.

Move it to a neutral place and invoke the bugs function from there.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.910317273@linutronix.de
2019-07-25 16:11:59 +02:00
Thomas Gleixner
82e5747823 x86/apic/uv: Make x2apic_extra_bits static
Not used outside of the UV apic source.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.725264153@linutronix.de
2019-07-25 16:11:58 +02:00
Thomas Gleixner
ba77b2a02e x86/apic: Move apic_flat_64 header into apic directory
Only used locally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.526508168@linutronix.de
2019-07-25 16:11:58 +02:00
Thomas Gleixner
8b542da372 x86/apic: Move ipi header into apic directory
Only used locally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.434738036@linutronix.de
2019-07-25 16:11:57 +02:00
Thomas Gleixner
cdc86c9d1f x86/apic: Move IPI inlines into ipi.c
No point in having them in an header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.252225936@linutronix.de
2019-07-25 16:11:57 +02:00
Masahiro Yamada
d9c5252295 treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers
UAPI headers licensed under GPL are supposed to have exception
"WITH Linux-syscall-note" so that they can be included into non-GPL
user space application code.

The exception note is missing in some UAPI headers.

Some of them slipped in by the treewide conversion commit b24413180f
("License cleanup: add SPDX GPL-2.0 license identifier to files with
no license"). Just run:

  $ git show --oneline b24413180f -- arch/x86/include/uapi/asm/

I believe they are not intentional, and should be fixed too.

This patch was generated by the following script:

  git grep -l --not -e Linux-syscall-note --and -e SPDX-License-Identifier \
    -- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild |
  while read file
  do
          sed -i -e '/[[:space:]]OR[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \
          -e '/[[:space:]]or[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \
          -e '/[[:space:]]OR[[:space:]]/!{/[[:space:]]or[[:space:]]/!s/\(GPL-[^[:space:]]*\)/\1 WITH Linux-syscall-note/g}' $file
  done

After this patch is applied, there are 5 UAPI headers that do not contain
"WITH Linux-syscall-note". They are kept untouched since this exception
applies only to GPL variants.

  $ git grep --not -e Linux-syscall-note --and -e SPDX-License-Identifier \
    -- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild
  include/uapi/drm/panfrost_drm.h:/* SPDX-License-Identifier: MIT */
  include/uapi/linux/batman_adv.h:/* SPDX-License-Identifier: MIT */
  include/uapi/linux/qemu_fw_cfg.h:/* SPDX-License-Identifier: BSD-3-Clause */
  include/uapi/linux/vbox_err.h:/* SPDX-License-Identifier: MIT */
  include/uapi/linux/virtio_iommu.h:/* SPDX-License-Identifier: BSD-3-Clause */

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25 11:05:10 +02:00
Linus Torvalds
7626077457 Bugfixes, and a pvspinlock optimization
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdOEuhAAoJEL/70l94x66DX/IH/3c6ADaZkuwzUMtJZgib/slX
 V7h4ljoW33M85z3nCF5+kY3CNl8c9F2xKGcAIUlJF8MIsZW+zB3HjuU1LC4fCzuk
 TqpBf74DpQsKCsv1ngiV02lefPVQ7/VT/QFY7EXNuAqNRfgsBRNoi50244a0ZKpD
 KydzKTDKMD5HjE4lHb+bNr+guqkisPx0b0mZtsb4R9uuUSwXEa8DLmWQF2Do7zBj
 6G9UD6a1AP5XQBwRRbo5a78b5NZQcF5R9wVEzsmK7OGUw/yC4Em4HVt46z+oT5cm
 JK9m59XDqJaL6HMAWC2P/mXUj6o+PP+uBE2uuvkGCNcTLQZwWf+dq9961tWg81E=
 =DD/Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bugfixes, a pvspinlock optimization, and documentation moving"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
  Documentation: move Documentation/virtual to Documentation/virt
  KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
  KVM: X86: Dynamically allocate user_fpu
  KVM: X86: Fix fpu state crash in kvm guest
  Revert "kvm: x86: Use task structs fpu field for user"
  KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
2019-07-24 09:46:13 -07:00
Jan Kiszka
21e450d21c x86/mm: Avoid redundant interrupt disable in load_mm_cr4()
load_mm_cr4() is always called with interrupts disabled from:

 - switch_mm_irqs_off()
 - refresh_pce(), which is a on_each_cpu() callback

Thus, disabling interrupts in cr4_set/clear_bits() is redundant.

Implement cr4_set/clear_bits_irqsoff() helpers, rename load_mm_cr4() to
load_mm_cr4_irqsoff() and use the new helpers. The new helpers do not need
a lockdep assert as __cr4_set() has one already.

The renaming in combination with the checks in __cr4_set() ensure that any
changes in the boundary conditions at the call sites will be detected.

[ tglx: Massaged change log ]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/0fbbcb64-5f26-4ffb-1bb9-4f5f48426893@siemens.com
2019-07-24 14:43:37 +02:00
Masahiro Yamada
bdd50d7421 x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
__builtin_constant_p(nr) is used everywhere now. It does not make much
sense to define IS_IMMEDIATE() as its alias.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190723074415.26811-1-yamada.masahiro@socionext.com
2019-07-23 13:44:18 +02:00
Masahiro Yamada
7010105321 x86/build: Remove unneeded uapi asm-generic wrappers
These are listed in include/uapi/asm-generic/Kbuild, so Kbuild will
automatically generate them.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190723112646.14046-1-yamada.masahiro@socionext.com
2019-07-23 13:42:14 +02:00
Wanpeng Li
d9a710e5fc KVM: X86: Dynamically allocate user_fpu
After reverting commit 240c35a378 (kvm: x86: Use task structs fpu field
for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serveless
scenario, one host can service hundreds/thoudands firecracker/kata-container
instances, howerver, new instance will fail to launch after memory is too
fragmented to allocate kvm_vcpu struct on host, this was observed in some
cloud provider product environments.

This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my
Skylake server.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:48 +02:00
Paolo Bonzini
ec269475cb Revert "kvm: x86: Use task structs fpu field for user"
This reverts commit 240c35a378
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.

Fixes: 240c35a378 ("kvm: x86: Use task structs fpu field for user")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-22 13:55:47 +02:00
Josh Poimboeuf
be261ffce6 x86: Remove X86_FEATURE_MFENCE_RDTSC
AMD and Intel both have serializing lfence (X86_FEATURE_LFENCE_RDTSC).
They've both had it for a long time, and AMD has had it enabled in Linux
since Spectre v1 was announced.

Back then, there was a proposal to remove the serializing mfence feature
bit (X86_FEATURE_MFENCE_RDTSC), since both AMD and Intel have
serializing lfence.  At the time, it was (ahem) speculated that some
hypervisors might not yet support its removal, so it remained for the
time being.

Now a year-and-a-half later, it should be safe to remove.

I asked Andrew Cooper about whether it's still needed:

  So if you're virtualised, you've got no choice in the matter.  lfence
  is either dispatch-serialising or not on AMD, and you won't be able to
  change it.

  Furthermore, you can't accurately tell what state the bit is in, because
  the MSR might not be virtualised at all, or may not reflect the true
  state in hardware.  Worse still, attempting to set the bit may not be
  successful even if there isn't a fault for doing so.

  Xen sets the DE_CFG bit unconditionally, as does Linux by the looks of
  things (see MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT).  ISTR other hypervisor
  vendors saying the same, but I don't have any information to hand.

  If you are running under a hypervisor which has been updated, then
  lfence will almost certainly be dispatch-serialising in practice, and
  you'll almost certainly see the bit already set in DE_CFG.  If you're
  running under a hypervisor which hasn't been patched since Spectre,
  you've already lost in many more ways.

  I'd argue that X86_FEATURE_MFENCE_RDTSC is not worth keeping.

So remove it.  This will reduce some code rot, and also make it easier
to hook barrier_nospec() up to a cmdline disable for performance
raisins, without having to need an alternative_3() macro.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/d990aa51e40063acb9888e8c1b688e41355a9588.1562255067.git.jpoimboe@redhat.com
2019-07-22 12:00:51 +02:00
Pingfan Liu
6973210242 x86/realmode: Remove trampoline_status
There is no reader of trampoline_status, it's only written.

It turns out that after commit ce4b1b1650 ("x86/smpboot: Initialize
secondary CPU only if master CPU will wait for it"), trampoline_status is
not needed any more.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1563266424-3472-1-git-send-email-kernelfans@gmail.com
2019-07-22 11:30:18 +02:00
Maya Nakamura
8c3e44bde7 x86/hyperv: Add functions to allocate/deallocate page for Hyper-V
Introduce two new functions, hv_alloc_hyperv_page() and
hv_free_hyperv_page(), to allocate/deallocate memory with the size and
alignment that Hyper-V expects as a page. Although currently they are not
used, they are ready to be used to allocate/deallocate memory on x86 when
their ARM64 counterparts are implemented, keeping symmetry between
architectures with potentially different guest page sizes.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1906272334560.32342@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/lkml/87muindr9c.fsf@vitty.brq.redhat.com/
Link: https://lkml.kernel.org/r/706b2e71eb3e587b5f8801e50f090fae2a00e35d.1562916939.git.m.maya.nakamura@gmail.com
2019-07-22 11:06:45 +02:00
Maya Nakamura
fcd3f6222a x86/hyperv: Create and use Hyper-V page definitions
Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
the Linux guest page size and hypervisor page size concepts are different,
even though they happen to be the same value on x86.

Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lkml.kernel.org/r/e95111629abf65d016e983f72494cbf110ce605f.1562916939.git.m.maya.nakamura@gmail.com
2019-07-22 11:06:44 +02:00
Gayatri Kammela
018ebca8bd x86/cpufeatures: Enable a new AVX512 CPU feature
Add a new AVX512 instruction group/feature for enumeration in
/proc/cpuinfo: AVX512_VP2INTERSECT.

CPUID.(EAX=7,ECX=0):EDX[bit 8]  AVX512_VP2INTERSECT

Detailed information of CPUID bits for this feature can be found in
the Intel Architecture Intsruction Set Extensions Programming Reference
document (refer to Table 1-2). A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=204215.

Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190717234632.32673-3-gayatri.kammela@intel.com
2019-07-22 10:38:25 +02:00
Andy Lutomirski
6365b842aa x86/syscalls: Split the x32 syscalls into their own table
For unfortunate historical reasons, the x32 syscalls and the x86_64
syscalls are not all numbered the same.  As an example, ioctl() is nr 16 on
x86_64 but 514 on x32.

This has potentially nasty consequences, since it means that there are two
valid RAX values to do ioctl(2) and two invalid RAX values.  The valid
values are 16 (i.e. ioctl(2) using the x86_64 ABI) and (514 | 0x40000000)
(i.e. ioctl(2) using the x32 ABI).

The invalid values are 514 and (16 | 0x40000000).  514 will enter the
"COMPAT_SYSCALL_DEFINE3(ioctl, ...)" entry point with in_compat_syscall()
and in_x32_syscall() returning false, whereas (16 | 0x40000000) will enter
the native entry point with in_compat_syscall() and in_x32_syscall()
returning true.  Both are bogus, and both will exercise code paths in the
kernel and in any running seccomp filters that really ought to be
unreachable.

Splitting out the x32 syscalls into their own tables, allows both bogus
invocations to return -ENOSYS.  I've checked glibc, musl, and Bionic, and
all of them appear to call syscalls with their correct numbers, so this
change should have no effect on them.

There is an added benefit going forward: new syscalls that need special
handling on x32 can share the same number on x32 and x86_64.  This means
that the special syscall range 512-547 can be treated as a legacy wart
instead of something that may need to be extended in the future.

Also add a selftest to verify the new behavior.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/208024256b764312598f014ebfb0a42472c19354.1562185330.git.luto@kernel.org
2019-07-22 10:31:23 +02:00
Andy Lutomirski
45e29d119e x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long
Currently, it's an int.  This is bizarre.  Fortunately, the code using it
still works: ~__X32_SYSCALL_BIT is also int, so, if nr is unsigned long,
then C kindly sign-extends the ~__X32_SYSCALL_BIT part, and it actually
results in the desired value.

This is far more subtle than it deserves to be.  Syscall numbers are, for
all practical purposes, unsigned long, so make __X32_SYSCALL_BIT be
unsigned long.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/99b0d83ad891c67105470a1a6b63243fd63a5061.1562185330.git.luto@kernel.org
2019-07-22 10:31:22 +02:00
Andrew Cooper
83b584d9c6 x86/paravirt: Drop {read,write}_cr8() hooks
There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
rest of the Local APIC state isn't a clever thing to be doing.

Delete the suspend/resume cr8 handling, which shrinks the size of struct
saved_context, and allows for the removal of both PVOPS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20190715151641.29210-1-andrew.cooper3@citrix.com
2019-07-22 10:12:33 +02:00
Linus Torvalds
c6dd78fcb8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of x86 specific fixes and updates:

   - The CR2 corruption fixes which store CR2 early in the entry code
     and hand the stored address to the fault handlers.

   - Revert a forgotten leftover of the dropped FSGSBASE series.

   - Plug a memory leak in the boot code.

   - Make the Hyper-V assist functionality robust by zeroing the shadow
     page.

   - Remove a useless check for dead processes with LDT

   - Update paravirt and VMware maintainers entries.

   - A few cleanup patches addressing various compiler warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Prevent clobbering of saved CR2 value
  x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
  x86, boot: Remove multiple copy of static function sanitize_boot_params()
  x86/boot/compressed/64: Remove unused variable
  x86/boot/efi: Remove unused variables
  x86/mm, tracing: Fix CR2 corruption
  x86/entry/64: Update comments and sanity tests for create_gap
  x86/entry/64: Simplify idtentry a little
  x86/entry/32: Simplify common_exception
  x86/paravirt: Make read_cr2() CALLEE_SAVE
  MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
  x86/process: Delete useless check for dead process with LDT
  x86: math-emu: Hide clang warnings for 16-bit overflow
  x86/e820: Use proper booleans instead of 0/1
  x86/apic: Silence -Wtype-limits compiler warnings
  x86/mm: Free sme_early_buffer after init
  x86/boot: Fix memory leak in default_get_smp_config()
  Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-20 11:24:49 -07:00
Linus Torvalds
e6023adc5c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:

 - A collection of objtool fixes which address recent fallout partially
   exposed by newer toolchains, clang, BPF and general code changes.

 - Force USER_DS for user stack traces

[ Note: the "objtool fixes" are not all to objtool itself, but for
  kernel code that triggers objtool warnings.

  Things like missing function size annotations, or code that confuses
  the unwinder etc.   - Linus]

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  objtool: Support conditional retpolines
  objtool: Convert insn type to enum
  objtool: Fix seg fault on bad switch table entry
  objtool: Support repeated uses of the same C jump table
  objtool: Refactor jump table code
  objtool: Refactor sibling call detection logic
  objtool: Do frame pointer check before dead end check
  objtool: Change dead_end_function() to return boolean
  objtool: Warn on zero-length functions
  objtool: Refactor function alias logic
  objtool: Track original function across branches
  objtool: Add mcsafe_handle_tail() to the uaccess safe list
  bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
  x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
  x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
  x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
  x86/head/64: Annotate start_cpu0() as non-callable
  x86/entry: Fix thunk function ELF sizes
  x86/kvm: Don't call kvm_spurious_fault() from .fixup
  x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
  ...
2019-07-20 10:45:15 -07:00
Linus Torvalds
07ab9d5bc5 Mostly bugfixes, but also:
- s390 support for KVM selftests
 - LAPIC timer offloading to housekeeping CPUs
 - Extend an s390 optimization for overcommitted hosts to all architectures
 - Debugging cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdMr1FAAoJEL/70l94x66DvIkH/iVuUX9jO1NoQ7qhxeo04MnT
 GP9mX3XnWoI/iN0zAIRfQSP2/9a6+KblgdiziABhju58j5dCfAZGb5793TQppweb
 3ubl11vy7YkzaXJ0b35K7CFhOU9oSlHHGyi5Uh+yyje5qWNxwmHpizxjynbFTKb6
 +/S7O2Ua1VrAVvx0i0IRtwanIK/jF4dStVButgVaVdUva3zLaQmeI71iaJl9ddXY
 bh50xoYua5Ek6+ENi+nwCNVy4OF152AwDbXlxrU0QbeA1B888Qio7nIqb3bwwPpZ
 /8wMVvPzQgL7RmgtY5E5Z4cCYuu7mK8wgGxhuk3oszlVwZJ5rmnaYwGEl4x1s7o=
 =giag
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "Mostly bugfixes, but also:

   - s390 support for KVM selftests

   - LAPIC timer offloading to housekeeping CPUs

   - Extend an s390 optimization for overcommitted hosts to all
     architectures

   - Debugging cleanups and improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: x86: Add fixed counters to PMU filter
  KVM: nVMX: do not use dangling shadow VMCS after guest reset
  KVM: VMX: dump VMCS on failed entry
  KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
  KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
  KVM: Boost vCPUs that are delivering interrupts
  KVM: selftests: Remove superfluous define from vmx.c
  KVM: SVM: Fix detection of AMD Errata 1096
  KVM: LAPIC: Inject timer interrupt via posted interrupt
  KVM: LAPIC: Make lapic timer unpinned
  KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
  KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
  kvm: x86: ioapic and apic debug macros cleanup
  kvm: x86: some tsc debug cleanup
  kvm: vmx: fix coccinelle warnings
  x86: kvm: avoid constant-conversion warning
  x86: kvm: avoid -Wsometimes-uninitized warning
  KVM: x86: expose AVX512_BF16 feature to guest
  KVM: selftests: enable pgste option for the linker on s390
  KVM: selftests: Move kvm_create_max_vcpus test to generic code
  ...
2019-07-20 10:20:27 -07:00
Eric Hankland
30cd860432 KVM: x86: Add fixed counters to PMU filter
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
fixed counters.

Signed-off-by: Eric Hankland <ehankland@google.com>
[No need to check padding fields for zero. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:48 +02:00
Linus Torvalds
b5d72dda89 xen: fixes and features for 5.3-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXTFdBAAKCRCAXGG7T9hj
 vkwEAQDKDApCcJymAaq+BP2/lU/kErzFFXQ7seDN84q13ZMfcwEAzDz7vU1zicMP
 Sdq1LzFdiuXjk34BBi2PURXZAVoaXgU=
 =KkHz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Fixes and features:

   - A series to introduce a common command line parameter for disabling
     paravirtual extensions when running as a guest in virtualized
     environment

   - A fix for int3 handling in Xen pv guests

   - Removal of the Xen-specific tmem driver as support of tmem in Xen
     has been dropped (and it was experimental only)

   - A security fix for running as Xen dom0 (XSA-300)

   - A fix for IRQ handling when offlining cpus in Xen guests

   - Some small cleanups"

* tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: let alloc_xenballooned_pages() fail if not enough memory free
  xen/pv: Fix a boot up hang revealed by int3 self test
  x86/xen: Add "nopv" support for HVM guest
  x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
  xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete
  x86: Add "nopv" parameter to disable PV extensions
  x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
  xen: remove tmem driver
  Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"
  xen/events: fix binding user event channels to cpus
2019-07-19 11:41:26 -07:00
Josh Poimboeuf
3901336ed9 x86/kvm: Don't call kvm_spurious_fault() from .fixup
After making a change to improve objtool's sibling call detection, it
started showing the following warning:

  arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame

The problem is the ____kvm_handle_fault_on_reboot() macro.  It does a
fake call by pushing a fake RIP and doing a jump.  That tricks the
unwinder into printing the function which triggered the exception,
rather than the .fixup code.

Instead of the hack to make it look like the original function made the
call, just change the macro so that the original function actually does
make the call.  This allows removal of the hack, and also makes objtool
happy.

I triggered a vmx instruction exception and verified that the stack
trace is still sane:

  kernel BUG at arch/x86/kvm/x86.c:358!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
  Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
  RIP: 0010:kvm_spurious_fault+0x5/0x10
  Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
  RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
  RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
  RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
  RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
  R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
  R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
  FS:  00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   loaded_vmcs_init+0x4f/0xe0
   alloc_loaded_vmcs+0x38/0xd0
   vmx_create_vcpu+0xf7/0x600
   kvm_vm_ioctl+0x5e9/0x980
   ? __switch_to_asm+0x40/0x70
   ? __switch_to_asm+0x34/0x70
   ? __switch_to_asm+0x40/0x70
   ? __switch_to_asm+0x34/0x70
   ? free_one_page+0x13f/0x4e0
   do_vfs_ioctl+0xa4/0x630
   ksys_ioctl+0x60/0x90
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x55/0x1c0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fa349b1ee5b

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
2019-07-18 21:01:04 +02:00
Josh Poimboeuf
083db67648 x86/paravirt: Fix callee-saved function ELF sizes
The __raw_callee_save_*() functions have an ELF symbol size of zero,
which confuses objtool and other tools.

Fixes a bunch of warnings like the following:

  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
2019-07-18 21:01:03 +02:00
Linus Torvalds
818e95c768 The main changes in this release include:
- Add user space specific memory reading for kprobes
  - Allow kprobes to be executed earlier in boot
 
 The rest are mostly just various clean ups and small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXS88txQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhaPAQDHaAmu6wXtZjZE6GU4ZP61UNgDECmZ
 4wlGrNc1AAlqAQD/QC8339p37aDCp9n27VY1wmJwF3nca+jAHfQLqWkkYgw=
 =n/tz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The main changes in this release include:

   - Add user space specific memory reading for kprobes

   - Allow kprobes to be executed earlier in boot

  The rest are mostly just various clean ups and small fixes"

* tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  tracing: Make trace_get_fields() global
  tracing: Let filter_assign_type() detect FILTER_PTR_STRING
  tracing: Pass type into tracing_generic_entry_update()
  ftrace/selftest: Test if set_event/ftrace_pid exists before writing
  ftrace/selftests: Return the skip code when tracing directory not configured in kernel
  tracing/kprobe: Check registered state using kprobe
  tracing/probe: Add trace_event_call accesses APIs
  tracing/probe: Add probe event name and group name accesses APIs
  tracing/probe: Add trace flag access APIs for trace_probe
  tracing/probe: Add trace_event_file access APIs for trace_probe
  tracing/probe: Add trace_event_call register API for trace_probe
  tracing/probe: Add trace_probe init and free functions
  tracing/uprobe: Set print format when parsing command
  tracing/kprobe: Set print format right after parsed command
  kprobes: Fix to init kprobes in subsys_initcall
  tracepoint: Use struct_size() in kmalloc()
  ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS
  ftrace: Enable trampoline when rec count returns back to one
  tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline
  tracing: Make a separate config for trace event self tests
  ...
2019-07-18 11:51:00 -07:00
Peter Zijlstra
a0d14b8909 x86/mm, tracing: Fix CR2 corruption
Despite the current efforts to read CR2 before tracing happens there still
exist a number of possible holes:

  idtentry page_fault             do_page_fault           has_error_code=1
    call error_entry
      TRACE_IRQS_OFF
        call trace_hardirqs_off*
          #PF // modifies CR2

      CALL_enter_from_user_mode
        __context_tracking_exit()
          trace_user_exit(0)
            #PF // modifies CR2

    call do_page_fault
      address = read_cr2(); /* whoopsie */

And similar for i386.

Fix it by pulling the CR2 read into the entry code, before any of that
stuff gets a chance to run and ruin things.

Reported-by: He Zhe <zhe.he@windriver.com>
Reported-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: rostedt@goodmis.org
Cc: torvalds@linux-foundation.org
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: jgross@suse.com
Cc: joel@joelfernandes.org
Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org

Debugged-by: Steven Rostedt <rostedt@goodmis.org>
2019-07-17 23:17:38 +02:00
Peter Zijlstra
55aedddb61 x86/paravirt: Make read_cr2() CALLEE_SAVE
The one paravirt read_cr2() implementation (Xen) is actually quite trivial
and doesn't need to clobber anything other than the return register.

Making read_cr2() CALLEE_SAVE avoids all the PUSH/POP nonsense and allows
more convenient use from assembly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: bp@alien8.de
Cc: rostedt@goodmis.org
Cc: luto@kernel.org
Cc: torvalds@linux-foundation.org
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: zhe.he@windriver.com
Cc: joel@joelfernandes.org
Cc: devel@etsukata.com
Link: https://lkml.kernel.org/r/20190711114335.887392493@infradead.org
2019-07-17 23:17:37 +02:00
Zhenzhong Duan
b23e5844df xen/pv: Fix a boot up hang revealed by int3 self test
Commit 7457c0da02 ("x86/alternatives: Add int3_emulate_call()
selftest") is used to ensure there is a gap setup in int3 exception stack
which could be used for inserting call return address.

This gap is missed in XEN PV int3 exception entry path, then below panic
triggered:

[    0.772876] general protection fault: 0000 [#1] SMP NOPTI
[    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
[    0.772893] RIP: e030:int3_magic+0x0/0x7
[    0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
[    0.773334] Call Trace:
[    0.773334]  alternative_instructions+0x3d/0x12e
[    0.773334]  check_bugs+0x7c9/0x887
[    0.773334]  ? __get_locked_pte+0x178/0x1f0
[    0.773334]  start_kernel+0x4ff/0x535
[    0.773334]  ? set_init_arg+0x55/0x55
[    0.773334]  xen_start_kernel+0x571/0x57a

For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
%rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.

E.g. Extracting 'xen_pv_trap xenint3' we have:
xen_xenint3:
 pop %rcx;
 pop %r11;
 jmp xenint3

As xenint3 and int3 entry code are same except xenint3 doesn't generate
a gap, we can fix it by using int3 and drop useless xenint3.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17 08:09:59 +02:00
Zhenzhong Duan
bef6e0ae74 x86/xen: Add "nopv" support for HVM guest
PVH guest needs PV extentions to work, so "nopv" parameter should be
ignored for PVH but not for HVM guest.

If PVH guest boots up via the Xen-PVH boot entry, xen_pvh is set early,
we know it's PVH guest and ignore "nopv" parameter directly.

If PVH guest boots up via the normal boot entry same as HVM guest, it's
hard to distinguish PVH and HVM guest at that time. In this case, we
have to panic early if PVH is detected and nopv is enabled to avoid a
worse situation later.

Remove static from bool_x86_init_noop/x86_op_int_noop so they could be
used globally. Move xen_platform_hvm() after xen_hvm_guest_late_init()
to avoid compile error.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17 08:09:59 +02:00
Zhenzhong Duan
cc8f3b4dd2 x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
.. as "nopv" support needs it to be changeable at boot up stage.

Checkpatch reports warning, so move variable declarations from
hypervisor.c to hypervisor.h

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17 08:09:59 +02:00
Zhenzhong Duan
3097834637 x86: Add "nopv" parameter to disable PV extensions
In virtualization environment, PV extensions (drivers, interrupts,
timers, etc) are enabled in the majority of use cases which is the
best option.

However, in some cases (kexec not fully working, benchmarking)
we want to disable PV extensions. We have "xen_nopv" for that purpose
but only for XEN. For a consistent admin experience a common command
line parameter "nopv" set across all PV guest implementations is a
better choice.

There are guest types which just won't work without PV extensions,
like Xen PV, Xen PVH and jailhouse. add a "ignore_nopv" member to
struct hypervisor_x86 set to true for those guest types and call
the detect functions only if nopv is false or ignore_nopv is true.

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17 08:09:58 +02:00
Zhenzhong Duan
1b37683cda x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
.. as they are only called at early bootup stage. In fact, other
functions in x86_hyper_xen_hvm.init.* are all marked as __init.

Unexport xen_hvm_need_lapic as it's never used outside.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-17 08:09:58 +02:00
Robin Murphy
175967318c mm: introduce ARCH_HAS_PTE_DEVMAP
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression than an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.

In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.

Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:25 -07:00
Stephen Kitt
3a7f0adfe7 arch/*: remove unused isa_page_to_bus()
isa_page_to_bus() is deprecated and is no longer used anywhere.  Remove
it entirely.

Link: http://lkml.kernel.org/r/20190613161155.16946-1-steve@sk2.org
Signed-off-by: Stephen Kitt <steve@sk2.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:22 -07:00
Qian Cai
ec63355869 x86/apic: Silence -Wtype-limits compiler warnings
There are many compiler warnings like this,

In file included from ./arch/x86/include/asm/smp.h:13,
                 from ./arch/x86/include/asm/mmzone_64.h:11,
                 from ./arch/x86/include/asm/mmzone.h:5,
                 from ./include/linux/mmzone.h:969,
                 from ./include/linux/gfp.h:6,
                 from ./include/linux/mm.h:10,
                 from arch/x86/kernel/apic/io_apic.c:34:
arch/x86/kernel/apic/io_apic.c: In function 'check_timer':
./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
expression >= 0 is always true [-Wtype-limits]
   if ((v) <= apic_verbosity) \
           ^~
arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro
'apic_printk'
  apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X "
  ^~~~~~~~~~~
./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
expression >= 0 is always true [-Wtype-limits]
   if ((v) <= apic_verbosity) \
           ^~
arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro
'apic_printk'
    apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
    ^~~~~~~~~~~

APIC_QUIET is 0, so silence them by making apic_verbosity type int.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pw
2019-07-16 23:13:48 +02:00
Linus Torvalds
5516745311 platform-drivers-x86 for v5.3-1
ASUS WMI driver got a big refactoring in order to support the TUF Gaming
 laptops. Besides that, the regression with backlight being permanently off
 on various EeePC laptops has been fixed.
 
 Accelerometer on HP ProBook 450 G0 shows wrong measurements due to
 X axis being inverted. This has been fixed.
 
 Intel PMC core driver has been extended to be ACPI enumerated
 if the DSDT provides device with _HID "INT33A1". This allows
 to convert the driver to be pure platform and support new hardware
 purely based on ACPI DSDT.
 
 From now on the Intel Speed Select Technology is supported thru
 a corresponding driver. This driver provides an access to the features
 of the ISST, such as Performance Profile, Core Power, Base frequency and
 Turbo Frequency.
 
 Mellanox platform drivers has been refactored and now extended
 to support more systems, including new coming ones.
 
 The OLPC XO-1.75 platform is now supported.
 
 CB4063 Beckhoff Automation board is using PMC clocks,
 provided via pmc_atom driver, for ethernet controllers in a way
 that they can't be managed by the clock driver. The quirk
 has been extended to cover this case.
 
 Touchscreen on Chuwi Hi10 Plus tablet has been enabled. Meanwhile
 the information of Chuwi Hi10 Air has been fixed to cover more models
 based on the same platform.
 
 Xiaomi notebooks have WMI interface enabled. Thus, the driver to support it
 has been provided. It required some extension of the generic WMI library,
 which allows to propagate opaque context to the ->probe() of the
 individual drivers.
 
 This release includes debugfs clean up from Greg KH for several drivers
 that drop return code check and make debugfs absence or failure non-fatal.
 
 Miscellaneous fixes here and there, mostly for Acer WMI and
 various Intel drivers.
 
 The listed below commits are duplicated due to previously pushed fixes in v5.2 cycle:
 - 1dd93f873d platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
 - 89ae3a0736 platform/x86: intel-vbtn: Report switch events when event wakes device
 - fa882fc80d platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
 - 0bfcd24b39 platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
 
 The following is an automated git shortlog grouped by driver:
 
 acer-wmi:
  -  Mark expected switch fall-throughs
  -  no need to check return value of debugfs_create functions
 
 asus-nb-wmi:
  -  Add microphone mute key code
 
 asus-wmi:
  -  Use dev_get_drvdata()
  -  Do not disable keyboard backlight on unloading
  -  Switch fan boost mode
  -  Enhance detection of thermal data
  -  Organize code into sections
  -  Refactor error handling
  -  Support WMI event queue
  -  Refactor WMI event handling
  -  Improve DSTS WMI method ID detection
  -  Increase input buffer size of WMI methods
  -  Fix preserving keyboard backlight intensity on load
  -  Fix hwmon device cleanup
  -  no need to check return value of debugfs_create functions
  -  Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
 
 dell-laptop:
  -  no need to check return value of debugfs_create functions
 
 hp_accel:
  -  Add support for HP ProBook 450 G0
 
 ideapad-laptop:
  -  no need to check return value of debugfs_create functions
 
 intel_int0002_vgpio:
  -  Get rid of custom ICPU() macro
 
 intel_menlow:
  -  avoid null pointer deference error
 
 intel_pmc:
  -  no need to check return value of debugfs_create functions
 
 intel_pmc_core:
  -  Attach using APCI HID "INT33A1"
  -  transform Pkg C-state residency from TSC ticks into microseconds
 
 intel_telemetry:
  -  no need to check return value of debugfs_create functions
 
 intel-vbtn:
  -  Report switch events when event wakes device
 
 ISST:
  -  Restore state on resume
  -  Add Intel Speed Select PUNIT MSR interface
  -  Add Intel Speed Select mailbox interface via MSRs
  -  Add Intel Speed Select mailbox interface via PCI
  -  Add Intel Speed Select mmio interface
  -  Add IOCTL to Translate Linux logical CPU to PUNIT CPU number
  -  Store per CPU information
  -  Add common API to register and handle ioctls
  -  Update ioctl-number.txt for Intel Speed Select interface
  -  A tool to validate Intel Speed Select commands
  -  Add .gitignore file
 
 MAINTAINERS:
  -  Update for Intel Speed Select Technology
 
 mlx-platform:
  -  Fix error handling in mlxplat_init()
  -  Add more reset cause attributes
  -  Modify DMI matching order
  -  Add regmap structure for the next generation systems
  -  Change API for i2c-mlxcpld driver activation
  -  Move regmap initialization before all drivers activation
  -  Fix parent device in i2c-mux-reg device registration
  -  Add new attribute for mlxreg-io sysfs interfaces
 
 pcengines-apuv2:
  -  Make two symbols static
  -  Fix PCENGINES_APU2 Kconfig warning
 
 OLPC:
  -  Add a config menu category for XO 1.75
  -  Require CONFIG_POWER_SUPPLY for XO-1.75 EC
  -  Fix olpc_xo175_ec_cmd() return value
  -  Make olpc_dt_compatible_match() static __init
  -  Add INPUT dependencies
  -  Fix build error without CONFIG_SPI
  -  Add a regulator for the DCON
  -  Add XO-1.75 EC driver
  -  Use BIT() and GENMASK() for event masks
  -  Avoid a warning if the EC didn't register yet
  -  Move EC-specific functionality out from x86
  -  Remove an unused include
  -  Add OLPC XO-1.75 EC bindings
 
 platform/mellanox:
  -  mlxreg-hotplug: Add devm_free_irq call to remove flow
 
 pmc_atom:
  -  Add CB4063 Beckhoff Automation board to critclk_systems DMI table
  -  no need to check return value of debugfs_create functions
 
 Kconfig:
  - Remove left-over BACKLIGHT_LCD_SUPPORT
 
 samsung-laptop:
  -  no need to check return value of debugfs_create functions
 
 touchscreen_dmi:
  -  Update Hi10 Air filter
  -  Add info for the CHUWI Hi10 Plus tablet.
 
 wmi:
  -  add Xiaomi WMI key driver
  -  add context argument to the probe function
  -  add context pointer field to struct wmi_device_id
  -  Add function to get _UID of WMI device
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqaflIX74DDDzMJJtb7wzTHR8rCgFAl0rP88ACgkQb7wzTHR8
 rCis8BAAnRgRgi8x1C7xn66gAUHsDXpY0tF9cp/Fw3HyTmFCQkRSmnLkMM2DqGi+
 dvB9U1zPcGWwdwryKFsJXioEK3erYpiYyT2VwLtW4S7P5jQ+N9biT4TZ8yFp0MEr
 MZC50LZDV1JTp1a0GQyrMpfoMBnE7UhR2GL8UbGli/WwXFE5BLkrJ1pdrjhYZOHl
 rJcgq3HPAhV5qkUkIU7gTC2GGSPydjBqk0OhVIU4dPsYwXIb2gXc0yR0QVwKm5x3
 I/NQwBOBMKmdI6uJ8BJyg/p888Strw65YJaTe5wtvG8ljuIbcN/aQ3ZmClNrUnc0
 58byqJCpRhg9HN39VpF9rsApEGxKTlitAUAUKy7lgue7/mycHbA1Syzz29AIM+2v
 ey2/zgFeeWtgh1cuh2cUWlCE6woW7ED4VpDxhkXlX4xGUp+CILEiFqcsULlcc4j5
 sgojCLRPs78roYj9Y84CwYbsd7J/Ce4r2evBpKYPqYxDbUiuH2aVQtEdPTKv9/xC
 yHtBuJJSxY7a+sf4OZONRo13dfvRoZIPjcccR8yTOakS2/1Fqph7MpHyDkwFAfeS
 M2f+OcJn9IECol1391PTLj9Dx3jApyVk21HJdiIj7sKZgJOSS54AFm0/Ywk0MFpY
 XScXKulV48SdL4ZKup5aIpDzyP5zuvXszKQboRitep1dHiR9bl0=
 =DC5j
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.3-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver updates from Andy Shevchenko:
 "Gathered a bunch of x86 platform driver changes. It's rather big,
  since includes two big refactors and completely new driver:

   - ASUS WMI driver got a big refactoring in order to support the TUF
     Gaming laptops. Besides that, the regression with backlight being
     permanently off on various EeePC laptops has been fixed.

   - Accelerometer on HP ProBook 450 G0 shows wrong measurements due to
     X axis being inverted. This has been fixed.

   - Intel PMC core driver has been extended to be ACPI enumerated if
     the DSDT provides device with _HID "INT33A1". This allows to
     convert the driver to be pure platform and support new hardware
     purely based on ACPI DSDT.

   - From now on the Intel Speed Select Technology is supported thru a
     corresponding driver. This driver provides an access to the
     features of the ISST, such as Performance Profile, Core Power, Base
     frequency and Turbo Frequency.

   - Mellanox platform drivers has been refactored and now extended to
     support more systems, including new coming ones.

   - The OLPC XO-1.75 platform is now supported.

   - CB4063 Beckhoff Automation board is using PMC clocks, provided via
     pmc_atom driver, for ethernet controllers in a way that they can't
     be managed by the clock driver. The quirk has been extended to
     cover this case.

   - Touchscreen on Chuwi Hi10 Plus tablet has been enabled. Meanwhile
     the information of Chuwi Hi10 Air has been fixed to cover more
     models based on the same platform.

   - Xiaomi notebooks have WMI interface enabled. Thus, the driver to
     support it has been provided. It required some extension of the
     generic WMI library, which allows to propagate opaque context to
     the ->probe() of the individual drivers.

  This release includes debugfs clean up from Greg KH for several
  drivers that drop return code check and make debugfs absence or
  failure non-fatal.

  Also miscellaneous fixes here and there, mostly for Acer WMI and
  various Intel drivers"

* tag 'platform-drivers-x86-v5.3-1' of git://git.infradead.org/linux-platform-drivers-x86: (74 commits)
  platform/x86: Fix PCENGINES_APU2 Kconfig warning
  tools/power/x86/intel-speed-select: Add .gitignore file
  platform/x86: mlx-platform: Fix error handling in mlxplat_init()
  platform/x86: intel_pmc_core: Attach using APCI HID "INT33A1"
  platform/x86: intel_pmc_core: transform Pkg C-state residency from TSC ticks into microseconds
  platform/x86: asus-wmi: Use dev_get_drvdata()
  Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces
  platform/x86: mlx-platform: Add more reset cause attributes
  platform/x86: mlx-platform: Modify DMI matching order
  platform/x86: mlx-platform: Add regmap structure for the next generation systems
  platform/x86: mlx-platform: Change API for i2c-mlxcpld driver activation
  platform/x86: mlx-platform: Move regmap initialization before all drivers activation
  MAINTAINERS: Update for Intel Speed Select Technology
  tools/power/x86: A tool to validate Intel Speed Select commands
  platform/x86: ISST: Restore state on resume
  platform/x86: ISST: Add Intel Speed Select PUNIT MSR interface
  platform/x86: ISST: Add Intel Speed Select mailbox interface via MSRs
  platform/x86: ISST: Add Intel Speed Select mailbox interface via PCI
  platform/x86: ISST: Add Intel Speed Select mmio interface
  platform/x86: ISST: Add IOCTL to Translate Linux logical CPU to PUNIT CPU number
  ...
2019-07-14 16:51:47 -07:00
Linus Torvalds
5f26f11436 asm-generic: remove ptrace.h
The asm-generic changes for 5.3 consist of a cleanup series from
 Christoph Hellwig, who explains:
 
 "asm-generic/ptrace.h is a little weird in that it doesn't actually
 implement any functionality, but it provided multiple layers of macros
 that just implement trivial inline functions.  We implement those
 directly in the few architectures and be off with a much simpler
 design."
 
 Link: https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJdKPURAAoJEJpsee/mABjZalgP/idFZKlL9jb32p9eVacW1ngm
 CzwKk+49UBpLlimTh3ZtpiSJEHQyXP/QYlJ0/kV65YJriq5FsqlBrkPnWgsgMG9x
 HBhJEEfVtXolXK3yNEsFIt/0j0Xh7+uCyBNZNuJrIRy/9x2z2nhBgDAenWPpZNuT
 qjpArBAVEWQMsWgmgZUlCKOT7ziSx5+w1bfqiiUZDjwjqimPhLUBfoZmUWHtO49M
 4/95RVOIMoLlIcaCUfqsvfkf7v6mfFAADhTrB/FZWVNX839fnpifqQL9BmOlgrEM
 kxn5wM/dxRDwRT8+mVRyB8ax4/rIgMIFoaA7Hrv+hoUsiOVD7AkNXynZKQh1hhjl
 449j68esoA6vlfdFIhagpKKTiQcWXJDbEgAoSJcM0WIl3JAjc+3nVWShTAAEW65r
 Z+Bgy1OczoCsRXbYR/TwpThHj3197xMRQEluzaLnd5Zx5feUDUKuDcxhPpev/ceO
 qmV5FeGqxRlZhJjVK8lmcHNZP0e4pkodwrNKC/2NIlIp6EKmMNI0nCjVqINigHGC
 97Kc7N94WHdQ3tA7GB8YaUfd8w86W5ZOgRh+uuZ0brPziL1MR5lD/NvzjVSfyvVp
 7UHNP7stNbavg20vDhlWGIsWiwoDlJf0YLUA6kXHryb9i/fh8sqWjz99QFu6QIfs
 BTgeLtNP8hKhMkgew2XL
 =jkfI
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "The asm-generic changes for 5.3 consist of a cleanup series to remove
  ptrace.h from Christoph Hellwig, who explains:

    'asm-generic/ptrace.h is a little weird in that it doesn't actually
     implement any functionality, but it provided multiple layers of
     macros that just implement trivial inline functions. We implement
     those directly in the few architectures and be off with a much
     simpler design.'

  at https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/"

* tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: remove ptrace.h
  x86: don't use asm-generic/ptrace.h
  sh: don't use asm-generic/ptrace.h
  powerpc: don't use asm-generic/ptrace.h
  arm64: don't use asm-generic/ptrace.h
2019-07-12 15:41:33 -07:00
Linus Torvalds
39d7530d74 ARM:
* support for chained PMU counters in guests
 * improved SError handling
 * handle Neoverse N1 erratum #1349291
 * allow side-channel mitigation status to be migrated
 * standardise most AArch64 system register accesses to msr_s/mrs_s
 * fix host MPIDR corruption on 32bit
 * selftests ckleanups
 
 x86:
 * PMU event {white,black}listing
 * ability for the guest to disable host-side interrupt polling
 * fixes for enlightened VMCS (Hyper-V pv nested virtualization),
 * new hypercall to yield to IPI target
 * support for passing cstate MSRs through to the guest
 * lots of cleanups and optimizations
 
 Generic:
 * Some txt->rST conversions for the documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdJzdIAAoJEL/70l94x66DQDoH/i83/8kX4I8AWDlushPru4ts
 Q4lCE5VAPha+o4pLb1dtfFL3gTmSbsB1N++JSlqK3JOo6LphIOy6b0wBjQBbAa6U
 3CT1dJaHJoScLLj09vyBlvClGUH2ZKEQTWOiquCCf7JfPofxwPUA6vJ7TYsdkckx
 zR3ygbADWmnfS7hFfiqN3JzuYh9eoooGNWSU+Giq6VF41SiL3IqhBGZhWS0zE9c2
 2c5lpqqdeHmAYNBqsyzNiDRKp7+zLFSmZ7Z5/0L755L8KYwR6F5beTnmBMHvb4lA
 PWH/SWOC8EYR+PEowfrH+TxKZwp0gMn1kcAKjilHk0uCRwG1IzuHAr2jlNxICCk=
 =t/Oq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - support for chained PMU counters in guests
   - improved SError handling
   - handle Neoverse N1 erratum #1349291
   - allow side-channel mitigation status to be migrated
   - standardise most AArch64 system register accesses to msr_s/mrs_s
   - fix host MPIDR corruption on 32bit
   - selftests ckleanups

  x86:
   - PMU event {white,black}listing
   - ability for the guest to disable host-side interrupt polling
   - fixes for enlightened VMCS (Hyper-V pv nested virtualization),
   - new hypercall to yield to IPI target
   - support for passing cstate MSRs through to the guest
   - lots of cleanups and optimizations

  Generic:
   - Some txt->rST conversions for the documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (128 commits)
  Documentation: virtual: Add toctree hooks
  Documentation: kvm: Convert cpuid.txt to .rst
  Documentation: virtual: Convert paravirt_ops.txt to .rst
  KVM: x86: Unconditionally enable irqs in guest context
  KVM: x86: PMU Event Filter
  kvm: x86: Fix -Wmissing-prototypes warnings
  KVM: Properly check if "page" is valid in kvm_vcpu_unmap
  KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
  KVM: LAPIC: Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane
  kvm: LAPIC: write down valid APIC registers
  KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
  KVM: doc: Add API documentation on the KVM_REG_ARM_WORKAROUNDS register
  KVM: arm/arm64: Add save/restore support for firmware workaround state
  arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests
  KVM: arm/arm64: Support chained PMU counters
  KVM: arm/arm64: Remove pmc->bitmask
  KVM: arm/arm64: Re-create event when setting counter value
  KVM: arm/arm64: Extract duplicated code to own function
  KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functions
  KVM: LAPIC: ARBPRI is a reserved register for x2APIC
  ...
2019-07-12 15:35:14 -07:00
Linus Torvalds
16c97650a5 - Add a module description to the Hyper-V vmbus module.
- Rework some vmbus code to separate architecture specifics out to
 arch/x86/. This is part of the work of adding arm64 support to Hyper-V.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl0nfBEACgkQ3qZv95d3
 LNyf8Q//fn+Eb+Pun3a5Cus8qYBQIYrVpt/bUmMuisV8A+MdRIOSwUU/oEbzCEwa
 RhKKGKkNSkItKHz0QyqC9FwHZ18+6iLmSF1VwHzKVV+GJ1kQqI9+r4dC23yO/OCz
 cuB6Uc+Jd7eLV9021U0jgmYI2HOLOfd5IUNqnyxuw6BPYYaJhrf3J9qSnlKmJ2tW
 qhzSfKC2nn0+QbYSC5ww990o10vPMw3mcoXu23HKOU7Q2aqKkdK0HWjNrv49eent
 NPbtjFoSABSVS+4eTzW1BrK2Rkwa+TVpAC75a4OPOd4KPUvibqCWjmEl8IE55Pre
 BXwUoZnZN5cC1ayUGYuhyOmzFJtwxSWqgXlZRrmr+/XDgNk4k4V2bv253yFegZw2
 1hMTbGwYrk4MrGNQB3YvnzJz8ObrTmNR98JvJz/bKPtHKmKmIbHcCWkR5gYAApuy
 cY9PqKR5wWQ5A+leePzpXIXDtWlDHy7SDhUU14mbEj2bvQ3tC0yodHabpzl0Lx8e
 feXGuqt/g5Lop+KGcRJj32tx23JBRkwfgIyagfRAdzVCs4urp9pt+tKezPkEpWW6
 dd+Gc43Rnjd94neTCoo1seS9KH7n56ldnmeygMEDzI0rgW0q1m/Qwys0+6BIQR2G
 lmcL1qJlquy3Fjfs1aYkkPqkQYoIqDwGQivSUgIDZ0TLBIsWAfY=
 =wIj3
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyper-v updates from Sasha Levin:

 - Add a module description to the Hyper-V vmbus module.

 - Rework some vmbus code to separate architecture specifics out to
   arch/x86/. This is part of the work of adding arm64 support to
   Hyper-V.

* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h
  drivers: hv: Add a module description line to the hv_vmbus driver
2019-07-12 15:28:38 -07:00
Mike Rapoport
5fba4af445 asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel]
Most architectures have identical or very similar implementation of
pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and
pte_free().

Add a generic implementation that can be reused across architectures and
enable its use on x86.

The generic implementation uses

	GFP_KERNEL | __GFP_ZERO

for the kernel page tables and

	GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT

for the user page tables.

The "base" functions for PTE allocation, namely __pte_alloc_one_kernel()
and __pte_alloc_one() are intended for the architectures that require
additional actions after actual memory allocation or must use non-default
GFP flags.

x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and
pte_free().

x86 still implements pte_alloc_one() to allow run-time control of GFP
flags required for "userpte" command line option.

Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 11:05:45 -07:00
Christoph Hellwig
39656e83da mm: lift the x86_32 PAE version of gup_get_pte to common code
The split low/high access is the only non-READ_ONCE version of gup_get_pte
that did show up in the various arch implemenations.  Lift it to common
code and drop the ifdef based arch override.

Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: James Hogan <jhogan@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 11:05:44 -07:00
Christoph Hellwig
26f4c32807 mm: simplify gup_fast_permitted
Pass in the already calculated end value instead of recomputing it, and
leave the end > start check in the callers instead of duplicating them in
the arch code.

Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: James Hogan <jhogan@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 11:05:44 -07:00
Marco Elver
751ad98d5f asm-generic, x86: add bitops instrumentation for KASAN
This adds a new header to asm-generic to allow optionally instrumenting
architecture-specific asm implementations of bitops.

This change includes the required change for x86 as reference and
changes the kernel API doc to point to bitops-instrumented.h instead.
Rationale: the functions in x86's bitops.h are no longer the kernel API
functions, but instead the arch_ prefixed functions, which are then
instrumented via bitops-instrumented.h.

Other architectures can similarly add support for asm implementations of
bitops.

The documentation text was derived from x86 and existing bitops
asm-generic versions: 1) references to x86 have been removed; 2) as a
result, some of the text had to be reworded for clarity and consistency.

Tested using lib/test_kasan with bitops tests (pre-requisite patch).
Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=198439

Link: http://lkml.kernel.org/r/20190613125950.197667-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 11:05:42 -07:00
Linus Torvalds
753c8d9b7d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A collection of assorted fixes:

   - Fix for the pinned cr0/4 fallout which escaped all testing efforts
     because the kvm-intel module was never loaded when the kernel was
     compiled with CONFIG_PARAVIRT=n. The cr0/4 accessors are moved out
     of line and static key is now solely used in the core code and
     therefore can stay in the RO after init section. So the kvm-intel
     and other modules do not longer reference the (read only) static
     key which the module loader tried to update.

   - Prevent an infinite loop in arch_stack_walk_user() by breaking out
     of the loop once the return address is detected to be 0.

   - Prevent the int3_emulate_call() selftest from corrupting the stack
     when KASAN is enabled. KASASN clobbers more registers than covered
     by the emulated call implementation. Convert the int3_magic()
     selftest to a ASM function so the compiler cannot KASANify it.

   - Unbreak the build with old GCC versions and with the Gold linker by
     reverting the 'Move of _etext to the actual end of .text'. In both
     cases the build fails with 'Invalid absolute R_X86_64_32S
     relocation: _etext'

   - Initialize the context lock for init_mm, which was never an issue
     until the alternatives code started to use a temporary mm for
     patching.

   - Fix a build warning vs. the LOWMEM_PAGES constant where clang
     complains rightfully about a signed integer overflow in the shift
     operation by converting the operand to an ULL.

   - Adjust the misnamed ENDPROC() of common_spurious in the 32bit entry
     code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/stacktrace: Prevent infinite loop in arch_stack_walk_user()
  x86/asm: Move native_write_cr0/4() out of line
  x86/pgtable/32: Fix LOWMEM_PAGES constant
  x86/alternatives: Fix int3_emulate_call() selftest stack corruption
  x86/entry/32: Fix ENDPROC of common_spurious
  Revert "x86/build: Move _etext to actual end of .text"
  x86/ldt: Initialize the context lock for init_mm
2019-07-11 13:54:00 -07:00
Linus Torvalds
8f6ccf6159 clone3-v5.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXSMhhgAKCRCRxhvAZXjc
 or7kAP9VzDcQaK/WoDd2ezh2C7Wh5hNy9z/qJVCa6Tb+N+g1UgEAxbhFUg55uGOA
 JNf7fGar5JF5hBMIXR+NqOi1/sb4swg=
 =ELWo
 -----END PGP SIGNATURE-----

Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull clone3 system call from Christian Brauner:
 "This adds the clone3 syscall which is an extensible successor to clone
  after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
  window for clone(). It cleanly supports all of the flags from clone()
  and thus all legacy workloads.

  There are few user visible differences between clone3 and clone.
  First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
  this flag. Second, the CSIGNAL flag is deprecated and will cause
  EINVAL to be reported. It is superseeded by a dedicated "exit_signal"
  argument in struct clone_args thus freeing up even more flags. And
  third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
  clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
  argument.

  The clone3 uapi is designed to be easy to handle on 32- and 64 bit:

    /* uapi */
    struct clone_args {
            __aligned_u64 flags;
            __aligned_u64 pidfd;
            __aligned_u64 child_tid;
            __aligned_u64 parent_tid;
            __aligned_u64 exit_signal;
            __aligned_u64 stack;
            __aligned_u64 stack_size;
            __aligned_u64 tls;
    };

  and a separate kernel struct is used that uses proper kernel typing:

    /* kernel internal */
    struct kernel_clone_args {
            u64 flags;
            int __user *pidfd;
            int __user *child_tid;
            int __user *parent_tid;
            int exit_signal;
            unsigned long stack;
            unsigned long stack_size;
            unsigned long tls;
    };

  The system call comes with a size argument which enables the kernel to
  detect what version of clone_args userspace is passing in. clone3
  validates that any additional bytes a given kernel does not know about
  are set to zero and that the size never exceeds a page.

  A nice feature is that this patchset allowed us to cleanup and
  simplify various core kernel codepaths in kernel/fork.c by making the
  internal _do_fork() function take struct kernel_clone_args even for
  legacy clone().

  This patch also unblocks the time namespace patchset which wants to
  introduce a new CLONE_TIMENS flag.

  Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and
  xtensa. These were the architectures that did not require special
  massaging.

  Other architectures treat fork-like system calls individually and
  after some back and forth neither Arnd nor I felt confident that we
  dared to add clone3 unconditionally to all architectures. We agreed to
  leave this up to individual architecture maintainers. This is why
  there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
  which any architecture can set once it has implemented support for
  clone3. The patch also adds a cond_syscall(clone3) for architectures
  such as nios2 or h8300 that generate their syscall table by simply
  including asm-generic/unistd.h. The hope is to get rid of
  __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"

* tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  arch: handle arches who do not yet define clone3
  arch: wire-up clone3() syscall
  fork: add clone3
2019-07-11 10:09:44 -07:00
Paolo Bonzini
a45ff5994c KVM/arm updates for 5.3
- Add support for chained PMU counters in guests
 - Improve SError handling
 - Handle Neoverse N1 erratum #1349291
 - Allow side-channel mitigation status to be migrated
 - Standardise most AArch64 system register accesses to msr_s/mrs_s
 - Fix host MPIDR corruption on 32bit
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl0kge4VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDYyQP/3XY5tFcLKkp/h9rnGaCXwAxhNzn
 TyF/IZEFBKFTSoDMXKLLc8KllvoPQ7aUl03heYbuayYpyKR1+LCx7lDwu1MYyEf+
 aSSuOKlbG//tLUEGp09pTRCgjs2mhhZYqOj5GF2mZ7xpovFVSNOPzTazbXDNQ7tw
 zUAs43YNg+bUMwj+SLWpBlizjrLr7T34utIr6daKJE/GSfmIrcYXhGbZqUh0zbO0
 z5LNasebws8/pHyeGI7+/yoMIKaQ8foMgywTpsRpBsx6YI+AbOLjEmCk2IBOPcEK
 pm9KkSIBZEO2CSxZKl3NQiEow/Qd/lnz2xLMCSfh4XrYoI2Th4gNcsbJpiBDWP5a
 0eZ5jSiexxKngIbM+to7jR3m0yc9RgcuzceJg3Uly7Ya0vb5RqKwOX4Ge4XP4VDT
 DzIVFdQjxDKdVIf3EvGp1cj4P7dRUU3xbZcbzyuRPEmT3vgjEnbxawmPLs3QMAl1
 31Wd2wIsPB86kSxzSMel27Vs5VgMhgyHE26zN91R745CvhDXaDKydIWjGjdVMHsB
 GuX/h2kL+ohx+N/OpZPgwsVUAGLSOQFP3pE/EcGtqc2kkfqa+bx12DKcZ3zdmJvy
 +cu5ixU8q5thPH/pZob/C3hKUY/eLy02emS34RK0Jh2sZHbQgAOtMsiqUxNHEjUm
 6TkpdWa5SRd7CtGV
 =yfCs
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.3

- Add support for chained PMU counters in guests
- Improve SError handling
- Handle Neoverse N1 erratum #1349291
- Allow side-channel mitigation status to be migrated
- Standardise most AArch64 system register accesses to msr_s/mrs_s
- Fix host MPIDR corruption on 32bit
2019-07-11 15:14:16 +02:00
Eric Hankland
66bb8a065f KVM: x86: PMU Event Filter
Some events can provide a guest with information about other guests or the
host (e.g. L3 cache stats); providing the capability to restrict access
to a "safe" set of events would limit the potential for the PMU to be used
in any side channel attacks. This change introduces a new VM ioctl that
sets an event filter. If the guest attempts to program a counter for
any blacklisted or non-whitelisted event, the kernel counter won't be
created, so any RDPMC/RDMSR will show 0 instances of that event.

Signed-off-by: Eric Hankland <ehankland@google.com>
[Lots of changes. All remaining bugs are probably mine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-11 15:08:28 +02:00
Thomas Gleixner
7652ac9201 x86/asm: Move native_write_cr0/4() out of line
The pinning of sensitive CR0 and CR4 bits caused a boot crash when loading
the kvm_intel module on a kernel compiled with CONFIG_PARAVIRT=n.

The reason is that the static key which controls the pinning is marked RO
after init. The kvm_intel module contains a CR4 write which requires to
update the static key entry list. That obviously does not work when the key
is in a RO section.

With CONFIG_PARAVIRT enabled this does not happen because the CR4 write
uses the paravirt indirection and the actual write function is built in.

As the key is intended to be immutable after init, move
native_write_cr0/4() out of line.

While at it consolidate the update of the cr4 shadow variable and store the
value right away when the pinning is initialized on a booting CPU. No point
in reading it back 20 instructions later. This allows to confine the static
key and the pinning variable to cpu/common and allows to mark them static.

Fixes: 8dbec27a24 ("x86/asm: Pin sensitive CR0 bits")
Fixes: 873d50d58f ("x86/asm: Pin sensitive CR4 bits")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Xi Ruoyao <xry111@mengyan1223.wang>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907102140340.1758@nanos.tec.linutronix.de
2019-07-10 22:15:05 +02:00
Arnd Bergmann
2651569986 x86/pgtable/32: Fix LOWMEM_PAGES constant
clang points out that the computation of LOWMEM_PAGES causes a signed
integer overflow on 32-bit x86:

arch/x86/kernel/head32.c:83:20: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow]
                (PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT);
                                 ^~~~~~~~~~~~
arch/x86/include/asm/pgtable_32.h:109:27: note: expanded from macro 'LOWMEM_PAGES'
 #define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT))
                         ~^ ~~
arch/x86/include/asm/pgtable_32.h:98:34: note: expanded from macro 'PAGE_TABLE_SIZE'
 #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)

Use the _ULL() macro to make it a 64-bit constant.

Fixes: 1e620f9b23 ("x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190710130522.1802800-1-arnd@arndb.de
2019-07-10 17:19:58 +02:00
Linus Torvalds
e9a83bd232 It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro.  These create more
    than the usual number of simple but annoying merge conflicts with other
    trees, unfortunately.  He has a lot more of these waiting on the wings
    that, I think, will go to you directly later on.
 
  - A new document on how to use merges and rebases in kernel repos, and one
    on Spectre vulnerabilities.
 
  - Various improvements to the build system, including automatic markup of
    function() references because some people, for reasons I will never
    understand, were of the opinion that :c:func:``function()`` is
    unattractive and not fun to type.
 
  - We now recommend using sphinx 1.7, but still support back to 1.4.
 
  - Lots of smaller improvements, warning fixes, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
 gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
 raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
 3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
 DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
 dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
 =D0eO
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.3' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
 "It's been a relatively busy cycle for docs:

   - A fair pile of RST conversions, many from Mauro. These create more
     than the usual number of simple but annoying merge conflicts with
     other trees, unfortunately. He has a lot more of these waiting on
     the wings that, I think, will go to you directly later on.

   - A new document on how to use merges and rebases in kernel repos,
     and one on Spectre vulnerabilities.

   - Various improvements to the build system, including automatic
     markup of function() references because some people, for reasons I
     will never understand, were of the opinion that
     :c:func:``function()`` is unattractive and not fun to type.

   - We now recommend using sphinx 1.7, but still support back to 1.4.

   - Lots of smaller improvements, warning fixes, typo fixes, etc"

* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
  docs: automarkup.py: ignore exceptions when seeking for xrefs
  docs: Move binderfs to admin-guide
  Disable Sphinx SmartyPants in HTML output
  doc: RCU callback locks need only _bh, not necessarily _irq
  docs: format kernel-parameters -- as code
  Doc : doc-guide : Fix a typo
  platform: x86: get rid of a non-existent document
  Add the RCU docs to the core-api manual
  Documentation: RCU: Add TOC tree hooks
  Documentation: RCU: Rename txt files to rst
  Documentation: RCU: Convert RCU UP systems to reST
  Documentation: RCU: Convert RCU linked list to reST
  Documentation: RCU: Convert RCU basic concepts to reST
  docs: filesystems: Remove uneeded .rst extension on toctables
  scripts/sphinx-pre-install: fix out-of-tree build
  docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
  Documentation: PGP: update for newer HW devices
  Documentation: Add section about CPU vulnerabilities for Spectre
  Documentation: platform: Delete x86-laptop-drivers.txt
  docs: Note that :c:func: should no longer be used
  ...
2019-07-09 12:34:26 -07:00
Linus Torvalds
565eb5f8c5 Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x865 kdump updates from Thomas Gleixner:
 "Yet more kexec/kdump updates:

   - Properly support kexec when AMD's memory encryption (SME) is
     enabled

   - Pass reserved e820 ranges to the kexec kernel so both PCI and SME
     can work"

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active
  x86/kexec: Set the C-bit in the identity map page table when SEV is active
  x86/kexec: Do not map kexec area as decrypted when SEV is active
  x86/crash: Add e820 reserved ranges to kdump kernel's e820 table
  x86/mm: Rework ioremap resource mapping determination
  x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED
  x86/mm: Create a workarea in the kernel for SME early encryption
  x86/mm: Identify the end of the kernel area to be reserved
2019-07-09 11:52:34 -07:00
Linus Torvalds
b7d5c92398 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Thomas Gleixner:
 "Assorted updates to kexec/kdump:

   - Proper kexec support for 4/5-level paging and jumping from a
     5-level to a 4-level paging kernel.

   - Make the EFI support for kexec/kdump more robust

   - Enforce that the GDT is properly aligned instead of getting the
     alignment by chance"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kdump/64: Restrict kdump kernel reservation to <64TB
  x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel
  x86/boot: Add xloadflags bits to check for 5-level paging support
  x86/boot: Make the GDT 8-byte aligned
  x86/kexec: Add the ACPI NVS region to the ident map
  x86/boot: Call get_rsdp_addr() after console_init()
  Revert "x86/boot: Disable RSDP parsing temporarily"
  x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels
  x86/kexec: Add the EFI system tables and ACPI tables to the ident map
2019-07-09 11:35:38 -07:00
Josh Poimboeuf
18ec54fdd6 x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
Spectre v1 isn't only about array bounds checks.  It can affect any
conditional checks.  The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks.  Those may be problematic in
the context of Spectre v1, as kernel code can speculatively run with a user
GS.

For example:

	if (coming from user space)
		swapgs
	mov %gs:<percpu_offset>, %reg
	mov (%reg), %reg1

When coming from user space, the CPU can speculatively skip the swapgs, and
then do a speculative percpu load using the user GS value.  So the user can
speculatively force a read of any kernel value.  If a gadget exists which
uses the percpu value as an address in another load/store, then the
contents of the kernel value may become visible via an L1 side channel
attack.

A similar attack exists when coming from kernel space.  The CPU can
speculatively do the swapgs, causing the user GS to get used for the rest
of the speculative window.

The mitigation is similar to a traditional Spectre v1 mitigation, except:

  a) index masking isn't possible; because the index (percpu offset)
     isn't user-controlled; and

  b) an lfence is needed in both the "from user" swapgs path and the
     "from kernel" non-swapgs path (because of the two attacks described
     above).

The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
lfences can be skipped in those cases.

On the other hand, the kernel entry swapgs paths don't depend on PTI.

To avoid unnecessary lfences for the user entry case, create two separate
features for alternative patching:

  X86_FEATURE_FENCE_SWAPGS_USER
  X86_FEATURE_FENCE_SWAPGS_KERNEL

Use these features in entry code to patch in lfences where needed.

The features aren't enabled yet, so there's no functional change.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2019-07-09 14:11:45 +02:00
Sebastian Andrzej Siewior
39ca5fb492 x86/ldt: Initialize the context lock for init_mm
The mutex mm->context->lock for init_mm is not initialized for init_mm.
This wasn't a problem because it remained unused. This changed however
since commit
	4fc19708b1 ("x86/alternatives: Initialize temporary mm for patching")

Initialize the mutex for init_mm.

Fixes: 4fc19708b1 ("x86/alternatives: Initialize temporary mm for patching")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190701173354.2pe62hhliok2afea@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-09 13:57:27 +02:00
Linus Torvalds
5ad18b2e60 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
 "A source of error over the years has been that force_sig has taken a
  task parameter when it is only safe to use force_sig with the current
  task.

  The force_sig function is built for delivering synchronous signals
  such as SIGSEGV where the userspace application caused a synchronous
  fault (such as a page fault) and the kernel responded with a signal.

  Because the name force_sig does not make this clear, and because the
  force_sig takes a task parameter the function force_sig has been
  abused for sending other kinds of signals over the years. Slowly those
  have been fixed when the oopses have been tracked down.

  This set of changes fixes the remaining abusers of force_sig and
  carefully rips out the task parameter from force_sig and friends
  making this kind of error almost impossible in the future"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
  signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
  signal: Remove the signal number and task parameters from force_sig_info
  signal: Factor force_sig_info_to_task out of force_sig_info
  signal: Generate the siginfo in force_sig
  signal: Move the computation of force into send_signal and correct it.
  signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
  signal: Remove the task parameter from force_sig_fault
  signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
  signal: Explicitly call force_sig_fault on current
  signal/unicore32: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from ptrace_break
  signal/nds32: Remove tsk parameter from send_sigtrap
  signal/riscv: Remove tsk parameter from do_trap
  signal/sh: Remove tsk parameter from force_sig_info_fault
  signal/um: Remove task parameter from send_sigtrap
  signal/x86: Remove task parameter from send_sigtrap
  signal: Remove task parameter from force_sig_mceerr
  signal: Remove task parameter from force_sig
  signal: Remove task parameter from force_sigsegv
  ...
2019-07-08 21:48:15 -07:00
Linus Torvalds
222a21d295 Merge branch 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 topology updates from Ingo Molnar:
 "Implement multi-die topology support on Intel CPUs and expose the die
  topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui.

  These changes should have no effect on the kernel's existing
  understanding of topologies, i.e. there should be no behavioral impact
  on cache, NUMA, scheduler, perf and other topologies and overall
  system performance"

* 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support
  perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support
  hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages
  thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages
  perf/x86/intel/cstate: Support multi-die/package
  perf/x86/intel/rapl: Support multi-die/package
  perf/x86/intel/uncore: Support multi-die/package
  topology: Create core_cpus and die_cpus sysfs attributes
  topology: Create package_cpus sysfs attribute
  hwmon/coretemp: Support multi-die/package
  powercap/intel_rapl: Update RAPL domain name and debug messages
  thermal/x86_pkg_temp_thermal: Support multi-die/package
  powercap/intel_rapl: Support multi-die/package
  powercap/intel_rapl: Simplify rapl_find_package()
  x86/topology: Define topology_logical_die_id()
  x86/topology: Define topology_die_id()
  cpu/topology: Export die_id
  x86/topology: Create topology_max_die_per_package()
  x86/topology: Add CPUID.1F multi-die/package support
2019-07-08 18:28:44 -07:00
Linus Torvalds
8faef7125d Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updayes from Ingo Molnar:
 "Most of the commits add ACRN hypervisor guest support, plus two
  cleanups"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/jailhouse: Mark jailhouse_x2apic_available() as __init
  x86/platform/geode: Drop <linux/gpio.h> includes
  x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector
  x86: Add support for Linux guests on an ACRN hypervisor
  x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbol
2019-07-08 17:49:45 -07:00
Linus Torvalds
da17702385 Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Ingo Molnar:
 "A handful of paravirt patching code enhancements to make it more
  robust against patching failures, and related cleanups and not so
  related cleanups - by Thomas Gleixner and myself"

* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/paravirt: Rename paravirt_patch_site::instrtype to paravirt_patch_site::type
  x86/paravirt: Standardize 'insn_buff' variable names
  x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering
  x86/paravirt: Replace the paravirt patch asm magic
  x86/paravirt: Unify the 32/64 bit paravirt patching code
  x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()
  x86/paravirt: Detect over-sized patching bugs in paravirt_patch_insns()
  x86/paravirt: Remove bogus extern declarations
2019-07-08 17:34:44 -07:00
Linus Torvalds
a1aab6f3d2 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Most of the changes relate to Peter Zijlstra's cleanup of ptregs
  handling, in particular the i386 part is now much simplified and
  standardized - no more partial ptregs stack frames via the esp/ss
  oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump
  and kgdb.

  There's also a CR4 hardening enhancements by Kees Cook, to make the
  generic platform functions such as native_write_cr4() less useful as
  ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0
  against similar attacks.

  The rest is smaller cleanups/fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Add int3_emulate_call() selftest
  x86/stackframe/32: Allow int3_emulate_push()
  x86/stackframe/32: Provide consistent pt_regs
  x86/stackframe, x86/ftrace: Add pt_regs frame annotations
  x86/stackframe, x86/kprobes: Fix frame pointer annotations
  x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
  x86/entry/32: Clean up return from interrupt preemption path
  x86/asm: Pin sensitive CR0 bits
  x86/asm: Pin sensitive CR4 bits
  Documentation/x86: Fix path to entry_32.S
  x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
2019-07-08 16:59:34 -07:00
Linus Torvalds
e192832869 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - rwsem scalability improvements, phase #2, by Waiman Long, which are
     rather impressive:

       "On a 2-socket 40-core 80-thread Skylake system with 40 reader
        and writer locking threads, the min/mean/max locking operations
        done in a 5-second testing window before the patchset were:

         40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
         40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

        After the patchset, they became:

         40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
         40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

     There's a lot of changes to the locking implementation that makes
     it similar to qrwlock, including owner handoff for more fair
     locking.

     Another microbenchmark shows how across the spectrum the
     improvements are:

       "With a locking microbenchmark running on 5.1 based kernel, the
        total locking rates (in kops/s) on a 2-socket Skylake system
        with equal numbers of readers and writers (mixed) before and
        after this patchset were:

        # of Threads   Before Patch      After Patch
        ------------   ------------      -----------
             2            2,618             4,193
             4            1,202             3,726
             8              802             3,622
            16              729             3,359
            32              319             2,826
            64              102             2,744"

     The changes are extensive and the patch-set has been through
     several iterations addressing various locking workloads. There
     might be more regressions, but unless they are pathological I
     believe we want to use this new implementation as the baseline
     going forward.

   - jump-label optimizations by Daniel Bristot de Oliveira: the primary
     motivation was to remove IPI disturbance of isolated RT-workload
     CPUs, which resulted in the implementation of batched jump-label
     updates. Beyond the improvement of the real-time characteristics
     kernel, in one test this patchset improved static key update
     overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
     as well.

   - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
     ~10 years of atomic64_t existence the various types used by the
     APIs only had to be self-consistent within each architecture -
     which means they became wildly inconsistent across architectures.
     Mark puts and end to this by reworking all the atomic64
     implementations to use 's64' as the base type for atomic64_t, and
     to ensure that this type is consistently used for parameters and
     return values in the API, avoiding further problems in this area.

   - A large set of small improvements to lockdep by Yuyang Du: type
     cleanups, output cleanups, function return type and othr cleanups
     all around the place.

   - A set of percpu ops cleanups and fixes by Peter Zijlstra.

   - Misc other changes - please see the Git log for more details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  locking/lockdep: increase size of counters for lockdep statistics
  locking/atomics: Use sed(1) instead of non-standard head(1) option
  locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
  x86/jump_label: Make tp_vec_nr static
  x86/percpu: Optimize raw_cpu_xchg()
  x86/percpu, sched/fair: Avoid local_clock()
  x86/percpu, x86/irq: Relax {set,get}_irq_regs()
  x86/percpu: Relax smp_processor_id()
  x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
  locking/rwsem: Guard against making count negative
  locking/rwsem: Adaptive disabling of reader optimistic spinning
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Make rwsem->owner an atomic_long_t
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Clarify usage of owner's nonspinaable bit
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: More optimal RT task handling of null owner
  locking/rwsem: Always release wait_lock before waking up tasks
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  ...
2019-07-08 16:12:03 -07:00
Michael Kelley
765e33f521 Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h
Break out parts of mshyperv.h that are ISA independent into a
separate file in include/asm-generic. This move facilitates
ARM64 code reusing these definitions and avoids code
duplication. No functionality or behavior is changed.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-08 19:06:27 -04:00