The ORC metadata generated for UNWIND_HINT_FUNC isn't actually very
func-like. With certain usages it can cause stack state mismatches
because it doesn't set the return address (CFI_RA).
Also, users of UNWIND_HINT_RET_OFFSET no longer need to set a custom
return stack offset. Instead they just need to specify a func-like
situation, so the current ret_offset code is hacky for no good reason.
Solve both problems by simplifying the RET_OFFSET handling and
converting it into a more useful UNWIND_HINT_FUNC.
If we end up needing the old 'ret_offset' functionality again in the
future, we should be able to support it pretty easily with the addition
of a custom 'sp_offset' in UNWIND_HINT_FUNC.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/db9d1f5d79dddfbb3725ef6d8ec3477ad199948d.1611263462.git.jpoimboe@redhat.com
Pull x86 fixes from Borislav Petkov:
- Add a new Intel model number for Alder Lake
- Differentiate which aspects of the FPU state get saved/restored when
the FPU is used in-kernel and fix a boot crash on K7 due to early
MXCSR access before CR4.OSFXSR is even set.
- A couple of noinstr annotation fixes
- Correct die ID setting on AMD for users of topology information which
need the correct die ID
- A SEV-ES fix to handle string port IO to/from kernel memory properly
* tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add another Alder Lake CPU to the Intel family
x86/mmx: Use KFPU_387 for MMX string operations
x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
x86/topology: Make __max_die_per_package available unconditionally
x86: __always_inline __{rd,wr}msr()
x86/mce: Remove explicit/superfluous tracing
locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
locking/lockdep: Cure noinstr fail
x86/sev: Fix nonistr violation
x86/entry: Fix noinstr fail
x86/cpu/amd: Set __max_die_per_package on AMD
x86/sev-es: Handle string port IO to kernel memory properly
The implementation was rather buggy. It unconditionally marked PTEs
read-only, even for VM_SHARED mappings. I'm not sure whether this is
actually a problem, but it certainly seems unwise. More importantly, it
released the mmap lock before flushing the TLB, which could allow a racing
CoW operation to falsely believe that the underlying memory was not
writable.
I can't find any users at all of this mechanism, so just remove it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Stas Sergeev <stsp2@yandex.ru>
Link: https://lkml.kernel.org/r/f3086de0babcab36f69949b5780bde851f719bc8.1611078018.git.luto@kernel.org
Currently, requesting kernel FPU access doesn't distinguish which parts of
the extended ("FPU") state are needed. This is nice for simplicity, but
there are a few cases in which it's suboptimal:
- The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
not use legacy 387 state. These users want MXCSR initialized but don't
care about the FPU control word. Skipping FNINIT would save time.
(Empirically, FNINIT is several times slower than LDMXCSR.)
- Code that wants MMX doesn't want or need MXCSR initialized.
_mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
initializing MXCSR will fail because LDMXCSR generates an #UD when the
aforementioned CR4 bit is not set.
- Any future in-kernel users of XFD (eXtended Feature Disable)-capable
dynamic states will need special handling.
Add a more specific API that allows callers to specify exactly what they
want.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl>
Link: https://lkml.kernel.org/r/aff1cac8b8fc7ee900cf73e8f2369966621b053f.1611205691.git.luto@kernel.org
EFI on x86_64 keeps track of the process's MM pointer by storing it
in a global struct called 'efi_scratch', which also used to contain
the mixed mode stack pointer. Let's clean this up a little bit, by
getting rid of the struct, and pushing the mm handling into the
callees entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
As a first step to removing the awkward 'struct efi_scratch' definition
that conveniently combines the storage of the mixed mode stack pointer
with the MM pointer variable that records the task's MM pointer while it
is being replaced with the EFI MM one, move the mixed mode stack pointer
into a separate variable.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The Camellia, Serpent and Twofish related header files only contain
declarations that are shared between different implementations of the
respective algorithms residing under arch/x86/crypto, and none of their
contents should be used elsewhere. So move the header files into the
same location, and use local #includes instead.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All dependencies on the x86 glue helper module have been replaced by
local instantiations of the new ECB/CBC preprocessor helper macros, so
the glue helper module can be retired.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Twofish in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Camellia in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Serpent in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Camellia in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Running instruction decoder posttest on an s390 host with an x86 target
with allyesconfig shows errors. Instructions used in a couple of kernel
objects could not be correctly decoded on big endian system.
insn_decoder_test: warning: objdump says 6 bytes, but insn_get_length() says 5
insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
insn_decoder_test: warning: ffffffff831eb4e1: 62 d1 fd 48 7f 04 24 vmovdqa64 %zmm0,(%r12)
insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 6
insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
insn_decoder_test: warning: ffffffff831eb4e8: 62 51 fd 48 7f 44 24 01 vmovdqa64 %zmm8,0x40(%r12)
insn_decoder_test: warning: objdump says 8 bytes, but insn_get_length() says 6
This is because in a few places instruction field bytes are set directly
with further usage of "value". To address that introduce and use a
insn_set_byte() helper, which correctly updates "value" on big endian
systems.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Correct objtool orc generation endianness problems to enable fully
functional x86 cross-compiles on big endian hardware.
Introduce bswap_if_needed() macro, which does a byte swap if target
endianness doesn't match the host, i.e. cross-compilation for little
endian on big endian and vice versa. The macro is used for conversion
of multi-byte values which are read from / about to be written to a
target native endianness ELF file.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
The x86 instruction decoder code is shared across the kernel source and
the tools. Currently objtool seems to be the only tool from build tools
needed which breaks x86 cross-compilation on big endian systems. Make
the x86 instruction decoder build host endianness agnostic to support
x86 cross-compilation and enable objtool to implement endianness
awareness for big endian architectures support.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Co-developed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Pull hyperv fixes from Wei Liu:
- fix kexec panic/hang (Dexuan Cui)
- fix occasional crashes when flushing TLB (Wei Liu)
* tag 'hyperv-fixes-signed-20210111' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: check cpu mask after interrupt has been disabled
x86/hyperv: Fix kexec panic/hang issues
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for the new scalable MMU
- Fixes for migration of nested hypervisors on AMD
- Fix for clang integrated assembler
- Fix for left shift by 64 (UBSAN)
- Small cleanups
- Straggler SEV-ES patch
ARM:
- VM init cleanups
- PSCI relay cleanups
- Kill CONFIG_KVM_ARM_PMU
- Fixup __init annotations
- Fixup reg_to_encoding()
- Fix spurious PMCR_EL0 access
Misc:
- selftests cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (38 commits)
KVM: x86: __kvm_vcpu_halt can be static
KVM: SVM: Add support for booting APs in an SEV-ES guest
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode
KVM: nSVM: correctly restore nested_run_pending on migration
KVM: x86/mmu: Clarify TDP MMU page list invariants
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
kvm: check tlbs_dirty directly
KVM: x86: change in pv_eoi_get_pending() to make code more readable
MAINTAINERS: Really update email address for Sean Christopherson
KVM: x86: fix shift out of bounds reported by UBSAN
KVM: selftests: Implement perf_test_util more conventionally
KVM: selftests: Use vm_create_with_vcpus in create_vm
KVM: selftests: Factor out guest mode code
KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()
KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array
KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
...
Add a missing __iomem annotation to address a sparse warning. The caller
is expected to pass an __iomem annotated pointer to this function. The
current usages send a 64-bytes command descriptor to an MMIO location
(portal) on a device for consumption.
Also, from the comment in movdir64b(), which also applies to enqcmds(),
@__dst must be supplied as an lvalue because this tells the compiler
what the object is (its size) the instruction accesses. I.e., not the
pointers but what they point to, thus the deref'ing '*'."
The actual sparse warning is:
drivers/dma/idxd/submit.c: note: in included file (through arch/x86/include/asm/processor.h, \
arch/x86/include/asm/timex.h, include/linux/timex.h, include/linux/time32.h, \
include/linux/time.h, include/linux/stat.h, ...):
./arch/x86/include/asm/special_insns.h:289:41: warning: incorrect type in initializer (different address spaces)
./arch/x86/include/asm/special_insns.h:289:41: expected struct <noident> *__dst
./arch/x86/include/asm/special_insns.h:289:41: got void [noderef] __iomem *dst
[ bp: Massage commit message. ]
Fixes: 7f5933f81b ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lkml.kernel.org/r/161003789741.4062451.14362269365703761223.stgit@djiang5-desk3.ch.intel.com
Add a missing __iomem annotation to address a sparse warning. The caller
is expected to pass an __iomem annotated pointer to this function. The
current usages send a 64-bytes command descriptor to an MMIO location
(portal) on a device for consumption. When future usages for the
MOVDIR64B instruction warrant a separate variant of a memory to memory
operation, the argument annotation can be revisited.
Also, from the comment in movdir64b() @__dst must be supplied as an
lvalue because this tells the compiler what the object is (its size) the
instruction accesses. I.e., not the pointers but what they point to,
thus the deref'ing '*'."
The actual sparse warning is:
sparse warnings: (new ones prefixed by >>)
drivers/dma/idxd/submit.c: note: in included file (through include/linux/io.h, include/linux/pci.h):
>> arch/x86/include/asm/io.h:422:27: sparse: sparse: incorrect type in \
argument 1 (different address spaces)
@@ expected void *dst
@@ got void [noderef] __iomem *dst @@
arch/x86/include/asm/io.h:422:27: sparse: expected void *dst
arch/x86/include/asm/io.h:422:27: sparse: got void [noderef] __iomem *dst
[ bp: Massage commit message. ]
Fixes: 0888e1030d ("x86/asm: Carve out a generic movdir64b() helper for general usage")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lkml.kernel.org/r/161003787823.4062451.6564503265464317197.stgit@djiang5-desk3.ch.intel.com
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain
pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any
pages with a non-zero root_count and tdp_mmu_roots should only contain
pages with a positive root_count, unless a thread holds the MMU lock and
is in the process of modifying the list. Various functions expect these
invariants to be maintained, but they are not explictily documented. Add
to the comments on both fields to document the above invariants.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It's really trivial - the only wrinkle is making sure that
compiler knows that ia32-related side of COMPAT_ARCH_DLINFO
is dead code on such configs (we don't get there without
having passed compat_elf_check_arch(), and on such configs
that'll fail for ia32 binary).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
To get rid of hardcoded size/offset in those macros we need to have
a definition of i386 variant of struct elf_prstatus. However, we can't
do that in asm/compat.h - the types needed for that are not there and
adding an include of asm/user32.h into asm/compat.h would cause a lot
of mess.
That could be conveniently done in elfcore-compat.h, but currently there
is nowhere to put arch-dependent parts of it - no asm/elfcore-compat.h.
So we introduce a new file (asm/elfcore-compat.h, present on architectures
that have CONFIG_ARCH_HAS_ELFCORE_COMPAT set, currently only on x86),
have it pulled by linux/elfcore-compat.h and move the definitions there.
As a side benefit, we don't need to worry about accidental inclusion of
that file into binfmt_elf.c itself, so we don't need the dance with
COMPAT_PRSTATUS_SIZE, etc. - only fs/compat_binfmt_elf.c will see
that header.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently the kexec kernel can panic or hang due to 2 causes:
1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
old VP Assist Pages when the kexec kernel runs. The same issue is fixed
for hibernation in commit 421f090c81 ("x86/hyperv: Suspend/resume the
VP assist page for hibernation"). Now fix it for kexec.
2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
can still try to access the hypercall page and cause panic. The workaround
"hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliabe. Move
hyperv_cleanup() to a better place.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20201222065541.24312-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Currently, kprobes decodes the opcode right after single-stepping in
resume_execution(). But the opcode was already decoded while preparing
arch_specific_insn in arch_copy_kprobe().
Decode the opcode in arch_copy_kprobe() instead of in resume_execution()
and set some flags which classify the opcode for the resuming process.
[ bp: Massage commit message. ]
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/160830072561.349576.3014979564448023213.stgit@devnote2
On 64bit architectures that support 32bit processes there are
two possible layouts for NT_PRSTATUS note in ELF coredumps.
For one thing, several fields are 64bit for native processes
and 32bit for compat ones (pr_sigpend, etc.). For another,
the register dump is obviously different - the size and number
of registers are not going to be the same for 32bit and 64bit
variants of processor.
Usually that's handled by having two structures - elf_prstatus
for native layout and compat_elf_prstatus for 32bit one.
32bit processes are handled by fs/compat_binfmt_elf.c, which
defines a macro called 'elf_prstatus' that expands to compat_elf_prstatus.
Then it includes fs/binfmt_elf.c, which makes all references to
struct elf_prstatus to be textually replaced with struct
compat_elf_prstatus. Ugly and somewhat brittle, but it works.
However, amd64 is worse - there are _three_ possible layouts.
One for native 64bit processes, another for i386 (32bit) processes
and yet another for x32 (32bit address space with full 64bit
registers).
Both i386 and x32 processes are handled by fs/compat_binfmt_elf.c,
with usual compat_binfmt_elf.c trickery. However, the layouts
for i386 and x32 are not identical - they have the common beginning,
but the register dump part (pr_reg) is bigger on x32. Worse, pr_reg
is not the last field - it's followed by int pr_fpvalid, so that
field ends up at different offsets for i386 and x32 layouts.
Fortunately, there's not much code that cares about any of that -
it's all encapsulated in fill_thread_core_info(). Since x32
variant is bigger, we define compat_elf_prstatus to match that
layout. That way i386 processes have enough space to fit
their layout into.
Moreover, since these layouts are identical prior to pr_reg,
we don't need to distinguish x32 and i386 cases when we are
setting the fields prior to pr_reg.
Filling pr_reg itself is done by calling ->get() method of
appropriate regset, and that method knows what layout (and size)
to use.
We do need to distinguish x32 and i386 cases only for two
things: setting ->pr_fpvalid (offset differs for x32 and
i386) and choosing the right size for our note.
The way it's done is Not Nice, for the lack of more accurate
printable description. There are two macros (PRSTATUS_SIZE and
SET_PR_FPVALID), that default essentially to sizeof(struct elf_prstatus)
and (S)->pr_fpvalid = 1. On x86 asm/compat.h provides its own
variants.
Unfortunately, quite a few things go wrong there:
* PRSTATUS_SIZE doesn't use the normal test for process
being an x32 one; it compares the size reported by regset with
the size of pr_reg.
* it hardcodes the sizes of x32 and i386 variants (296 and 144
resp.), so if some change in includes leads to asm/compat.h pulled
in by fs/binfmt_elf.c we are in trouble - it will end up using
the size of x32 variant for 64bit processes.
* it's in the wrong place; asm/compat.h couldn't define
the structure for i386 layout, since it lacks quite a few types
needed for it. Hardcoded sizes are largely due to that.
The proper fix would be to have an explicitly defined i386 variant
of structure and have PRSTATUS_SIZE/SET_PR_FPVALID check for
TIF_X32 to choose the variant that should be used. Unfortunately,
that requires some manipulations of headers; we'll do that later
in the series, but for now let's go with the minimal variant -
rename PRSTATUS_SIZE in asm/compat.h to COMPAT_PRSTATUS_SIZE,
have fs/compat_binfmt_elf.c define PRSTATUS_SIZE to COMPAT_PRSTATUS_SIZE
and use the normal TIF_X32 check in that macro. The size of i386 variant
is kept hardcoded for now. Similar story for SET_PR_FPVALID.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull EFI updates from Borislav Petkov:
"These got delayed due to a last minute ia64 build issue which got
fixed in the meantime.
EFI updates collected by Ard Biesheuvel:
- Don't move BSS section around pointlessly in the x86 decompressor
- Refactor helper for discovering the EFI secure boot mode
- Wire up EFI secure boot to IMA for arm64
- Some fixes for the capsule loader
- Expose the RT_PROP table via the EFI test module
- Relax DT and kernel placement restrictions on ARM
with a few followup fixes:
- fix the build breakage on IA64 caused by recent capsule loader
changes
- suppress a type mismatch build warning in the expansion of
EFI_PHYS_ALIGN on ARM"
* tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: arm: force use of unsigned type for EFI_PHYS_ALIGN
efi: ia64: disable the capsule loader
efi: stub: get rid of efi_get_max_fdt_addr()
efi/efi_test: read RuntimeServicesSupported
efi: arm: reduce minimum alignment of uncompressed kernel
efi: capsule: clean scatter-gather entries from the D-cache
efi: capsule: use atomic kmap for transient sglist mappings
efi: x86/xen: switch to efi_get_secureboot_mode helper
arm64/ima: add ima_arch support
ima: generalize x86/EFI arch glue for other EFI architectures
efi: generalize efi_get_secureboot
efi/libstub: EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER should not default to yes
efi/x86: Only copy the compressed kernel image in efi_relocate_kernel()
efi/libstub/x86: simplify efi_is_native()
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accouting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
Pull more xen updates from Juergen Gross:
"Some minor cleanup patches and a small series disentangling some Xen
related Kconfig options"
* tag 'for-linus-5.11-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Kconfig: remove X86_64 depends from XEN_512GB
xen/manage: Fix fall-through warnings for Clang
xen-blkfront: Fix fall-through warnings for Clang
xen: remove trailing semicolon in macro definition
xen: Kconfig: nest Xen guest options
xen: Remove Xen PVH/PVHVM dependency on PCI
x86/xen: Convert to DEFINE_SHOW_ATTRIBUTE
Pull tracing updates from Steven Rostedt:
"The major update to this release is that there's a new arch config
option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Currently, only x86_64 enables it. All the ftrace callbacks now take a
struct ftrace_regs instead of a struct pt_regs. If the architecture
has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
have enough information to read the arguments of the function being
traced, as well as access to the stack pointer.
This way, if a user (like live kernel patching) only cares about the
arguments, then it can avoid using the heavier weight "regs" callback,
that puts in enough information in the struct ftrace_regs to simulate
a breakpoint exception (needed for kprobes).
A new config option that audits the timestamps of the ftrace ring
buffer at most every event recorded.
Ftrace recursion protection has been cleaned up to move the protection
to the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace
to do it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that
lists all the places that triggered the recursion protection of the
function tracer. This will show where things need to be fixed as
recursion slows down the function tracer.
The eval enum mapping updates done at boot up are now offloaded to a
work queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes"
* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Offload eval map updates to a work queue
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
ring-buffer: Add rb_check_bpage in __rb_allocate_pages
ring-buffer: Fix two typos in comments
tracing: Drop unneeded assignment in ring_buffer_resize()
tracing: Disable ftrace selftests when any tracer is running
seq_buf: Avoid type mismatch for seq_buf_init
ring-buffer: Fix a typo in function description
ring-buffer: Remove obsolete rb_event_is_commit()
ring-buffer: Add test to validate the time stamp deltas
ftrace/documentation: Fix RST C code blocks
tracing: Clean up after filter logic rewriting
tracing: Remove the useless value assignment in test_create_synth_event()
livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
MAINTAINERS: assign ./fs/tracefs to TRACING
tracing: Fix some typos in comments
ftrace: Remove unused varible 'ret'
ring-buffer: Add recording of ring buffer recursion into recursed_functions
...
Pull swiotlb update from Konrad Rzeszutek Wilk:
"A generic (but for right now engaged only with AMD SEV) mechanism to
adjust a larger size SWIOTLB based on the total memory of the SEV
guests which right now require the bounce buffer for interacting with
the outside world.
Normal knobs (swiotlb=XYZ) still work"
* 'stable/for-linus-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests
Pull asm-generic mmu-context cleanup from Arnd Bergmann:
"This is a cleanup series from Nicholas Piggin, preparing for later
changes. The asm/mmu_context.h header are generalized and common code
moved to asm-gneneric/mmu_context.h.
This saves a bit of code and makes it easier to change in the future"
* tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (25 commits)
h8300: Fix generic mmu_context build
m68k: mmu_context: Fix Sun-3 build
xtensa: use asm-generic/mmu_context.h for no-op implementations
x86: use asm-generic/mmu_context.h for no-op implementations
um: use asm-generic/mmu_context.h for no-op implementations
sparc: use asm-generic/mmu_context.h for no-op implementations
sh: use asm-generic/mmu_context.h for no-op implementations
s390: use asm-generic/mmu_context.h for no-op implementations
riscv: use asm-generic/mmu_context.h for no-op implementations
powerpc: use asm-generic/mmu_context.h for no-op implementations
parisc: use asm-generic/mmu_context.h for no-op implementations
openrisc: use asm-generic/mmu_context.h for no-op implementations
nios2: use asm-generic/mmu_context.h for no-op implementations
nds32: use asm-generic/mmu_context.h for no-op implementations
mips: use asm-generic/mmu_context.h for no-op implementations
microblaze: use asm-generic/mmu_context.h for no-op implementations
m68k: use asm-generic/mmu_context.h for no-op implementations
ia64: use asm-generic/mmu_context.h for no-op implementations
hexagon: use asm-generic/mmu_context.h for no-op implementations
csky: use asm-generic/mmu_context.h for no-op implementations
...
Pull power management updates from Rafael Wysocki:
"These update cpufreq (core and drivers), cpuidle (polling state
implementation and the PSCI driver), the OPP (operating performance
points) framework, devfreq (core and drivers), the power capping RAPL
(Running Average Power Limit) driver, the Energy Model support, the
generic power domains (genpd) framework, the ACPI device power
management, the core system-wide suspend code and power management
utilities.
Specifics:
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from the
frequency table as the current frequency and clean up that driver
(Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
(Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables in
DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it to
take the interconnect and OPP information from DT and clean it up
(Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print driver
flags for devices with debug enabled (Alex Shi, Patrice Chotard,
Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan)"
* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
PM: domains: create debugfs nodes when adding power domains
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
PM / EM: Micro optimization in em_cpu_energy
cpufreq: arm_scmi: Discover the power scale in performance protocol
...
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...