Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
The FSGSBASE series turned out to have serious bugs and there is still an
open issue which is not fully understood yet.
The confidence in those changes has become close to zero especially as the
test cases which have been shipped with that series were obviously never
run before sending the final series out to LKML.
./fsgsbase_64 >/dev/null
Segmentation fault
As the merge window is close, the only sane decision is to revert FSGSBASE
support. The revert is necessary as this branch has been merged into
perf/core already and rebasing all of that a few days before the merge
window is not the most brilliant idea.
I could definitely slap myself for not noticing the test case fail when
merging that series, but TBH my expectations weren't that low back
then. Won't happen again.
Revert the following commits:
539bca535d ("x86/entry/64: Fix and clean up paranoid_exit")
2c7b5ac5d5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
f987c955c7 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
2032f1f96e ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
5bf0cab60e ("x86/entry/64: Document GSBASE handling in the paranoid path")
708078f657 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
79e1932fa3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
1d07316b13 ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
f60a83df45 ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
1ab5f3f7fe ("x86/process/64: Use FSBSBASE in switch_to() if available")
a86b462513 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
8b71340d70 ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
b64ed19b93 ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Quite some time ago the interrupt entry stubs for unused vectors in the
system vector range got removed and directly mapped to the spurious
interrupt vector entry point.
Sounds reasonable, but it's subtly broken. The spurious interrupt vector
entry point pushes vector number 0xFF on the stack which makes the whole
logic in __smp_spurious_interrupt() pointless.
As a consequence any spurious interrupt which comes from a vector != 0xFF
is treated as a real spurious interrupt (vector 0xFF) and not
acknowledged. That subsequently stalls all interrupt vectors of equal and
lower priority, which brings the system to a grinding halt.
This can happen because even on 64-bit the system vector space is not
guaranteed to be fully populated. A full compile time handling of the
unused vectors is not possible because quite some of them are conditonally
populated at runtime.
Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
but gains the proper handling back. There is no point to selectively spare
some of the stubs which are known at compile time as the required code in
the IDT management would be way larger and convoluted.
Do not route the spurious entries through common_interrupt and do_IRQ() as
the original code did. Route it to smp_spurious_interrupt() which evaluates
the vector number and acts accordingly now that the real vector numbers are
handed in.
Fixup the pr_warn so the actual spurious vector (0xff) is clearly
distiguished from the other vectors and also note for the vectored case
whether it was pending in the ISR or not.
"Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
"Spurious interrupt vector 0xed on CPU#1. Acked."
"Spurious interrupt vector 0xee on CPU#1. Not pending!."
Fixes: 2414e021ac ("x86: Avoid building unused IRQ entry stubs")
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
paranoid_exit needs to restore CR3 before GSBASE. Doing it in the opposite
order crashes if the exception came from a context with user GSBASE and
user CR3 -- RESTORE_CR3 cannot resture user CR3 if run with user GSBASE.
This results in infinitely recursing exceptions if user code does SYSENTER
with TF set if both FSGSBASE and PTI are enabled.
The old code worked if user code just set TF without SYSENTER because #DB
from user mode is special cased in idtentry and paranoid_exit doesn't run.
Fix it by cleaning up the spaghetti code. All that paranoid_exit needs to
do is to disable IRQs, handle IRQ tracing, then restore CR3, and restore
GSBASE. Simply do those actions in that order.
Fixes: 708078f657 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/59725ceb08977359489fbed979716949ad45f616.1562035429.git.luto@kernel.org
With vsyscall emulation on, a readable vsyscall page is still exposed that
contains syscall instructions that validly implement the vsyscalls.
This is required because certain dynamic binary instrumentation tools
attempt to read the call targets of call instructions in the instrumented
code. If the instrumented code uses vsyscalls, then the vsyscall page needs
to contain readable code.
Unfortunately, leaving readable memory at a deterministic address can be
used to help various ASLR bypasses, so some hardening value can be gained
by disallowing vsyscall reads.
Given how rarely the vsyscall page needs to be readable, add a mechanism to
make the vsyscall page be execute only.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/d17655777c21bc09a7af1bbcf74e6f2b69a51152.1561610354.git.luto@kernel.org
Currently pt_regs on x86_32 has an oddity in that kernel regs
(!user_mode(regs)) are short two entries (esp/ss). This means that any
code trying to use them (typically: regs->sp) needs to jump through
some unfortunate hoops.
Change the entry code to fix this up and create a full pt_regs frame.
This then simplifies various trampolines in ftrace and kprobes, the
stack unwinder, ptrace, kdump and kgdb.
Much thanks to Josh for help with the cleanups!
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for wider use, move the ENCODE_FRAME_POINTER macros to
a common header and provide inline asm versions.
These macros are used to encode a pt_regs frame for the unwinder; see
unwind_frame.c:decode_frame_pointer().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The code flow around the return from interrupt preemption point seems
needlessly complicated.
There is only one site jumping to resume_kernel, and none (outside of
resume_kernel) jumping to restore_all_kernel. Inline resume_kernel
in restore_all_kernel and avoid the CONFIG_PREEMPT dependent label.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Without FSGSBASE, user space cannot change GSBASE other than through a
PRCTL. The kernel enforces that the user space GSBASE value is postive as
negative values are used for detecting the kernel space GSBASE value in the
paranoid entry code.
If FSGSBASE is enabled, user space can set arbitrary GSBASE values without
kernel intervention, including negative ones, which breaks the paranoid
entry assumptions.
To avoid this, paranoid entry needs to unconditionally save the current
GSBASE value independent of the interrupted context, retrieve and write the
kernel GSBASE and unconditionally restore the saved value on exit. The
restore happens either in paranoid_exit or in the special exit path of the
NMI low level code.
All other entry code pathes which use unconditional SWAPGS are not affected
as they do not depend on the actual content.
[ tglx: Massaged changelogs and comments ]
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-13-git-send-email-chang.seok.bae@intel.com
GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
pages before the vclock mode checks. This creates a path through
vclock_gettime() in which no vclock is enabled at all (due to disabled
TSC on old CPUs, for example) but the pvclock or hvclock page
nevertheless read. This will segfault on bare metal.
This fixes commit 459e3a2153 ("gcc-9: properly declare the
{pv,hv}clock_page storage") in the sense that, before that commit, GCC
didn't seem to generate the offending code. There was nothing wrong
with that commit per se, and -stable maintainers should backport this to
all supported kernels regardless of whether the offending commit was
present, since the same crash could just as easily be triggered by the
phase of the moon.
On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
I'm not too concerned about performance regressions from this fix.
Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Reported-by: Duncan Roe <duncan_roe@optusnet.com.au>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.
The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.
The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:
force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
All of the remaining callers pass current into force_sig so
remove the task parameter to make this obvious and to make
misuse more difficult in the future.
This also makes it clear force_sig passes current into force_sig_info.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull more vfs mount updates from Al Viro:
"Propagation of new syscalls to other architectures + cosmetic change
from Christian (fscontext didn't follow the convention for anon inode
names)"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
uapi, fsopen: use square brackets around "fscontext" [ver #2]
Pull x86 fixes from Ingo Molnar:
"Misc fixes and updates:
- a handful of MDS documentation/comment updates
- a cleanup related to hweight interfaces
- a SEV guest fix for large pages
- a kprobes LTO fix
- and a final cleanup commit for vDSO HPET support removal"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mds: Improve CPU buffer clear documentation
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/kconfig: Disable CONFIG_GENERIC_HWEIGHT and remove __HAVE_ARCH_SW_HWEIGHT
x86/mm: Do not use set_{pud, pmd}_safe() when splitting a large page
x86/kprobes: Make trampoline_handler() global and visible
x86/vdso: Remove hpet_page from vDSO
Fix the syscall numbering of the mount API syscalls so that the numbers
match between i386 and x86_64 and that they're in the common numbering
scheme space.
Fixes: a07b200047 ("vfs: syscall: Add open_tree(2) to reference or clone a mount")
Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Fixes: 24dcb3d90a ("vfs: syscall: Add fsopen() to prepare for superblock creation")
Fixes: ecdab150fd ("vfs: syscall: Add fsconfig() for configuring and managing a context")
Fixes: 93766fbd26 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
Fixes: cf3cba4a42 ("vfs: syscall: Add fspick() to select a superblock for reconfiguration")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull tracing updates from Steven Rostedt:
"The major changes in this tracing update includes:
- Removal of non-DYNAMIC_FTRACE from 32bit x86
- Removal of mcount support from x86
- Emulating a call from int3 on x86_64, fixes live kernel patching
- Consolidated Tracing Error logs file
Minor updates:
- Removal of klp_check_compiler_support()
- kdb ftrace dumping output changes
- Accessing and creating ftrace instances from inside the kernel
- Clean up of #define if macro
- Introduction of TRACE_EVENT_NOP() to disable trace events based on
config options
And other minor fixes and clean ups"
* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
x86: Hide the int3_emulate_call/jmp functions from UML
livepatch: Remove klp_check_compiler_support()
ftrace/x86: Remove mcount support
ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
tracing: Simplify "if" macro code
tracing: Fix documentation about disabling options using trace_options
tracing: Replace kzalloc with kcalloc
tracing: Fix partial reading of trace event's id file
tracing: Allow RCU to run between postponed startup tests
tracing: Fix white space issues in parse_pred() function
tracing: Eliminate const char[] auto variables
ring-buffer: Fix mispelling of Calculate
tracing: probeevent: Fix to make the type of $comm string
tracing: probeevent: Do not accumulate on ret variable
tracing: uprobes: Re-enable $comm support for uprobe events
ftrace/x86_64: Emulate call function while updating in breakpoint handler
x86_64: Allow breakpoints to emulate call instructions
x86_64: Add gap to int3 to allow for call emulation
tracing: kdb: Allow ftdump to skip all but the last few entries
tracing: Add trace_total_entries() / trace_total_entries_cpu()
...
Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.
Contrary to other speculation based vulnerabilities the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack samples can be postprocessed
successfully.
The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires to disable hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.
The mitigation set comes with documentation for administrators and a
deeper technical view"
* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...
Pull mount ABI updates from Al Viro:
"The syscalls themselves, finally.
That's not all there is to that stuff, but switching individual
filesystems to new methods is fortunately independent from everything
else, so e.g. NFS series can go through NFS tree, etc.
As those conversions get done, we'll be finally able to get rid of a
bunch of duplication in fs/super.c introduced in the beginning of the
entire thing. I expect that to be finished in the next window..."
* 'work.mount-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Add a sample program for the new mount API
vfs: syscall: Add fspick() to select a superblock for reconfiguration
vfs: syscall: Add fsmount() to create a mount for a superblock
vfs: syscall: Add fsconfig() for configuring and managing a context
vfs: Implement logging through fs_context
vfs: syscall: Add fsopen() to prepare for superblock creation
Make anon_inodes unconditional
teach move_mount(2) to work with OPEN_TREE_CLONE
vfs: syscall: Add move_mount(2) to move mounts around
vfs: syscall: Add open_tree(2) to reference or clone a mount
Pull x86 FPU state handling updates from Borislav Petkov:
"This contains work started by Rik van Riel and brought to fruition by
Sebastian Andrzej Siewior with the main goal to optimize when to load
FPU registers: only when returning to userspace and not on every
context switch (while the task remains in the kernel).
In addition, this optimization makes kernel_fpu_begin() cheaper by
requiring registers saving only on the first invocation and skipping
that in following ones.
What is more, this series cleans up and streamlines many aspects of
the already complex FPU code, hopefully making it more palatable for
future improvements and simplifications.
Finally, there's a __user annotations fix from Jann Horn"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails
x86/pkeys: Add PKRU value to init_fpstate
x86/fpu: Restore regs in copy_fpstate_to_sigframe() in order to use the fastpath
x86/fpu: Add a fastpath to copy_fpstate_to_sigframe()
x86/fpu: Add a fastpath to __fpu__restore_sig()
x86/fpu: Defer FPU state load until return to userspace
x86/fpu: Merge the two code paths in __fpu__restore_sig()
x86/fpu: Restore from kernel memory on the 64-bit path too
x86/fpu: Inline copy_user_to_fpregs_zeroing()
x86/fpu: Update xstate's PKRU value on write_pkru()
x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
x86/entry: Add TIF_NEED_FPU_LOAD
x86/fpu: Eager switch PKRU state
x86/pkeys: Don't check if PKRU is zero before writing it
x86/fpu: Only write PKRU if it is different from current
x86/pkeys: Provide *pkru() helpers
x86/fpu: Use a feature number instead of mask in two more helpers
x86/fpu: Make __raw_xsave_addr() use a feature number instead of mask
x86/fpu: Add an __fpregs_load_activate() internal helper
...
Pull x86 irq updates from Ingo Molnar:
"Here are the main changes in this tree:
- Introduce x86-64 IRQ/exception/debug stack guard pages to detect
stack overflows immediately and deterministically.
- Clean up over a decade worth of cruft accumulated.
The outcome of this should be more clear-cut faults/crashes when any
of the low level x86 CPU stacks overflow, instead of silent memory
corruption and sporadic failures much later on"
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
x86/irq: Fix outdated comments
x86/irq/64: Remove stack overflow debug code
x86/irq/64: Remap the IRQ stack with guard pages
x86/irq/64: Split the IRQ stack into its own pages
x86/irq/64: Init hardirq_stack_ptr during CPU hotplug
x86/irq/32: Handle irq stack allocation failure proper
x86/irq/32: Invoke irq_ctx_init() from init_IRQ()
x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr
x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr
x86/irq/32: Make irq stack a character array
x86/irq/32: Define IRQ_STACK_SIZE
x86/dumpstack/64: Speedup in_exception_stack()
x86/exceptions: Split debug IST stack
x86/exceptions: Enable IST guard pages
x86/exceptions: Disconnect IST index and stack order
x86/cpu: Remove orig_ist array
x86/cpu: Prepare TSS.IST setup for guard pages
x86/dumpstack/64: Use cpu_entry_area instead of orig_ist
x86/irq/64: Use cpu entry area instead of orig_ist
x86/traps: Use cpu_entry_area instead of orig_ist
...
Pull x86 entry cleanup from Ingo Molnar:
"A single commit that removes a redundant complication from
preempt-schedule handling in the x86 entry code"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Remove unneeded need_resched() loop
Pull x86 asm updates from Ingo Molnar:
"This includes the following changes:
- cpu_has() cleanups
- sync_bitops.h modernization to the rmwcc.h facility, similarly to
bitops.h
- continued LTO annotations/fixes
- misc cleanups and smaller cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/um/vdso: Drop unnecessary cc-ldoption
x86/vdso: Rename variable to fix -Wshadow warning
x86/cpu/amd: Exclude 32bit only assembler from 64bit build
x86/asm: Mark all top level asm statements as .text
x86/build/vdso: Add FORCE to the build rule of %.so
x86/asm: Modernize sync_bitops.h
x86/mm: Convert some slow-path static_cpu_has() callers to boot_cpu_has()
x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has()
x86/asm: Clarify static_cpu_has()'s intended use
x86/uaccess: Fix implicit cast of __user pointer
x86/cpufeature: Remove __pure attribute to _static_cpu_has()
Pull objtool updates from Ingo Molnar:
"This is a series from Peter Zijlstra that adds x86 build-time uaccess
validation of SMAP to objtool, which will detect and warn about the
following uaccess API usage bugs and weirdnesses:
- call to %s() with UACCESS enabled
- return with UACCESS enabled
- return with UACCESS disabled from a UACCESS-safe function
- recursive UACCESS enable
- redundant UACCESS disable
- UACCESS-safe disables UACCESS
As it turns out not leaking uaccess permissions outside the intended
uaccess functionality is hard when the interfaces are complex and when
such bugs are mostly dormant.
As a bonus we now also check the DF flag. We had at least one
high-profile bug in that area in the early days of Linux, and the
checking is fairly simple. The checks performed and warnings emitted
are:
- call to %s() with DF set
- return with DF set
- return with modified stack frame
- recursive STD
- redundant CLD
It's all x86-only for now, but later on this can also be used for PAN
on ARM and objtool is fairly cross-platform in principle.
While all warnings emitted by this new checking facility that got
reported to us were fixed, there might be GCC version dependent
warnings that were not reported yet - which we'll address, should they
trigger.
The warnings are non-fatal build warnings"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation
sched/x86_64: Don't save flags on context switch
objtool: Add Direction Flag validation
objtool: Add UACCESS validation
objtool: Fix sibling call detection
objtool: Rewrite alt->skip_orig
objtool: Add --backtrace support
objtool: Rewrite add_ignores()
objtool: Handle function aliases
objtool: Set insn->func for alternatives
x86/uaccess, kcov: Disable stack protector
x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
x86/uaccess, ubsan: Fix UBSAN vs. SMAP
x86/uaccess, kasan: Fix KASAN vs SMAP
x86/smap: Ditch __stringify()
x86/uaccess: Introduce user_access_{save,restore}()
x86/uaccess, signal: Fix AC=1 bloat
x86/uaccess: Always inline user_access_begin()
x86/uaccess, xen: Suppress SMAP warnings
...
The pvlock_page and hvclock_page variables are (as the name implies)
addresses to pages, created by the linker script.
But we declared them as just "extern u8" variables, which _works_, but
now that gcc does some more bounds checking, it causes warnings like
warning: array subscript 1 is outside array bounds of ‘u8[1]’
when we then access more than one byte from those variables.
Fix this by simply making the declaration of the variables match
reality, which makes the compiler happy too.
Signed-off-by: Linus Torvalds <torvalds@-linux-foundation.org>