In preparation for reworking the EL1 irq/nmi entry code, move the
existing logic to C. We no longer need the asm_nmi_enter() and
asm_nmi_exit() wrappers, so these are removed. The new C functions are
marked noinstr, which prevents compiler instrumentation and runtime
probing.
In subsequent patches we'll want the new C helpers to be called in all
cases, so we don't bother wrapping the calls with ifdeferry. Even when
the new C functions are stubs the trivial calls are unlikely to have a
measurable impact on the IRQ or NMI paths anyway.
Prototypes are added to <asm/exception.h> as otherwise (in some
configurations) GCC will complain about the lack of a forward
declaration. We already do this for existing functions, e.g.
enter_from_user_mode().
The new helpers are marked as noinstr (which prevents all
instrumentation, tracing, and kprobes). Otherwise, there should be no
functional change as a result of this patch.
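For illustration, a minimal sketch of what such noinstr C helpers could
look like (the function names and the pseudo-NMI gating shown here are
assumptions for this sketch, not taken from this message):

  asmlinkage void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
  {
          /* Only treat this as an NMI if the interrupted context had IRQs masked. */
          if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
                  nmi_enter();
  }

  asmlinkage void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
  {
          if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
                  nmi_exit();
  }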
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In a subsequent patch ret_to_user will need to make a C function call
(in some configurations) which may clobber x0-x18 at the start of the
finish_ret_to_user block, before enable_step_tsk consumes the flags
loaded into x1.
In preparation for this, let's load the flags into x19, which is
preserved across C function calls. This avoids a redundant reload of the
flags and ensures we operate on a consistent snapshot regardless.
There should be no functional change as a result of this patch. At this
point of the entry/exit paths we only need to preserve x28 (tsk) and the
sp, and x19 is free for this use.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-6-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Depending on configuration options and specific code paths, we either
use the empty_zero_page or the configuration-dependent reserved_ttbr0
as a reserved value for TTBR{0,1}_EL1.
To simplify this code, let's always allocate and use the same
reserved_pg_dir, replacing reserved_ttbr0. Note that this is statically
allocated (and hence pre-zeroed), and is also marked as read-only in the
kernel Image mapping.
Keeping this separate from the empty_zero_page potentially helps with
robustness as the empty_zero_page is used in a number of cases where a
failure to map it read-only could allow it to become corrupted.
The (presently unused) swapper_pg_end symbol is also removed, and
comments are added wherever we rely on the offsets between the
pre-allocated pg_dirs to keep these cases easily identifiable.
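As a rough illustration (a sketch, not code from this patch), installing
the reserved tables for TTBR0_EL1 can then always refer to the same
symbol, along the lines of:

  static inline void cpu_set_reserved_ttbr0(void)
  {
          /* reserved_pg_dir is all zeroes, so no walks can succeed through it */
          unsigned long ttbr = phys_to_ttbr(__pa_symbol(reserved_pg_dir));

          write_sysreg(ttbr, ttbr0_el1);
          isb();
  }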
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201103102229.8542-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
and a store exclusive or PAR_EL1 read can cause a deadlock.
The workaround requires a DMB SY before and after a PAR_EL1 register
read. In addition, it's possible an interrupt (doing a device read) or
KVM guest exit could be taken between the DMB and PAR read, so we
also need a DMB before returning from interrupt and before returning to
a guest.
A deadlock is still possible with the workaround as KVM guests must also
have the workaround. IOW, a malicious guest can deadlock an affected
system.
This workaround also depends on a firmware counterpart to enable the h/w
to insert DMB SY after load and store exclusive instructions. See the
errata document SDEN-1152370 v10 [1] for more information.
[1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf
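A sketch of the PAR_EL1 access pattern the workaround requires (the
helper name is hypothetical; the actual patch also has to cover the
interrupt-return and guest-exit paths mentioned above):

  static inline u64 read_par_el1_with_workaround(void)
  {
          u64 par;

          dmb(sy);                        /* DMB SY before the PAR_EL1 read */
          par = read_sysreg(par_el1);
          dmb(sy);                        /* DMB SY after the PAR_EL1 read */
          return par;
  }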
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add userspace support for the Memory Tagging Extension introduced by
Armv8.5.
(Catalin Marinas and others)
* for-next/mte: (30 commits)
arm64: mte: Fix typo in memory tagging ABI documentation
arm64: mte: Add Memory Tagging Extension documentation
arm64: mte: Kconfig entry
arm64: mte: Save tags when hibernating
arm64: mte: Enable swap of tagged pages
mm: Add arch hooks for saving/restoring tags
fs: Handle intra-page faults in copy_mount_options()
arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset
arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks
arm64: mte: Restore the GCR_EL1 register after a suspend
arm64: mte: Allow user control of the generated random tags via prctl()
arm64: mte: Allow user control of the tag check mode via prctl()
mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
mm: Introduce arch_validate_flags()
arm64: mte: Add PROT_MTE support to mmap() and mprotect()
mm: Introduce arch_calc_vm_flag_bits()
arm64: mte: Tags-aware aware memcmp_pages() implementation
arm64: Avoid unnecessary clear_user_page() indirection
...
Rewrite the Spectre-v4 mitigation handling code to follow the same
approach as that taken by Spectre-v2.
For now, report to KVM that the system is vulnerable (by forcing
'ssbd_state' to ARM64_SSBD_UNKNOWN), as this will be cleared up in
subsequent steps.
Signed-off-by: Will Deacon <will@kernel.org>
The spectre mitigations are too configurable for their own good, leading
to confusing logic trying to figure out when we should mitigate and when
we shouldn't. Although the plethora of command-line options need to stick
around for backwards compatibility, the default-on CONFIG options that
depend on EXPERT can be dropped, as the mitigations only do anything if
the system is vulnerable, a mitigation is available and the command-line
hasn't disabled it.
Remove CONFIG_HARDEN_BRANCH_PREDICTOR and CONFIG_ARM64_SSBD in favour of
enabling this code unconditionally.
Signed-off-by: Will Deacon <will@kernel.org>
The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:
1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.
Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().
On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.
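A simplified sketch of the asynchronous case (the sysreg accessors are
standard helpers, but the TIF flag name and the bit handling here are
illustrative assumptions):

  static void check_mte_async_tcf(struct thread_info *ti)
  {
          u64 tcf = read_sysreg_s(SYS_TFSRE0_EL1);

          if (tcf & 1) {                  /* TF0: cumulative tag check fault */
                  write_sysreg_s(0, SYS_TFSRE0_EL1);
                  /* SIGSEGV is then delivered from do_notify_resume() */
                  set_ti_thread_flag(ti, TIF_MTE_ASYNC_FAULT);
          }
  }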
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Instead of dealing with erratum 1418040 on each entry and exit,
let's move the handling to __switch_to(), which has
several advantages:
- It can be applied when it matters (switching between 32 and 64
bit tasks).
- It is written in C (yay!)
- It can rely on static keys rather than alternatives
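To illustrate the __switch_to() approach, a sketch of the per-task
cntkctl_el1 toggle (the capability check and helper name here are
assumptions, not code from this patch):

  static void erratum_1418040_thread_switch(struct task_struct *next)
  {
          if (!IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040) ||
              !cpus_have_const_cap(ARM64_WORKAROUND_1418040))
                  return;

          /* 32-bit tasks run with EL0 access to the virtual counter disabled. */
          if (is_compat_thread(task_thread_info(next)))
                  sysreg_clear_set(cntkctl_el1, ARCH_TIMER_USR_VCT_ACCESS_EN, 0);
          else
                  sysreg_clear_set(cntkctl_el1, 0, ARCH_TIMER_USR_VCT_ACCESS_EN);
  }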
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200731173824.107480-2-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 and cross-arch updates from Catalin Marinas:
"Here's a slightly wider-spread set of updates for 5.9.
Going outside the usual arch/arm64/ area is the removal of
read_barrier_depends() series from Will and the MSI/IOMMU ID
translation series from Lorenzo.
The notable arm64 updates include ARMv8.4 TLBI range operations and
translation level hint, time namespace support, and perf.
Summary:
- Removal of the tremendously unpopular read_barrier_depends()
barrier, which is a NOP on all architectures apart from Alpha, in
favour of allowing architectures to override READ_ONCE() and do
whatever dance they need to do to ensure address dependencies
provide LOAD -> LOAD/STORE ordering.
This work also offers a potential solution if compilers are shown
to convert LOAD -> LOAD address dependencies into control
dependencies (e.g. under LTO), as weakly ordered architectures will
effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at
LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic,
augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the
device ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions
and kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words"
Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
...
* for-next/read-barrier-depends:
: Allow architectures to override __READ_ONCE()
arm64: Reduce the number of header files pulled into vmlinux.lds.S
compiler.h: Move compiletime_assert() macros into compiler_types.h
checkpatch: Remove checks relating to [smp_]read_barrier_depends()
include/linux: Remove smp_read_barrier_depends() from comments
tools/memory-model: Remove smp_read_barrier_depends() from informal doc
Documentation/barriers/kokr: Remove references to [smp_]read_barrier_depends()
Documentation/barriers: Remove references to [smp_]read_barrier_depends()
locking/barriers: Remove definitions for [smp_]read_barrier_depends()
alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
vhost: Remove redundant use of read_barrier_depends() barrier
asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h'
asm/rwonce: Remove smp_read_barrier_depends() invocation
alpha: Override READ_ONCE() with barriered implementation
asm/rwonce: Allow __READ_ONCE to be overridden by the architecture
compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
tools: bpf: Use local copy of headers including uapi/linux/filter.h
Factor the 12 copies of the SW PAN entry and exit code into callable
subroutines, and use alternatives patching to either emit a 'bl'
instruction to call them, or a NOP if h/w PAN is found to be available
at runtime.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200721083315.4816-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Although vmlinux.lds.S smells like an assembly file and is compiled
with __ASSEMBLY__ defined, it's actually just fed to the preprocessor to
create our linker script. This means that any assembly macros defined
by headers that it includes will result in a helpful link error:
| aarch64-linux-gnu-ld:./arch/arm64/kernel/vmlinux.lds:1: syntax error
In preparation for an arm64-private asm/rwonce.h implementation, which
will end up pulling assembly macros into linux/compiler.h, reduce the
number of headers we include directly and transitively in vmlinux.lds.S
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
Continually butchering our entry code with CPU errata workarounds has
led to it looking a little scruffy. Consistently use /* */ comment
style for multi-line block comments and ensure that small numeric labels
use consecutive integers.
No functional change, but the state of things was irritating.
Signed-off-by: Will Deacon <will@kernel.org>
The current handling of erratum 1418040 has the side effect that
cntkctl_el1 can get changed for both 32 and 64bit tasks.
This isn't a problem so far, but if we ever need to mitigate another
of these errata on the 64bit side, we'd better keep the messing with
cntkctl_el1 local to 32bit tasks.
For that, make sure that on entering the kernel from a 32bit task,
userspace access to cntvct gets enabled, and disabled when returning to
userspace, while it never gets changed for 64bit tasks.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200706163802.1836732-5-maz@kernel.org
[will: removed branch instructions per Mark's review comments]
Signed-off-by: Will Deacon <will@kernel.org>
Support for Clang's Shadow Call Stack in the kernel
(Sami Tolvanen and Will Deacon)
* for-next/scs:
arm64: entry-ftrace.S: Update comment to indicate that x18 is live
scs: Move DEFINE_SCS macro into core code
scs: Remove references to asm/scs.h from core code
scs: Move scs_overflow_check() out of architecture code
arm64: scs: Use 'scs_sp' register alias for x18
scs: Move accounting into alloc/free functions
arm64: scs: Store absolute SCS stack pointer value in thread_info
efi/libstub: Disable Shadow Call Stack
arm64: scs: Add shadow stacks for SDEI
arm64: Implement Shadow Call Stack
arm64: Disable SCS for hypervisor code
arm64: vdso: Disable Shadow Call Stack
arm64: efi: Restore register x18 if it was corrupted
arm64: Preserve register x18 when CPU is suspended
arm64: Reserve register x18 from general allocation with SCS
scs: Disable when function graph tracing is enabled
scs: Add support for stack usage debugging
scs: Add page accounting for shadow call stack allocations
scs: Add support for Clang's Shadow Call Stack (SCS)
Support for Branch Target Identification (BTI) in user and kernel
(Mark Brown and others)
* for-next/bti: (39 commits)
arm64: vdso: Fix CFI directives in sigreturn trampoline
arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
arm64: bti: Fix support for userspace only BTI
arm64: kconfig: Update and comment GCC version check for kernel BTI
arm64: vdso: Map the vDSO text with guarded pages when built for BTI
arm64: vdso: Force the vDSO to be linked as BTI when built for BTI
arm64: vdso: Annotate for BTI
arm64: asm: Provide a mechanism for generating ELF note for BTI
arm64: bti: Provide Kconfig for kernel mode BTI
arm64: mm: Mark executable text as guarded pages
arm64: bpf: Annotate JITed code for BTI
arm64: Set GP bit in kernel page tables to enable BTI for the kernel
arm64: asm: Override SYM_FUNC_START when building the kernel with BTI
arm64: bti: Support building kernel C code using BTI
arm64: Document why we enable PAC support for leaf functions
arm64: insn: Report PAC and BTI instructions as skippable
arm64: insn: Don't assume unrecognized HINTs are skippable
arm64: insn: Provide a better name for aarch64_insn_is_nop()
arm64: insn: Add constants for new HINT instruction decode
arm64: Disable old style assembly annotations
...
x18 holds the SCS stack pointer value, so introduce a register alias to
make this easier to read in assembly code.
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
This change adds per-CPU shadow call stacks for the SDEI handler.
Similarly to how the kernel stacks are handled, we add separate shadow
stacks for normal and critical events.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
As part of an effort to clarify and clean up the assembler annotations
new macros have been introduced which annotate the start and end of blocks
of code in assembler files. Currently ret_to_user has an out of line slow
path work_pending placed above the main function, which makes annotating the
start and end of these blocks of code awkward.
Since work_pending is only referenced from within ret_to_user, try to make
things a bit clearer by moving it after the current ret_to_user and then
marking both ret_to_user and work_pending as part of a single ret_to_user
code block.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200501115430.37315-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The 'sync' argument to ptrauth_keys_install_kernel macro is somewhat
opaque at callsites, so instead let's have regular and _nosync variants
of the macro to make this a little more obvious.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200423101606.37601-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
This patch restores the kernel keys from current task during cpu resume
after the mmu is turned on and ptrauth is enabled.
A flag is added in macro ptrauth_keys_install_kernel to check if isb
instruction needs to be executed.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Set up keys to use pointer authentication within the kernel. The kernel
will be compiled with APIAKey instructions, the other keys are currently
unused. Each task is given its own APIAKey, which is initialized during
fork. The key is changed during context switch and on kernel entry from
EL0.
The keys for idle threads need to be set before calling any C functions,
because it is not possible to enter and exit a function with different
keys.
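For example, per-task key initialisation at fork time might look roughly
like this (the structure and field names are illustrative, not taken from
this message):

  struct ptrauth_keys_kernel {
          struct ptrauth_key apia;        /* only APIAKey is used by the kernel */
  };

  static inline void ptrauth_keys_init_kernel(struct ptrauth_keys_kernel *keys)
  {
          if (system_supports_address_auth())
                  get_random_bytes(&keys->apia, sizeof(keys->apia));
  }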
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: Modified secondary cores key structure, comments]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As we're going to enable pointer auth within the kernel and use a
different APIAKey for the kernel itself, move the user APIAKey
switch to EL0 exception return.
The other 4 keys could remain switched during task switch, but are also
moved to keep things consistent.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: commit msg, re-positioned the patch, comments]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions.
The SDEI entry points are currently annotated as normal functions but
are called from non-kernel contexts with a non-standard calling
convention, so annotate them as such.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: James Morse <james.Morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These include specific
annotations for the start and end of data; update the data symbols to use
these.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC with separate annotations for standard C callable functions,
data and code with different calling conventions. Update the
remaining annotations in the entry.S code to the new macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions.
ret_from_fork is not a normal C function and should therefore be
annotated as code.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions. The vector table and handlers aren't
normal C-style code, so they should be annotated as CODE.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull scheduler updates from Ingo Molnar:
"These were the main changes in this cycle:
- More -rt motivated separation of CONFIG_PREEMPT and
CONFIG_PREEMPTION.
- Add more low level scheduling topology sanity checks and warnings
to filter out nonsensical topologies that break scheduling.
- Extend uclamp constraints to influence wakeup CPU placement
- Make the RT scheduler more aware of asymmetric topologies and CPU
capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y
- Make idle CPU selection more consistent
- Various fixes, smaller cleanups, updates and enhancements - please
see the git log for details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
sched/fair: Define sched_idle_cpu() only for SMP configurations
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
idle: fix spelling mistake "iterrupts" -> "interrupts"
sched/fair: Remove redundant call to cpufreq_update_util()
sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
sched/fair: calculate delta runnable load only when it's needed
sched/cputime: move rq parameter in irqtime_account_process_tick
stop_machine: Make stop_cpus() static
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
sched/core: Fix size of rq::uclamp initialization
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
sched/fair: Load balance aggressively for SCHED_IDLE CPUs
sched/fair : Improve update_sd_pick_busiest for spare capacity case
watchdog: Remove soft_lockup_hrtimer_cnt and related code
sched/rt: Make RT capacity-aware
sched/fair: Make EAS wakeup placement consider uclamp restrictions
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
sched/uclamp: Make uclamp util helpers use and return UL values
...
kernel_ventry will create alternative entries to potentially replace
0 instructions with 0 instructions for EL1 vectors. While this does not
cause an issue, it pointlessly takes up some bytes in the alternatives
section.
Do not generate such entries.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
The kernel stashes the current task struct in sp_el0 so that this can be
acquired consistently/cheaply when required. When we take an exception
from EL0 we have to:
1) stash the original sp_el0 value
2) find the current task
3) update sp_el0 with the current task pointer
Currently steps #1 and #2 occur in one place, and step #3 a while later.
As the value of sp_el0 is immaterial between these points, let's move
them together to make the code clearer and minimize ifdeffery. This
necessitates moving the comment for MDSCR_EL1.SS.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Commit 582f95835a ("arm64: entry: convert el0_sync to C") caused
the ENDPROC() annotating the end of el0_sync to be placed after the code
for el0_sync_compat. This replaced the previous annotation where it was
located after all the cases that are now converted to C, including after
the currently unannotated el0_irq_compat and el0_error_compat. Move the
annotation to the end of the function and add separate annotations for
the _compat ones.
Fixes: 582f95835a ("arm64: entry: convert el0_sync to C")
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.
Switch the Kconfig dependency, entry code and preemption handling over
to use CONFIG_PREEMPTION. Add PREEMPT_RT output in show_stack().
[bigeasy: +traps.c, Kconfig]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20191015191821.11479-3-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Stack overflow checking can be done by testing sp & (1 << THREAD_SHIFT)
only when the stacks are aligned to (2 << THREAD_SHIFT) with a size of
(1 << THREAD_SHIFT), and this is the case when CONFIG_VMAP_STACK is set.
Fix the code comment to avoid confusion.
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Heyi Guo <guoheyi@huawei.com>
[catalin.marinas@arm.com: Updated comment following Mark's suggestion]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* for-next/elf-hwcap-docs:
: Update the arm64 ELF HWCAP documentation
docs/arm64: cpu-feature-registers: Rewrite bitfields that don't follow [e, s]
docs/arm64: cpu-feature-registers: Documents missing visible fields
docs/arm64: elf_hwcaps: Document HWCAP_SB
docs/arm64: elf_hwcaps: sort the HWCAP{, 2} documentation by ascending value
* for-next/smccc-conduit-cleanup:
: SMC calling convention conduit clean-up
firmware: arm_sdei: use common SMCCC_CONDUIT_*
firmware/psci: use common SMCCC_CONDUIT_*
arm: spectre-v2: use arm_smccc_1_1_get_conduit()
arm64: errata: use arm_smccc_1_1_get_conduit()
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
* for-next/zone-dma:
: Reintroduction of ZONE_DMA for Raspberry Pi 4 support
arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
dma/direct: turn ARCH_ZONE_DMA_BITS into a variable
arm64: Make arm64_dma32_phys_limit static
arm64: mm: Fix unused variable warning in zone_sizes_init
mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'
arm64: use both ZONE_DMA and ZONE_DMA32
arm64: rename variables used to calculate ZONE_DMA32's size
arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys()
* for-next/relax-icc_pmr_el1-sync:
: Relax ICC_PMR_EL1 (GICv3) accesses when ICC_CTLR_EL1.PMHE is clear
arm64: Document ICC_CTLR_EL3.PMHE setting requirements
arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear
* for-next/double-page-fault:
: Avoid a double page fault in __copy_from_user_inatomic() if hw does not support auto Access Flag
mm: fix double page fault on arm64 if PTE_AF is cleared
x86/mm: implement arch_faults_on_old_pte() stub on x86
arm64: mm: implement arch_faults_on_old_pte() on arm64
arm64: cpufeature: introduce helper cpu_has_hw_af()
* for-next/misc:
: Various fixes and clean-ups
arm64: kpti: Add NVIDIA's Carmel core to the KPTI whitelist
arm64: mm: Remove MAX_USER_VA_BITS definition
arm64: mm: simplify the page end calculation in __create_pgd_mapping()
arm64: print additional fault message when executing non-exec memory
arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
arm64: pgtable: Correct typo in comment
arm64: docs: cpu-feature-registers: Document ID_AA64PFR1_EL1
arm64: cpufeature: Fix typos in comment
arm64/mm: Poison initmem while freeing with free_reserved_area()
arm64: use generic free_initrd_mem()
arm64: simplify syscall wrapper ifdeffery
* for-next/kselftest-arm64-signal:
: arm64-specific kselftest support with signal-related test-cases
kselftest: arm64: fake_sigreturn_misaligned_sp
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: add helper get_current_context
kselftest: arm64: extend test_init functionalities
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: extend toplevel skeleton Makefile
* for-next/kaslr-diagnostics:
: Provide diagnostics on boot for KASLR
arm64: kaslr: Check command line before looking for a seed
arm64: kaslr: Announce KASLR status on boot
Move the synchronous exception paths from entry.S into a C file to
improve the code readability.
* for-next/entry-s-to-c:
arm64: entry-common: don't touch daif before bp-hardening
arm64: Remove asmlinkage from updated functions
arm64: entry: convert el0_sync to C
arm64: entry: convert el1_sync to C
arm64: add local_daif_inherit()
arm64: Add prototypes for functions called by entry.S
arm64: remove __exception annotations
This is largely a 1-1 conversion of asm to C, with a couple of caveats.
The el0_sync{_compat} switches explicitly handle all the EL0 debug
cases, so el0_dbg doesn't have to try to bail out for unexpected EL1
debug ESR values. This also means that an unexpected vector catch from
AArch32 is routed to el0_inv.
We *could* merge the native and compat switches, which would make the
diffstat negative, but I've tried to stay as close to the existing
assembly as possible for the moment.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[split out of a bigger series, added nokprobes. removed irq trace
calls as the C helpers do this. renamed el0_dbg's use of FAR]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch converts the EL1 sync entry assembly logic to C code.
Doing this will allow us to make changes in a slightly more
readable way. A case in point is supporting kernel-first RAS.
do_sea() should be called on the CPU that took the fault.
Largely the assembly code is converted to C in a relatively
straightforward manner.
Since all sync sites share a common asm entry point, the ASM_BUG()
instances are no longer required for effective backtraces back to
assembly, and we don't need similar BUG() entries.
The ESR_ELx.EC codes for all (supported) debug exceptions are now
checked in the el1_sync_handler's switch statement, which renders the
check in el1_dbg redundant. This both simplifies the el1_dbg handler,
and makes the EL1 exception handling more robust to
currently-unallocated ESR_ELx.EC encodings.
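An abridged sketch of the resulting C dispatcher (the exact set of cases
and the helper names el1_abort()/el1_dbg()/el1_inv() are shown here for
illustration only):

  asmlinkage void notrace el1_sync_handler(struct pt_regs *regs)
  {
          unsigned long esr = read_sysreg(esr_el1);

          switch (ESR_ELx_EC(esr)) {
          case ESR_ELx_EC_DABT_CUR:       /* data abort taken from EL1 */
          case ESR_ELx_EC_IABT_CUR:       /* instruction abort taken from EL1 */
                  el1_abort(regs, esr);
                  break;
          case ESR_ELx_EC_BREAKPT_CUR:    /* the (supported) debug exceptions */
          case ESR_ELx_EC_SOFTSTP_CUR:
          case ESR_ELx_EC_WATCHPT_CUR:
          case ESR_ELx_EC_BRK64:
                  el1_dbg(regs, esr);
                  break;
          default:                        /* anything unexpected, incl. unallocated ECs */
                  el1_inv(regs, esr);
          }
  }
  NOKPROBE_SYMBOL(el1_sync_handler);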
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[split out of a bigger series, added nokprobes, moved prototypes]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Workaround for Cavium/Marvell ThunderX2 erratum #219.
* errata/tx2-219:
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
Sign-extending TTBR1 addresses when converting to an untagged address
breaks the documented POSIX semantics for mlock() in some obscure error
cases where we end up returning -EINVAL instead of -ENOMEM as a direct
result of rewriting the upper address bits.
Rework the untagged_addr() macro to preserve the upper address bits for
TTBR1 addresses and only clear the tag bits for user addresses. This
matches the behaviour of the 'clear_address_tag' assembly macro, so
rename that and align the implementations at the same time so that they
use the same instruction sequences for the tag manipulation.
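A sketch of the underlying trick (macro name changed for this sketch):
sign-extend from bit 55 and AND with the original value, so TTBR1
addresses (bit 55 set) keep their upper bits while user addresses have
only their tag bits cleared:

  #define untag_user_addr(addr)    ({                                  \
          u64 __addr = (u64)(addr);                                    \
          __addr &= (u64)sign_extend64(__addr, 55);                    \
          __addr;                                                      \
  })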
Link: https://lore.kernel.org/stable/20191014162651.GF19200@arrakis.emea.arm.com/
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
* Task 1:
  PAN == 0 ---|                          |---------------
              |                          |<- return from IRQ, PSTATE.PAN = 0
              | <- IRQ                   |
              +--------+ <- preempt()  +--
                       ^
                       |
                       reschedule Task 1, PSTATE.PAN == 1
* Init:
      --------------------+------------------------
                          ^
                          |
                          enable_cpu_features
                          set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early; the
exception is BRK for WARN_ON(), but as this uses do_debug_exception(),
which keeps IRQs masked, it shouldn't call schedule().
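A minimal sketch of the static-key gate (the key name and helpers are
hypothetical; they only illustrate "no preemption from IRQ return until
features are enabled on all CPUs"):

  DEFINE_STATIC_KEY_FALSE(arm64_irq_preemption_allowed);

  /* Called once all CPUs are up and their features have been enabled. */
  void arm64_enable_irq_preemption(void)
  {
          static_branch_enable(&arm64_irq_preemption_allowed);
  }

  /* Consulted on the IRQ-return path before deciding to preempt. */
  bool arm64_can_preempt_from_irq(void)
  {
          return static_branch_likely(&arm64_irq_preemption_allowed);
  }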
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The GICv3 architecture specification is incredibly misleading when it
comes to PMR and the requirement for a DSB. It turns out that this DSB
is only required if the CPU interface sends an Upstream Control
message to the redistributor in order to update the RD's view of PMR.
This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
the case in Linux. It can still be set from EL3, so some special care
is required. But the upshot is that in the (hopefully large) majority
of the cases, we can drop the DSB altogether.
This relies on a new static key being set if the boot CPU has PMHE
set. The drawback is that this static key has to be exported to
modules.
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a PRFM instruction racing against a TTBR update can have undesirable
effects on TX2, NOP-out such PRFM on cores that are affected by
the TX2-219 erratum.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Commit bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority
masking") added a macro to the entry.S call paths that leave the
PSTATE.I bit set. This tells the pNMI masking logic that interrupts
are masked by the CPU, not by the PMR. This value is read back by
local_daif_save().
Commit bd82d4bd21 added this call to el0_svc, as el0_svc_handler
is called with interrupts masked. el0_svc_compat was missed, but should
be covered in the same way as both of these paths end up in
el0_svc_common(), which expects to unmask interrupts.
Fixes: bd82d4bd21 ("arm64: Fix incorrect irqflag restore for priority masking")
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
When taking an SError or Debug exception from EL0, we run the C
handler for these exceptions before updating the context tracking
code and unmasking lower priority interrupts.
When booting with nohz_full lockdep tells us we got this wrong:
| =============================
| WARNING: suspicious RCU usage
| 5.3.0-rc2-00010-gb4b5e9dcb11b-dirty #11271 Not tainted
| -----------------------------
| include/linux/rcupdate.h:643 rcu_read_unlock() used illegally wh!
|
| other info that might help us debug this:
|
|
| RCU used illegally from idle CPU!
| rcu_scheduler_active = 2, debug_locks = 1
| RCU used illegally from extended quiescent state!
| 1 lock held by a.out/432:
| #0: 00000000c7a79515 (rcu_read_lock){....}, at: brk_handler+0x00
|
| stack backtrace:
| CPU: 1 PID: 432 Comm: a.out Not tainted 5.3.0-rc2-00010-gb4b5e9d1
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno De8
| Call trace:
| dump_backtrace+0x0/0x140
| show_stack+0x14/0x20
| dump_stack+0xbc/0x104
| lockdep_rcu_suspicious+0xf8/0x108
| brk_handler+0x164/0x1b0
| do_debug_exception+0x11c/0x278
| el0_dbg+0x14/0x20
Moving the ct_user_exit calls to be before do_debug_exception() means
they are also before trace_hardirqs_off() has been updated. Add a new
ct_user_exit_irqoff macro to avoid the context-tracking code using
irqsave/restore before we've updated trace_hardirqs_off(). To be
consistent, do this everywhere.
The C helper is called enter_from_user_mode() to match x86 in the hope
we can merge them into kernel/context_tracking.c later.
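The message names the C helper enter_from_user_mode(); a sketch of what
such a helper does (the exact body here is an assumption) is:

  asmlinkage void notrace enter_from_user_mode(void)
  {
          /* Leave the RCU/context-tracking "user" state before running any
           * traceable kernel code; the caller keeps IRQs masked around this. */
          CT_WARN_ON(ct_state() != CONTEXT_USER);
          user_exit_irqoff();
  }
  NOKPROBE_SYMBOL(enter_from_user_mode);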
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 6c81fe7925 ("arm64: enable context tracking")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Comparing the arm-arm's pseudocode for AArch64.PCAlignmentFault() with
AArch64.SPAlignmentFault() shows that SP faults don't copy the faulty-SP
to FAR_EL1, but this is where we read from, and the address we provide
to user-space with the BUS_ADRALN signal.
For user-space this value will be UNKNOWN due to the previous ERET to
user-space. If the last value is preserved, on systems with KASLR or KPTI
this will be the user-space link-register left in FAR_EL1 by tramp_exit().
Fix this to retrieve the original sp_el0 value, and pass this to
do_sp_pc_fault().
SP alignment faults from EL1 will cause us to take the fault again when
trying to store the pt_regs. This eventually takes us to the overflow
stack. Remove the ESR_ELx_EC_SP_ALIGN check as we will never make it
this far.
Fixes: 60ffc30d56 ("arm64: Exception handling")
Signed-off-by: James Morse <james.morse@arm.com>
[will: change label name and fleshed out comment]
Signed-off-by: Will Deacon <will@kernel.org>
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired location due to the
state of PSR.I upon flag saving [1].
In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_PSR_I_SET) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence over the status of interrupt masking via PMR.
GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value> as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
require PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.
[1] https://www.spinics.net/lists/arm-kernel/msg716956.html
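As a sketch of the flag-saving side of this scheme (the function body is
illustrative, not lifted from the patch):

  static inline unsigned long arch_local_save_flags_sketch(void)
  {
          unsigned long pmr = read_sysreg_s(SYS_ICC_PMR_EL1);

          /* Record that PSR.I, not PMR, is what is currently masking interrupts. */
          if (read_sysreg(daif) & PSR_I_BIT)
                  pmr |= GIC_PRIO_PSR_I_SET;

          return pmr;
  }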
Fixes: 4a503217ce ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Cc: <stable@vger.kernel.org> # 5.1.x-
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wei Li <liwei391@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In the presence of any form of instrumentation, nmi_enter() should be
done before calling any traceable code and any instrumentation code.
Currently, nmi_enter() is done in handle_domain_nmi(), which is much
too late as instrumentation code might get called before. Move the
nmi_enter/exit() calls to the arch IRQ vector handler.
On arm64, it is not possible to know if the IRQ vector handler was
called because of an NMI before acknowledging the interrupt. However, it
is possible to know whether normal interrupts could be taken in the
interrupted context (i.e. if taking an NMI in that context could
introduce a potential race condition).
When interrupting a context with IRQs disabled, call nmi_enter() as soon
as possible. In contexts with IRQs enabled, defer this to the interrupt
controller, which is in a better position to know if an interrupt taken
is an NMI.
Fixes: bc3c03ccb4 ("arm64: Enable the support of pseudo-NMIs")
Cc: <stable@vger.kernel.org> # 5.1.x-
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For el0_dbg and el0_error, DAIF bits get explicitly cleared before
calling ct_user_exit.
When context tracking is disabled, DAIF gets set (almost) immediately
after. When context tracking is enabled, among the first things done
is disabling IRQs.
What is actually needed is:
- PSR.D = 0 so the system can be debugged (should be already the case)
- PSR.A = 0 so async error can be handled during context tracking
Do not clear PSR.I in those two locations.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We already mitigate erratum 1188873 affecting Cortex-A76 and
Neoverse-N1 r0p0 to r2p0. It turns out that revisions r0p0 to
r3p1 of the same cores are affected by erratum 1418040, which
has the same workaround as 1188873.
Let's expand the range of affected revisions to match 1418040,
and repaint all occurrences of 1188873 to 1418040. Whilst we're
there, do a bit of reformatting in silicon-errata.txt and drop
a now unnecessary dependency on ARM_ARCH_TIMER_OOL_WORKAROUND.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We currently deal with ARM64_ERRATUM_1188873 by always trapping EL0
accesses for both instruction sets. Although nothing wrong comes out
of that, people trying to squeeze the last drop of performance from
buggy HW find this over the top. Oh well.
Let's change the mitigation by flipping the counter enable bit
on return to userspace. Non-broken HW gets an extra branch on
the fast path, which is hopefully not the end of the world.
The arch timer workaround is also removed.
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The assembly macro get_thread_info() actually returns a task_struct and is
analogous to the current/get_current macro/function.
While it could be argued that thread_info sits at the start of
task_struct and the intention could have been to return a thread_info,
instances of loads from/stores to the address obtained from
get_thread_info() use offsets that are generated with
offsetof(struct task_struct, [...]).
Rename get_thread_info() to state it returns a task_struct.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When an NMI is raised while interrupts were disabled, the IRQ tracing
already is in the correct state (i.e. hardirqs_off) and should be left
as such when returning to the interrupted context.
Check whether PMR was masking interrupts when the NMI was raised and
skip IRQ tracing if necessary.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Handling of an NMI should not set any TIF flags. For NMIs received from
EL0 the current exit path is safe to use.
However, an NMI received at EL1 could have interrupted some task context
that has set the TIF_NEED_RESCHED flag. Preempting a task should not
happen as a result of an NMI.
Skip preemption after handling an NMI from EL1.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to replace PSR.I interrupt disabling/enabling with ICC_PMR_EL1
interrupt masking, ICC_PMR_EL1 needs to be saved/restored when
taking/returning from an exception. This mimics the way hardware saves
and restores the PSR.I bit in spsr_el1 for exceptions and ERET.
Add PMR to the registers to save in the pt_regs struct upon kernel entry,
and restore it before ERET. Also, initialize it to a sane value when
creating new tasks.
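As a rough sketch of what this amounts to (the field and constant names
below are taken from this description and are assumptions, not
necessarily the final identifiers):

  /* sketch only: pmr_save / GIC_PRIO_IRQON are assumed names */
  #define GIC_PRIO_IRQON 0xf0             /* placeholder "unmasked" priority */

  struct sketch_pt_regs {
          unsigned long regs[31];
          unsigned long sp, pc, pstate;
          unsigned long pmr_save;         /* ICC_PMR_EL1 saved on kernel entry */
  };

  /* a newly created task starts with interrupts allowed at the PMR level */
  static void sketch_init_task_pmr(struct sketch_pt_regs *regs)
  {
          regs->pmr_save = GIC_PRIO_IRQON;
  }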
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since the enabling and disabling of IRQs within preempt_schedule_irq()
is contained in a need_resched() loop, we don't need the outer arch
code loop.
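For context, the generic code already loops until need_resched() is
clear, roughly as in this simplified sketch of preempt_schedule_irq(),
which is what makes the arch-level retry loop redundant:

  /* simplified sketch of kernel/sched/core.c:preempt_schedule_irq() */
  asmlinkage __visible void __sched preempt_schedule_irq(void)
  {
          do {
                  preempt_disable();
                  local_irq_enable();
                  __schedule(true);               /* preemption */
                  local_irq_disable();
                  sched_preempt_enable_no_resched();
          } while (need_resched());
  }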
Reported-by: Julien Thierry <julien.thierry@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In commit:
3b7142752e ("arm64: convert native/compat syscall entry to C")
... we moved the syscall invocation code from assembly to C, but left
behind a number of register aliases which are now unused.
Let's remove them before they confuse someone.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 3962446922 ("arm64: preempt: Provide our own implementation of
asm/preempt.h") extended the preempt count field in struct thread_info
to 64 bits, so that it consists of a 32-bit count plus a 32-bit flag
indicating whether or not the current task needs rescheduling.
Whilst the asm-offsets definition of TSK_TI_PREEMPT was updated to point
to this new field, the assembly usage was left untouched meaning that a
32-bit load from TSK_TI_PREEMPT on a big-endian machine actually returns
the reschedule flag instead of the count.
Whilst we could fix this by pointing TSK_TI_PREEMPT at the count field,
we're actually better off reworking the two assembly users so that they
operate on the whole 64-bit value in favour of inspecting the thread
flags separately in order to determine whether a reschedule is needed.
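Conceptually, the field now overlays a 64-bit value on the 32-bit count
and flag, along the lines of this sketch (field ordering shown for
little-endian; it is exactly this endianness dependence that broke the
old 32-bit load):

  /* sketch of the 64-bit preempt field in struct thread_info */
  union sketch_preempt {
          u64 preempt_count;              /* what the assembly now loads/stores */
          struct {
                  u32 count;
                  u32 need_resched;       /* non-zero => resched requested */
          } preempt;
  };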
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being
context-synchronizing is confusing and misplaced with kpti. Given that
this is already documented under Documentation/ (see arch-support.txt
for membarrier), remove the comment altogether.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some CPUs can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.
This patch emits an SB sequence after each ERET so that speculation is
held up on exception return.
Signed-off-by: Will Deacon <will.deacon@arm.com>
It recently came to light that userspace can execute WFI, and that
the arm64 kernel doesn't trap this event. This sounds rather benign,
but the kernel should decide when it wants to wait for an interrupt,
and not userspace.
Let's trap WFI and immediately return after having skipped the
instruction. This effectively makes WFI a rather expensive NOP.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of directly generating an UNDEF when trapping a CP15 access,
let's add a new entry point to that effect (which only generates an
UNDEF for now).
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Rather than panic() when taking an undefined instruction exception from
EL1, allow a hook to be registered in case we want to emulate the
instruction, like we will for the SSBS PSTATE manipulation instructions.
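For illustration, such a hook plugs into the existing undef_hook
machinery along these lines (the instruction encodings and handler
below are placeholders, not the SSBS ones):

  /* sketch: hook undefined-instruction emulation for EL1 */
  static int sketch_el1_undef_handler(struct pt_regs *regs, u32 instr)
  {
          /* emulate the instruction, then step over it */
          arm64_skip_faulting_instruction(regs, 4);
          return 0;
  }

  static struct undef_hook sketch_el1_hook = {
          .instr_mask  = 0xffffffff,      /* placeholder encoding */
          .instr_val   = 0x00000000,      /* placeholder encoding */
          .pstate_mask = PSR_MODE_MASK,
          .pstate_val  = PSR_MODE_EL1h,
          .fn          = sketch_el1_undef_handler,
  };

  /* registered at boot with: register_undef_hook(&sketch_el1_hook); */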
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This adds support for the STACKLEAK gcc plugin to arm64 by implementing
stackleak_check_alloca(), based heavily on the x86 version, and adding the
two helpers used by the stackleak common code: current_top_of_stack() and
on_thread_stack(). The stack erasure calls are made at syscall returns.
Additionally, this disables the plugin in hypervisor and EFI stub code,
which are out of scope for the protection.
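A rough sketch of the two helpers, assuming the existing on_task_stack()
and current_stack_pointer primitives (exact signatures may differ from
the final patch):

  /* sketch of the arm64 stackleak helpers */
  #define current_top_of_stack() \
          ((unsigned long)current->stack + THREAD_SIZE)

  #define on_thread_stack() \
          (on_task_stack(current, current_stack_pointer))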
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can zero GPRs x0 - x29 upon entry from EL0 to make it harder for
userspace to control values consumed by speculative gadgets.
We don't blat x30, since this is stashed much later, and we'll blat it
before invoking C code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that all of the syscall logic works on the saved pt_regs, apply_ssbd
can safely corrupt x0-x3 in the entry paths, and we no longer need to
restore them. So let's remove the logic doing so.
With that logic gone, we can fold the branch target into the macro, so
that callers need not deal with this. GAS provides \@, a unique value
per macro invocation, which we can use to create a unique label.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that syscalls are invoked with pt_regs, we no longer need to ensure
that the argument registers are live in the entry assembly, and it's
fine to not restore them after context_tracking_user_exit() has
corrupted them.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the syscall invocation logic is in C, we can migrate the rest
of the syscall entry logic over, so that the entry assembly needn't look
at the register values at all.
The SVE reset across syscall logic now unconditionally clears TIF_SVE,
but sve_user_disable() will only write back to CPACR_EL1 when SVE is
actually enabled.
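For reference, sve_user_disable() reduces to a conditional CPACR_EL1
update; a sketch using the sysreg_clear_set() helper, which only
performs the write when the register value would actually change:

  /* sketch: disable EL0 SVE access via CPACR_EL1 */
  static inline void sve_user_disable(void)
  {
          sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
  }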
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently syscall tracing is a tricky assembly state machine, which can
be rather difficult to follow, and even harder to modify. Before we
start fiddling with it for pt_regs syscalls, let's convert it to C.
This is not intended to have any functional change.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As a first step towards invoking syscalls with a pt_regs argument,
convert the raw syscall invocation logic to C. We end up with a bit more
register shuffling, but the unified invocation logic means we can unify
the tracing paths, too.
Previously, assembly had to open-code calls to ni_sys() when the system
call number was out-of-bounds for the relevant syscall table. This case
is now handled by invoke_syscall(), and the assembly no longer needs to
handle this case explicitly. This allows the tracing paths to be
simplified and unified, as we no longer need the __ni_sys_trace path and
the __sys_trace_return label.
This only converts the invocation of the syscall. The rest of the
syscall triage and tracing is left in assembly for now, and will be
converted in subsequent patches.
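A simplified sketch of the C-side dispatch described above (names
follow this description; the speculation masking of the index is shown
using the generic nospec helper and details may differ):

  /* sketch of invoke_syscall(): bounds check plus table dispatch */
  static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
                             unsigned int sc_nr,
                             const syscall_fn_t syscall_table[])
  {
          long ret;

          if (scno < sc_nr) {
                  syscall_fn_t syscall_fn;

                  syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
                  ret = __invoke_syscall(regs, syscall_fn);
          } else {
                  ret = do_ni_syscall(regs);      /* out-of-bounds number */
          }

          regs->regs[0] = ret;
  }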
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 sigreturn* syscall handlers are non-standard. Rather than
taking a number of user parameters in registers as per the AAPCS,
they expect the pt_regs as their sole argument.
To make this work, we override the syscall definitions to invoke
wrappers written in assembly, which mov the SP into x0, and branch to
their respective C functions.
On other architectures (such as x86), the sigreturn* functions take no
argument and instead use current_pt_regs() to acquire the user
registers. This requires less boilerplate code, and allows for other
features such as interposing C code in this path.
This patch takes the same approach for arm64.
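With this change the syscall handler takes no arguments and fetches the
interrupted user registers itself, roughly as in the sketch below (the
frame-restore helper name is illustrative only):

  /* sketch: sigreturn without the assembly SP-shuffling wrapper */
  SYSCALL_DEFINE0(rt_sigreturn)
  {
          struct pt_regs *regs = current_pt_regs();

          return sketch_restore_sigframe(regs);   /* illustrative helper name */
  }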
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tentatively-reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Implement calls to rseq_signal_deliver, rseq_handle_notify_resume
and rseq_syscall so that we can select HAVE_RSEQ on arm64.
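Concretely, per the generic rseq API this boils down to three call
sites in the arm64 syscall and signal paths, sketched below:

  /* on syscall exit (debug builds check for syscalls inside an rseq CS) */
  rseq_syscall(regs);

  /* when delivering a signal */
  rseq_signal_deliver(&ksig, regs);

  /* when handling TIF_NOTIFY_RESUME before returning to userspace */
  rseq_handle_notify_resume(NULL, regs);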
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In order to allow userspace to be mitigated on demand, let's
introduce a new thread flag that prevents the mitigation from
being turned off when exiting to userspace, and doesn't turn
it on on entry into the kernel (with the assumption that the
mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later
patch.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to avoid checking arm64_ssbd_callback_required on each
kernel entry/exit even if no mitigation is required, let's
add yet another alternative that by default jumps over the mitigation,
and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a heterogeneous system, we can end up with both affected and
unaffected CPUs. Let's check their status before calling into the
firmware.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order for the kernel to protect itself, let's call the SSBD mitigation
implemented by the higher exception level (either hypervisor or firmware)
on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the
right exception level, hence the introduction of a runtime patching
callback.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"As I mentioned in the last pull request, there's a second batch of
security updates for arm64 with mitigations for Spectre/v1 and an
improved one for Spectre/v2 (via a newly defined firmware interface
API).
Spectre v1 mitigation:
- back-end version of array_index_mask_nospec()
- masking of the syscall number to restrict speculation through the
syscall table
- masking of __user pointers prior to dereference in uaccess routines
Spectre v2 mitigation update:
- using the new firmware SMC calling convention specification update
- removing the current PSCI GET_VERSION firmware call mitigation as
vendors are deploying new SMCCC-capable firmware
- additional branch predictor hardening for synchronous exceptions
and interrupts while in user mode
Meltdown v3 mitigation update:
- Cavium Thunder X is unaffected but a hardware erratum gets in the
way. The kernel now starts with the page tables mapped as global
and switches to non-global if kpti needs to be enabled.
Other:
- Theoretical trylock bug fixed"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits)
arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
arm/arm64: smccc: Make function identifiers an unsigned quantity
firmware/psci: Expose SMCCC version through psci_ops
firmware/psci: Expose PSCI conduit
arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
arm/arm64: KVM: Turn kvm_psci_version into a static inline
arm/arm64: KVM: Advertise SMCCC v1.1
arm/arm64: KVM: Implement PSCI 1.0 support
arm/arm64: KVM: Add smccc accessors to PSCI code
arm/arm64: KVM: Add PSCI_VERSION helper
arm/arm64: KVM: Consolidate the PSCI include files
arm64: KVM: Increment PC after handling an SMC trap
arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
arm64: entry: Apply BP hardening for suspicious interrupts from EL0
arm64: entry: Apply BP hardening for high-priority synchronous exceptions
arm64: futex: Mask __user pointers prior to dereference
...
It is possible to take an IRQ from EL0 following a branch to a kernel
address in such a way that the IRQ is prioritised over the instruction
abort. Whilst an attacker would need to get the stars to align here,
it might be achievable with enough calibration, so perform BP hardening
in the rare case that we see a kernel address in the ELR when handling
an IRQ from EL0.
Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Software-step and PC alignment fault exceptions have higher priority than
instruction abort exceptions, so apply the BP hardening hooks there too
if the user PC appears to reside in kernel space.
Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a similar manner to array_index_mask_nospec, this patch introduces an
assembly macro (mask_nospec64) which can be used to bound a value under
speculation. This macro is then used to ensure that the indirect branch
through the syscall table is bounded under speculation, with out-of-range
addresses speculating as calls to sys_io_setup (0).
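The same CMP/SBC trick is what backs the C-level
array_index_mask_nospec() for arm64; a sketch of that helper, returning
an all-ones mask for in-range indices and zero otherwise:

  /* sketch of the arm64 array_index_mask_nospec() back-end */
  static inline unsigned long array_index_mask_nospec(unsigned long idx,
                                                      unsigned long sz)
  {
          unsigned long mask;

          asm volatile(
          "       cmp     %1, %2\n"
          "       sbc     %0, xzr, xzr\n"
          : "=r" (mask)
          : "r" (idx), "Ir" (sz)
          : "cc");

          csdb();         /* consumption-of-speculative-data barrier */
          return mask;
  }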
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, USER_DS represents an exclusive limit while KERNEL_DS is
inclusive. In order to do some clever trickery for speculation-safe
masking, we need them both to behave equivalently - there aren't enough
bits to make KERNEL_DS exclusive, so we have precisely one option. This
also happens to correct a longstanding false negative for a range
ending on the very top byte of kernel memory.
Mark Rutland points out that we've actually got the semantics of
addresses vs. segments muddled up in most of the places we need to
amend, so shuffle the {USER,KERNEL}_DS definitions around such that we
can correct those properly instead of just pasting "-1"s everywhere.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We don't fully understand the Cavium ThunderX erratum, but it appears
that mapping the kernel as nG can lead to horrible consequences such as
attempting to execute userspace from kernel context. Since kpti isn't
enabled for these CPUs anyway, simplify the comment justifying the lack
of post_ttbr_update_workaround in the exception trampoline.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-11-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With ARM64_SW_TTBR0_PAN enabled, the exception entry code checks the
active ASID to decide whether user access was enabled (non-zero ASID)
when the exception was taken. On return from exception, if user access
was previously disabled, it re-instates TTBR0_EL1 from the per-thread
saved value (updated in switch_mm() or efi_set_pgd()).
Commit 7655abb953 ("arm64: mm: Move ASID from TTBR0 to TTBR1") makes a
TTBR0_EL1 + ASID switching non-atomic. Subsequently, commit 27a921e757
("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN") changes the
__uaccess_ttbr0_disable() function and asm macro to first write the
reserved TTBR0_EL1 followed by the ASID=0 update in TTBR1_EL1. If an
exception occurs between these two, the exception return code will
re-instate a valid TTBR0_EL1. Similar scenario can happen in
cpu_switch_mm() between setting the reserved TTBR0_EL1 and the ASID
update in cpu_do_switch_mm().
This patch reverts the entry.S check for ASID == 0 to TTBR0_EL1 and
disables the interrupts around the TTBR0_EL1 and ASID switching code in
__uaccess_ttbr0_disable(). It also ensures that, when returning from the
EFI runtime services, efi_set_pgd() doesn't leave a non-zero ASID in
TTBR1_EL1 by using uaccess_ttbr0_{enable,disable}.
The accesses to current_thread_info()->ttbr0 are updated to use
READ_ONCE/WRITE_ONCE.
As a safety measure, __uaccess_ttbr0_enable() always masks out any
existing non-zero ASID TTBR1_EL1 before writing in the new ASID.
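For illustration, the C flavour of the disable path after this change
looks roughly like the sketch below (constant and sysreg accessor names
as used elsewhere in this series; exact details may differ):

  /* sketch of __uaccess_ttbr0_disable() with IRQs masked across the switch */
  static inline void __uaccess_ttbr0_disable(void)
  {
          unsigned long flags, ttbr;

          /* no exception may observe a half-switched TTBR0/ASID state */
          local_irq_save(flags);
          ttbr = read_sysreg(ttbr1_el1);
          ttbr &= ~TTBR_ASID_MASK;
          /* reserved TTBR0 first (placed just before swapper_pg_dir) ... */
          write_sysreg(ttbr - RESERVED_TTBR0_SIZE, ttbr0_el1);
          isb();
          /* ... then the reserved (zero) ASID in TTBR1_EL1 */
          write_sysreg(ttbr, ttbr1_el1);
          isb();
          local_irq_restore(flags);
  }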
Fixes: 27a921e757 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN")
Acked-by: Will Deacon <will.deacon@arm.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The trampoline page tables are positioned after the early page tables in
the kernel linker script.
As we are about to change the early page table logic to resolve the
swapper size at link time as opposed to compile time, the
SWAPPER_DIR_SIZE variable (currently used to locate the trampoline)
will be rendered unsuitable for low level assembler.
This patch solves this issue by moving the trampoline before the PAN
page tables. The offset to the trampoline from ttbr1 can then be
expressed by: PAGE_SIZE + RESERVED_TTBR0_SIZE, which is available to the
entry assembler.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When CONFIG_UNMAP_KERNEL_AT_EL0 is set the SDEI entry point and the rest
of the kernel may be unmapped when we take an event. If this may be the
case, use an entry trampoline that can switch to the kernel page tables.
We can't use the provided PSTATE to determine whether to switch page
tables, as we may have interrupted the kernel's entry trampoline (or a
normal-priority event that interrupted the kernel's entry trampoline).
Instead test for a user ASID in ttbr1_el1.
Save a value in regs->addr_limit to indicate whether we need to restore
the original ASID when returning from this event. This value is only used
by do_page_fault(), which we don't call with the SDEI regs.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code
(crucially, we don't implicitly know which exception level we interrupted).
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code; this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Aliasing attacks against CPU branch predictors can allow an attacker to
redirect speculative control flow on some CPUs and potentially divulge
information from one context to another.
This patch adds initial skeleton code behind a new Kconfig option to
enable implementation-specific mitigations against these attacks for
CPUs that are affected.
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We will soon need to invoke a CPU-specific function pointer after changing
page tables, so move post_ttbr_update_workaround out into C code to make
this possible.
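The moved function is tiny; a sketch of its C form, with the Cavium
erratum 27456 I-cache invalidation applied through the runtime
alternatives framework:

  /* sketch of post_ttbr_update_workaround() in C */
  asmlinkage void post_ttbr_update_workaround(void)
  {
          asm(ALTERNATIVE("nop; nop; nop",
                          "ic iallu; dsb nsh; isb",
                          ARM64_WORKAROUND_CAVIUM_27456,
                          CONFIG_CAVIUM_ERRATUM_27456));
  }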
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Speculation attacks against the entry trampoline can potentially resteer
the speculative instruction stream through the indirect branch and into
arbitrary gadgets within the kernel.
This patch defends against these attacks by forcing a misprediction
through the return stack: a dummy BL instruction loads an entry into
the stack, so that the predicted program flow of the subsequent RET
instruction is to a branch-to-self instruction which is finally resolved
as a branch to the kernel vectors with speculation suppressed.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The literal pool entry for identifying the vectors base is the only piece
of information in the trampoline page that identifies the true location
of the kernel.
This patch moves it into a page-aligned region of the .rodata section
and maps this adjacent to the trampoline text via an additional fixmap
entry, which protects against any accidental leakage of the trampoline
contents.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>