The trusted OS may reject CPU_OFF calls to its resident CPU, so we must
avoid issuing those. We never migrate a Trusted OS and we already take
care to prevent CPU_OFF PSCI call. However, this is not reflected
explicitly to the userspace. Any user can attempt to hotplug trusted OS
resident CPU. The entire motion of going through the various state
transitions in the CPU hotplug state machine gets executed and the
PSCI layer finally refuses to make CPU_OFF call.
This results is unnecessary unwinding of CPU hotplug state machine in
the kernel. Instead we can mark the trusted OS resident CPU as not
available for hotplug, so that the user attempt or request to do the
same will get immediately rejected.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
It seems that LLVM's linker does not correctly handle variable assignments
involving section positions that are updated during the SECTIONS
parsing. Commit aa69fb62be ("arm64/efi: Mark __efistub_stext_offset as
an absolute symbol explicitly") ran into this too, but found a different
workaround.
However, this was not enough, as other variables were also miscalculated
which manifested as boot failures under UEFI where __efistub__end was
not taking the correct _end value (they should be the same):
$ ld.lld -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.lld -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.lld | egrep '\b(__efistub_|)_end\b'
368272: ffff000002218000 0 NOTYPE LOCAL HIDDEN 38 __efistub__end
368322: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 38 _end
$ aarch64-linux-gnu-ld.bfd -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.bfd -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.bfd | egrep '\b(__efistub_|)_end\b'
338124: ffff000012318000 0 NOTYPE LOCAL DEFAULT ABS __efistub__end
383812: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 15325 _end
To work around this, all of the __efistub_-prefixed variable assignments
need to be moved after the linker script's SECTIONS entry. As it turns
out, this also solves the problem fixed in commit aa69fb62be, so those
changes are reverted here.
Link: https://github.com/ClangBuiltLinux/linux/issues/634
Link: https://bugs.llvm.org/show_bug.cgi?id=42990
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Deacon <will@kernel.org>
Prior to commit:
14c127c957 ("arm64: mm: Flip kernel VA space")
... VA_START described the start of the TTBR1 address space for a given
VA size described by VA_BITS, where all kernel mappings began.
Since that commit, VA_START described a portion midway through the
address space, where the linear map ends and other kernel mappings
begin.
To avoid confusion, let's rename VA_START to PAGE_END, making it clear
that it's not the start of the TTBR1 address space and implying that
it's related to PAGE_OFFSET. Comments and other mnemonics are updated
accordingly, along with a typo fix in the decription of VMEMMAP_SIZE.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Pull in generic CPU topology changes from Paul Walmsley (RISC-V).
* tag 'common/for-v5.4-rc1/cpu-topology' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Add an entry for generic architecture topology
base: arch_topology: update Kconfig help description
RISC-V: Parse cpu topology during boot.
arm: Use common cpu_topology structure and functions.
cpu-topology: Move cpu topology code to common code.
dt-binding: cpu-topology: Move cpu-map to a common binding.
Documentation: DT: arm: add support for sockets defining package boundaries
All instances of struct sys64_hook contain compile-time constant data,
and are never inentionally modified, so let's make them all const.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The aarch64_insn_encoding_class[] array contains compile-time constant
data, and is never intentionally modified, so let's mark it as const.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The icache_policy_str[] array contains compile-time constant data, and
is never intentionally modified, so let's mark it as const.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
GCC unescapes escaped string section names while Clang does not. Because
__section uses the `#` stringification operator for the section name, it
doesn't need to be escaped.
This antipattern was found with:
$ grep -e __section\(\" -e __section__\(\" -r
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
If a CPU doesn't support the page size for which the kernel is
configured, then we will complain and refuse to bring it online. For
secondary CPUs (and the boot CPU on a system booting with EFI), we will
also print an error identifying the mismatch.
Consequently, the only time that the cpufeature code can detect a
granule size mismatch is for a granule other than the one that is
currently being used. Although we would rather such systems didn't
exist, we've unfortunately lost that battle and Kevin reports that
on his amlogic S922X (odroid-n2 board) we end up warning and taining
with defconfig because 16k pages are not supported by all of the CPUs.
In such a situation, we don't actually care about the feature mismatch,
particularly now that KVM only exposes the sanitised view of the CPU
registers (commit 93390c0a1b - "arm64: KVM: Hide unsupported AArch64
CPU features from guests"). Treat the granule fields as non-strict and
let Kevin run without a tainted kernel.
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
[catalin.marinas@arm.com: changelog updated with KVM sanitised regs commit]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ACPI 6.3 adds a thread flag to represent if a CPU/PE is
actually a thread. Given that the MPIDR_MT bit may not
represent this information consistently on homogeneous machines
we should prefer the PPTT flag if its available.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Robert Richter <rrichter@marvell.com>
[will: made acpi_cpu_is_threaded() return 'bool']
Signed-off-by: Will Deacon <will@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdTfRfAAoJEL/70l94x66DcN0IAIwyaU2+kwP0jd2miQuKxgwl
WU4u7dZCoQC6meWEVmrSJIVMBONRubmZ9iCqT7807YP8YZSQpOth51FMbULUWuy1
VW1eaRwqidX0EAihDhg2ZbBZ8H6RQ9Fn0aiEEh44dAZZAwGSVnO3PRKvQEJ15xjk
q+OQ4hrxtoorwLj+myejmq3YenTFTCMMJfYwwvlCl+J1FfrLZi5k3X5Gjk+j8Ixd
8CL8/6u5Lu6MCgfYVvxvo8/bUPiATBdF1sWJMMALwXTrDiSy4tQRD0NvZP1HM8G1
hy0XnhgtsS9rWNLtAFOj+r/XhP9V5lOOGX8yBcj0XQQr+DC9MG6MCL+pXXOaMcA=
=ZZh8
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes (arm and x86) and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests: kvm: Adding config fragments
KVM: selftests: Update gitignore file for latest changes
kvm: remove unnecessary PageReserved check
KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable
KVM: arm: Don't write junk to CP15 registers on reset
KVM: arm64: Don't write junk to sysregs on reset
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
x86: kvm: remove useless calls to kvm_para_available
KVM: no need to check return value of debugfs_create functions
KVM: remove kvm_arch_has_vcpu_debugfs()
KVM: Fix leak vCPU's VMCS value into other pCPU
KVM: Check preempted_in_kernel for involuntary preemption
KVM: LAPIC: Don't need to wakeup vCPU twice afer timer fire
arm64: KVM: hyp: debug-sr: Mark expected switch fall-through
KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC
KVM: arm: vgic-v3: Mark expected switch fall-through
arm64: KVM: regmap: Fix unexpected switch fall-through
KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
Current PSCI code handles idle state entry through the
psci_cpu_suspend_enter() API, that takes an idle state index as a
parameter and convert the index into a previously initialized
power_state parameter before calling the PSCI.CPU_SUSPEND() with it.
This is unwieldly, since it forces the PSCI firmware layer to keep track
of power_state parameter for every idle state so that the
index->power_state conversion can be made in the PSCI firmware layer
instead of the CPUidle driver implementations.
Move the power_state handling out of drivers/firmware/psci
into the respective ACPI/DT PSCI CPUidle backends and convert
the psci_cpu_suspend_enter() API to get the power_state
parameter as input, which makes it closer to its firmware
interface PSCI.CPU_SUSPEND() API.
A notable side effect is that the PSCI ACPI/DT CPUidle backends
now can directly handle (and if needed update) power_state
parameters before handing them over to the PSCI firmware
interface to trigger PSCI.CPU_SUSPEND() calls.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
Allow selection of the PSCI CPUidle in the kernel by updating
the respective Kconfig entry.
Remove PSCI callbacks from ARM/ARM64 generic CPU ops
to prevent the PSCI idle driver from clashing with the generic
ARM CPUidle driver initialization, that relies on CPU ops
to initialize and enter idle states.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
Previous patches have enabled 52-bit kernel + user VAs and there is no
longer any scenario where user VA != kernel VA size.
This patch removes the, now redundant, vabits_user variable and replaces
usage with vabits_actual where appropriate.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Most of the machinery is now in place to enable 52-bit kernel VAs that
are detectable at boot time.
This patch adds a Kconfig option for 52-bit user and kernel addresses
and plumbs in the requisite CONFIG_ macros as well as sets TCR.T1SZ,
physvirt_offset and vmemmap at early boot.
To simplify things this patch also removes the 52-bit user/48-bit kernel
kconfig option.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When running with a 52-bit userspace VA and a 48-bit kernel VA we offset
ttbr1_el1 to allow the kernel pagetables with a 52-bit PTRS_PER_PGD to
be used for both userspace and kernel.
Moving on to a 52-bit kernel VA we no longer require this offset to
ttbr1_el1 should we be running on a system with HW support for 52-bit
VAs.
This patch introduces conditional logic to offset_ttbr1 to query
SYS_ID_AA64MMFR2_EL1 whenever 52-bit VAs are selected. If there is HW
support for 52-bit VAs then the ttbr1 offset is skipped.
We choose to read a system register rather than vabits_actual because
offset_ttbr1 can be called in places where the kernel data is not
actually mapped.
Calls to offset_ttbr1 appear to be made from rarely called code paths so
this extra logic is not expected to adversely affect performance.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to support 52-bit kernel addresses detectable at boot time, one
needs to know the actual VA_BITS detected. A new variable vabits_actual
is introduced in this commit and employed for the KVM hypervisor layout,
KASAN, fault handling and phys-to/from-virt translation where there
would normally be compile time constants.
In order to maintain performance in phys_to_virt, another variable
physvirt_offset is introduced.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to support 52-bit kernel addresses detectable at boot time, the
kernel needs to know the most conservative VA_BITS possible should it
need to fall back to this quantity due to lack of hardware support.
A new compile time constant VA_BITS_MIN is introduced in this patch and
it is employed in the KASAN end address, KASLR, and EFI stub.
For Arm, if 52-bit VA support is unavailable the fallback is to 48-bits.
In other words: VA_BITS_MIN = min (48, VA_BITS)
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to allow for a KASAN shadow that changes size at boot time, one
must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
start address. Also, it is highly desirable to maintain the same
function addresses in the kernel .text between VA sizes. Both of these
requirements necessitate us to flip the kernel address space halves s.t.
the direct linear map occupies the lower addresses.
This patch puts the direct linear map in the lower addresses of the
kernel VA range and everything else in the higher ranges.
We need to adjust:
*) KASAN shadow region placement logic,
*) KASAN_SHADOW_OFFSET computation logic,
*) virt_to_phys, phys_to_virt checks,
*) page table dumper.
These are all small changes, that need to take place atomically, so they
are bundled into this commit.
As part of the re-arrangement, a guard region of 2MB (to preserve
alignment for fixed map) is added after the vmemmap. Otherwise the
vmemmap could intersect with IS_ERR pointers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The ptrace trace SVE flags are prefixed with SVE_PT_*. Update the
comment accordingly.
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The commit d5370f7548 ("arm64: prefetch: add alternative pattern for
CPUs without a prefetcher") introduced MIDR_IS_CPU_MODEL_RANGE() to be
used in has_no_hw_prefetch() with rv_min=0 which generates a compilation
warning from GCC,
In file included from ./arch/arm64/include/asm/cache.h:8,
from ./include/linux/cache.h:6,
from ./include/linux/printk.h:9,
from ./include/linux/kernel.h:15,
from ./include/linux/cpumask.h:10,
from arch/arm64/kernel/cpufeature.c:11:
arch/arm64/kernel/cpufeature.c: In function 'has_no_hw_prefetch':
./arch/arm64/include/asm/cputype.h:59:26: warning: comparison of
unsigned expression >= 0 is always true [-Wtype-limits]
_model == (model) && rv >= (rv_min) && rv <= (rv_max); \
^~
arch/arm64/kernel/cpufeature.c:889:9: note: in expansion of macro
'MIDR_IS_CPU_MODEL_RANGE'
return MIDR_IS_CPU_MODEL_RANGE(midr, MIDR_THUNDERX,
^~~~~~~~~~~~~~~~~~~~~~~
Fix it by converting MIDR_IS_CPU_MODEL_RANGE to a static inline
function.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>
It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.
The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
RELR is a relocation packing format for relative relocations.
The format is described in a generic-abi proposal:
https://groups.google.com/d/topic/generic-abi/bX460iggiKg/discussion
The LLD linker can be instructed to pack relocations in the RELR
format by passing the flag --pack-dyn-relocs=relr.
This patch adds a new config option, CONFIG_RELR. Enabling this option
instructs the linker to pack vmlinux's relative relocations in the RELR
format, and causes the kernel to apply the relocations at startup along
with the RELA relocations. RELA relocations still need to be applied
because the linker will emit RELA relative relocations if they are
unrepresentable in the RELR format (i.e. address not a multiple of 2).
Enabling CONFIG_RELR reduces the size of a defconfig kernel image
with CONFIG_RANDOMIZE_BASE by 3.5MB/16% uncompressed, or 550KB/5%
compressed (lz4).
Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
The ESR.EC encoding of 0b011010 (0x1a) describes an exception generated
by an ERET, ERETAA or ERETAB instruction as a result of a nested
virtualisation trap to EL2.
Add an encoding for this EC and a string description so that we identify
it correctly if we take one unexpectedly.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
In commit b6b2735514
("tracing: Use str_has_prefix() instead of using fixed sizes")
the newly introduced str_has_prefix() was used
to replace error-prone strncmp(str, const, len).
Here fix codes with the same pattern.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
With commit b6664ba42f ("s390, kexec_file: drop arch_kexec_mem_walk()"),
we introduced the KEXEC_BUF_MEM_UNKNOWN macro. If kexec_buf.mem is set
to this value, kexec_locate_mem_hole() will try to allocate free memory.
While other arch(s) like s390 and x86_64 already use this macro to
initialize kexec_buf.mem with, arm64 uses an equivalent value of 0.
Replace it with KEXEC_BUF_MEM_UNKNOWN, to keep the convention of
initializing 'kxec_buf.mem' consistent across various archs.
Cc: takahiro.akashi@linaro.org
Cc: james.morse@arm.com
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
kprobes manipulates the interrupted PSTATE for single step, and
doesn't restore it. Thus, if we put a kprobe where the pstate.D
(debug) masked, the mask will be cleared after the kprobe hits.
Moreover, in the most complicated case, this can lead a kernel
crash with below message when a nested kprobe hits.
[ 152.118921] Unexpected kernel single-step exception at EL1
When the 1st kprobe hits, do_debug_exception() will be called.
At this point, debug exception (= pstate.D) must be masked (=1).
But if another kprobes hits before single-step of the first kprobe
(e.g. inside user pre_handler), it unmask the debug exception
(pstate.D = 0) and return.
Then, when the 1st kprobe setting up single-step, it saves current
DAIF, mask DAIF, enable single-step, and restore DAIF.
However, since "D" flag in DAIF is cleared by the 2nd kprobe, the
single-step exception happens soon after restoring DAIF.
This has been introduced by commit 7419333fa1 ("arm64: kprobe:
Always clear pstate.D in breakpoint exception handler")
To solve this issue, this stores all DAIF bits and restore it
after single stepping.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 7419333fa1 ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler")
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Remove rcu_read_lock()/rcu_read_unlock() from debug exception
handlers since we are sure those are not preemptible and
interrupts are off.
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Prohibit probing on return_address() and subroutines which
is called from return_address(), since the it is invoked from
trace_hardirqs_off() which is also kprobe blacklisted.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
their architecturally maximum values, which defeats the use of
FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
machines.
Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
saturate at zero.
Fixes: 3c739b5710 ("arm64: Keep track of CPU feature registers")
Cc: <stable@vger.kernel.org> # 4.4.x-
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When fall-through warnings was enabled by default the following warnings
was starting to show up:
../arch/arm64/kernel/module.c: In function ‘apply_relocate_add’:
../arch/arm64/kernel/module.c:316:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
overflow_check = false;
~~~~~~~~~~~~~~~^~~~~~~
../arch/arm64/kernel/module.c:317:3: note: here
case R_AARCH64_MOVW_UABS_G0:
^~~~
../arch/arm64/kernel/module.c:322:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
overflow_check = false;
~~~~~~~~~~~~~~~^~~~~~~
../arch/arm64/kernel/module.c:323:3: note: here
case R_AARCH64_MOVW_UABS_G1:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
When fall-through warnings was enabled by default the following warning
was starting to show up:
In file included from ../include/linux/kernel.h:15,
from ../include/linux/list.h:9,
from ../include/linux/kobject.h:19,
from ../include/linux/of.h:17,
from ../include/linux/irqdomain.h:35,
from ../include/linux/acpi.h:13,
from ../arch/arm64/kernel/smp.c:9:
../arch/arm64/kernel/smp.c: In function ‘__cpu_up’:
../include/linux/printk.h:302:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../arch/arm64/kernel/smp.c:156:4: note: in expansion of macro ‘pr_crit’
pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
^~~~~~~
../arch/arm64/kernel/smp.c:157:3: note: here
case CPU_STUCK_IN_KERNEL:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Now that -Wimplicit-fallthrough is passed to GCC by default, the kernel
build has suddenly got noisy. Annotate the two fall-through cases in our
hw_breakpoint implementation, since they are both intentional.
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Commit d968d2b801 ("ARM: 7497/1: hw_breakpoint: allow single-byte
watchpoints on all addresses") changed the validation requirements for
hardware watchpoints on arch/arm/. Update our compat layer to implement
the same relaxation.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
We've added two ESR exception classes for new ARM hardware extensions:
ESR_ELx_EC_PAC and ESR_ELx_EC_SVE, but failed to update the strings
used in tracing and other debug.
Let's update "kvm_arm_exception_class" for these two EC, which the
new EC will be visible to user-space via kvm_exit trace events
Also update to "esr_class_str" for ESR_ELx_EC_PAC, by which we can
get more readable debug info.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
ARMV8_EVENT_ATTR_RESOLVE became unused after commit <4b1a9e6934ec>
("arm64/perf: Filter common events based on PMCEIDn_EL0").
Remove it.
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
Both RISC-V & ARM64 are using cpu-map device tree to describe
their cpu topology. It's better to move the relevant code to
a common place instead of duplicate code.
To: Will Deacon <will.deacon@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
[Tested on QDF2400]
Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>
[Tested on Juno and other embedded platforms.]
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Comparing the arm-arm's pseudocode for AArch64.PCAlignmentFault() with
AArch64.SPAlignmentFault() shows that SP faults don't copy the faulty-SP
to FAR_EL1, but this is where we read from, and the address we provide
to user-space with the BUS_ADRALN signal.
For user-space this value will be UNKNOWN due to the previous ERET to
user-space. If the last value is preserved, on systems with KASLR or KPTI
this will be the user-space link-register left in FAR_EL1 by tramp_exit().
Fix this to retrieve the original sp_el0 value, and pass this to
do_sp_pc_fault().
SP alignment faults from EL1 will cause us to take the fault again when
trying to store the pt_regs. This eventually takes us to the overflow
stack. Remove the ESR_ELx_EC_SP_ALIGN check as we will never make it
this far.
Fixes: 60ffc30d56 ("arm64: Exception handling")
Signed-off-by: James Morse <james.morse@arm.com>
[will: change label name and fleshed out comment]
Signed-off-by: Will Deacon <will@kernel.org>
On a CPU that doesn't support SSBS, PSTATE[12] is RES0. In a system
where only some of the CPUs implement SSBS, we end-up losing track of
the SSBS bit across task migration.
To address this issue, let's force the SSBS bit on context switch.
Fixes: 8f04e8e6e2 ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: inverted logic and added comments]
Signed-off-by: Will Deacon <will@kernel.org>
There are some hand-written instances of "32" to express the number
of SVE Z-registers.
Since this code was written a #define was added for this, so
convert trivial instances of this magic number as appropriate.
No functional change.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Currently we convert from FPSIMD to SVE register state in memory in
two places.
To ease future maintenance, let's consolidate this in one place.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The arm64 stacktrace code is careful to only dereference frame records
in valid stack ranges, ensuring that a corrupted frame record won't
result in a faulting access.
However, it's still possible for corrupt frame records to result in
infinite loops in the stacktrace code, which is also undesirable.
This patch ensures that we complete a stacktrace in finite time, by
keeping track of which stacks we have already completed unwinding, and
verifying that if the next frame record is on the same stack, it is at a
higher address.
As this has turned out to be particularly subtle, comments are added to
explain the procedure.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tengfei Fan <tengfeif@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
Some common code is required by each stacktrace user to initialise
struct stackframe before the first call to unwind_frame().
In preparation for adding to the common code, this patch factors it
out into a separate function start_backtrace(), and modifies the
stacktrace callers appropriately.
No functional change.
Signed-off-by: Dave Martin <dave.martin@arm.com>
[Mark: drop tsk argument, update more callsites]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The recent changes to the vdso library for arm64 and the introduction of
the compat vdso library have generated some misalignment in the
Makefiles.
Cleanup the Makefiles for vdso and vdso32 libraries:
* Removing unused rules.
* Unifying the displayed compilation messages.
* Simplifying the generic library inclusion path for
arm64 vdso.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Running "make" on an already compiled kernel tree will rebuild the kernel
even without any modifications:
$ make ARCH=arm64 CROSS_COMPILE=/usr/bin/aarch64-unknown-linux-gnu-
arch/arm64/Makefile:58: CROSS_COMPILE_COMPAT not defined or empty, the compat vDSO will not be built
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
VDSOCHK arch/arm64/kernel/vdso/vdso.so.dbg
VDSOSYM include/generated/vdso-offsets.h
CHK include/generated/compile.h
CC arch/arm64/kernel/signal.o
CC arch/arm64/kernel/vdso.o
CC arch/arm64/kernel/signal32.o
LD arch/arm64/kernel/vdso/vdso.so.dbg
OBJCOPY arch/arm64/kernel/vdso/vdso.so
AS arch/arm64/kernel/vdso/vdso.o
AR arch/arm64/kernel/vdso/built-in.a
AR arch/arm64/kernel/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
This is the same bug fixed in commit 92a4728608 ("x86/boot: Fix
if_changed build flip/flop bug"). We cannot use two "if_changed" in one
target. Fix this build bug by merging two commands into one function.
Fixes: a7f71a2c89 ("arm64: compat: Add vDSO")
Fixes: 28b1a824a4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
[will: merged in compat fix from Vincenzo and made rule names consistent]
Signed-off-by: Will Deacon <will@kernel.org>
- match the directory structure of the linux-libc-dev package to that of
Debian-based distributions
- fix incorrect include/config/auto.conf generation when Kconfig creates
it along with the .config file
- remove misleading $(AS) from documents
- clean up precious tag files by distclean instead of mrproper
- add a new coccinelle patch for devm_platform_ioremap_resource migration
- refactor module-related scripts to read modules.order instead of
$(MODVERDIR)/*.mod files to get the list of created modules
- remove MODVERDIR
- update list of header compile-test
- add -fcf-protection=none flag to avoid conflict with the retpoline
flags when CONFIG_RETPOLINE=y
- misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJSBAABCgA8FiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl0ye0MeHHlhbWFkYS5t
YXNhaGlyb0Bzb2Npb25leHQuY29tAAoJED2LAQed4NsGfzgQAKtqa3I6avRrT9Nl
ggYU08z6bqxVBRucpiQq5QhQ0YLf7XQ9tSGO6z0wyzqPHqHRZALg5lHp+x6JUuTe
yhE5AYufHfA86XHD+udOkPuTHEkMCtHZn3qHns39qCsJ5sgnQ5OkjE4xHrMYmV+G
FHoWlqYGCSMsr2SGQ8twffyqlZ3LvOW1XzZAlG53ooBUJsLs1CO9eWYzoksrb6O8
yjPwieKnryVwdzVcyR9gFvoXfgC7JBRuug0vYstQaXceJV88v0BCsWLVWylGGqtO
EdGqi05xMqtkKSuPP4WQVlgv8prull57yOHLkdn/ImQic/JUo8BNAaXnr95vFy6y
/QVCMajCakJDV2WNoSRl/4QK+FYBv1nNSEVT/qGtiC4UXBQZf1BaujrY2CvkQA8x
nfj8Z0ckdv5hfNvTxqPHtwzGJUmO9O8r3Jv69oJ0XnsK2ki2mJB0yjl00o7ZQDg9
NLJ+ovgqRnYDqbJcRe/d0of51NuRwlHmV+h9GDX9FH/7ghHwyMVuxC/k6+a/BZ1h
H8NYOevlqb8eAkXVjz2AoyTCL2SkW4oHdQ+vboEgQcl2jQK0kb3XhtALci91wGzE
aoWEBPZ+5O4wK4RE/z7V6yXvuqq/CcU32YRKJKsccWvEx8AMKLXa0G6NgfTZeZTy
WatLqE6jtTw5yPNNVVPnMZXN4c7C
=D36u
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- match the directory structure of the linux-libc-dev package to that
of Debian-based distributions
- fix incorrect include/config/auto.conf generation when Kconfig
creates it along with the .config file
- remove misleading $(AS) from documents
- clean up precious tag files by distclean instead of mrproper
- add a new coccinelle patch for devm_platform_ioremap_resource
migration
- refactor module-related scripts to read modules.order instead of
$(MODVERDIR)/*.mod files to get the list of created modules
- remove MODVERDIR
- update list of header compile-test
- add -fcf-protection=none flag to avoid conflict with the retpoline
flags when CONFIG_RETPOLINE=y
- misc cleanups
* tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
kbuild: add -fcf-protection=none when using retpoline flags
kbuild: update compile-test header list for v5.3-rc1
kbuild: split out *.mod out of {single,multi}-used-m rules
kbuild: remove 'prepare1' target
kbuild: remove the first line of *.mod files
kbuild: create *.mod with full directory path and remove MODVERDIR
kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver
kbuild: remove duplication from modules.order in sub-directories
kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
kbuild: do not create empty modules.order in the prepare stage
coccinelle: api: add devm_platform_ioremap_resource script
kbuild: compile-test headers listed in header-test-m as well
kbuild: remove unused hostcc-option
kbuild: remove tag files by distclean instead of mrproper
kbuild: add --hash-style= and --build-id unconditionally
kbuild: get rid of misleading $(AS) from documents
...
As commit 1e0221374e ("mips: vdso: drop unnecessary cc-ldoption")
explained, these flags are supported by the minimal required version
of binutils. They are supported by ld.lld too.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl0tpocACgkQCF8+vY7k
4RWoxA//b/fmDXP3WPzrjjSmpyB9ml0/epKzPbT5S2j0lftqKBmet29k+PCjVrTx
Nq2QauehY9ug5h8UMVUCmzPr95F0tSIGRoqk1vrn7z0K3q6k1SHrtvqbY1Bgb2Uk
Qvh2YFU4fQLJg8WAbExCjxCdbdmBKQVGKTwCtM+tP5OMxwAFOmQrjGaUaKCKIIA2
7Wzrx8CpSji+bJ3uK/d36c+4M9oDly5eaxBhoboL3BI0y+GqwiSASGwTO7BxrPOg
0wq5IZHnqS8+bprT9xQdDOqf+UOY9U1cxE/+sqsHxblfUEx9gfLy/R+FLmJn+SS9
Z3yLy4SqVHQMpWBjEAGodohikF60PAuTdymSC11jqFaKCUxWrIZg5xO+0blMrxPF
7vYIexutCkaBMHBlNaNsHIqB7B/2FGGKoN7QW64hwvwJCGvF7OmJcV+R4bROGvh4
nFuis9/Nm66Fq7I3aw37ThyZ0aWZdaQ0QJTH9ksxU/ZCz2hhMNYu/rXggrDvkS4U
nr77ZT5Gd7nj4b110zf8+99uiGiinY6hTfzPAuTCLBhaxwrv4/xDHAhpwdEB5T4j
8gOkxV8c0XWtL7sKqhGJvs/RRe2za0Y9XH6fyxsYfWcfuLjEvug8ouXMad9gxFWH
DL3WnKJEMGLScei2wux4kGOwEbkR1bUf2cHJfh3GpCB/y8vgLOc=
=smxY
-----END PGP SIGNATURE-----
Merge tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull rst conversion of docs from Mauro Carvalho Chehab:
"As agreed with Jon, I'm sending this big series directly to you, c/c
him, as this series required a special care, in order to avoid
conflicts with other trees"
* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
docs: kbuild: fix build with pdf and fix some minor issues
docs: block: fix pdf output
docs: arm: fix a breakage with pdf output
docs: don't use nested tables
docs: gpio: add sysfs interface to the admin-guide
docs: locking: add it to the main index
docs: add some directories to the main documentation index
docs: add SPDX tags to new index files
docs: add a memory-devices subdir to driver-api
docs: phy: place documentation under driver-api
docs: serial: move it to the driver-api
docs: driver-api: add remaining converted dirs to it
docs: driver-api: add xilinx driver API documentation
docs: driver-api: add a series of orphaned documents
docs: admin-guide: add a series of orphaned documents
docs: cgroup-v1: add it to the admin-guide book
docs: aoe: add it to the driver-api book
docs: add some documentation dirs to the driver-api book
docs: driver-model: move it to the driver-api book
docs: lp855x-driver.rst: add it to the driver-api book
...
Converts ARM the text files to ReST, preparing them to be an
architecture book.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by Corentin Labbe <clabbe.montjoie@gmail.com> # For sun4i-ss
* support for chained PMU counters in guests
* improved SError handling
* handle Neoverse N1 erratum #1349291
* allow side-channel mitigation status to be migrated
* standardise most AArch64 system register accesses to msr_s/mrs_s
* fix host MPIDR corruption on 32bit
* selftests ckleanups
x86:
* PMU event {white,black}listing
* ability for the guest to disable host-side interrupt polling
* fixes for enlightened VMCS (Hyper-V pv nested virtualization),
* new hypercall to yield to IPI target
* support for passing cstate MSRs through to the guest
* lots of cleanups and optimizations
Generic:
* Some txt->rST conversions for the documentation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdJzdIAAoJEL/70l94x66DQDoH/i83/8kX4I8AWDlushPru4ts
Q4lCE5VAPha+o4pLb1dtfFL3gTmSbsB1N++JSlqK3JOo6LphIOy6b0wBjQBbAa6U
3CT1dJaHJoScLLj09vyBlvClGUH2ZKEQTWOiquCCf7JfPofxwPUA6vJ7TYsdkckx
zR3ygbADWmnfS7hFfiqN3JzuYh9eoooGNWSU+Giq6VF41SiL3IqhBGZhWS0zE9c2
2c5lpqqdeHmAYNBqsyzNiDRKp7+zLFSmZ7Z5/0L755L8KYwR6F5beTnmBMHvb4lA
PWH/SWOC8EYR+PEowfrH+TxKZwp0gMn1kcAKjilHk0uCRwG1IzuHAr2jlNxICCk=
=t/Oq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for chained PMU counters in guests
- improved SError handling
- handle Neoverse N1 erratum #1349291
- allow side-channel mitigation status to be migrated
- standardise most AArch64 system register accesses to msr_s/mrs_s
- fix host MPIDR corruption on 32bit
- selftests ckleanups
x86:
- PMU event {white,black}listing
- ability for the guest to disable host-side interrupt polling
- fixes for enlightened VMCS (Hyper-V pv nested virtualization),
- new hypercall to yield to IPI target
- support for passing cstate MSRs through to the guest
- lots of cleanups and optimizations
Generic:
- Some txt->rST conversions for the documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (128 commits)
Documentation: virtual: Add toctree hooks
Documentation: kvm: Convert cpuid.txt to .rst
Documentation: virtual: Convert paravirt_ops.txt to .rst
KVM: x86: Unconditionally enable irqs in guest context
KVM: x86: PMU Event Filter
kvm: x86: Fix -Wmissing-prototypes warnings
KVM: Properly check if "page" is valid in kvm_vcpu_unmap
KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
KVM: LAPIC: Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane
kvm: LAPIC: write down valid APIC registers
KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
KVM: doc: Add API documentation on the KVM_REG_ARM_WORKAROUNDS register
KVM: arm/arm64: Add save/restore support for firmware workaround state
arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests
KVM: arm/arm64: Support chained PMU counters
KVM: arm/arm64: Remove pmc->bitmask
KVM: arm/arm64: Re-create event when setting counter value
KVM: arm/arm64: Extract duplicated code to own function
KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functions
KVM: LAPIC: ARBPRI is a reserved register for x2APIC
...
While jump_label_init() was moved earlier in the boot process in
efd9e03fac ("arm64: Use static keys for CPU features"), it wasn't early
enough for early params to use it. The old state of things was as
described here...
init/main.c calls out to arch-specific things before general jump label
and early param handling:
asmlinkage __visible void __init start_kernel(void)
{
...
setup_arch(&command_line);
...
smp_prepare_boot_cpu();
...
/* parameters may set static keys */
jump_label_init();
parse_early_param();
...
}
x86 setup_arch() wants those earlier, so it handles jump label and
early param:
void __init setup_arch(char **cmdline_p)
{
...
jump_label_init();
...
parse_early_param();
...
}
arm64 setup_arch() only had early param:
void __init setup_arch(char **cmdline_p)
{
...
parse_early_param();
...
}
with jump label later in smp_prepare_boot_cpu():
void __init smp_prepare_boot_cpu(void)
{
...
jump_label_init();
...
}
This moves arm64 jump_label_init() from smp_prepare_boot_cpu() to
setup_arch(), as done already on x86, in preparation from early param
usage in the init_on_alloc/free() series:
https://lkml.kernel.org/r/1561572949.5154.81.camel@lca.pw
Link: http://lkml.kernel.org/r/201906271003.005303B52@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drop the pgtable_t variable from all implementation for pte_fn_t as none
of them use it. apply_to_pte_range() should stop computing it as well.
Should help us save some cycles.
Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <jglisse@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with other
trees, unfortunately. He has a lot more of these waiting on the wings
that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos, and one
on Spectre vulnerabilities.
- Various improvements to the build system, including automatic markup of
function() references because some people, for reasons I will never
understand, were of the opinion that :c:func:``function()`` is
unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
=D0eO
-----END PGP SIGNATURE-----
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.
The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.
Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.
This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...
Pull timer updates from Thomas Gleixner:
"The timer and timekeeping departement delivers:
Core:
- The consolidation of the VDSO code into a generic library including
the conversion of x86 and ARM64. Conversion of ARM and MIPS are en
route through the relevant maintainer trees and should end up in
5.4.
This gets rid of the unnecessary different copies of the same code
and brings all architectures on the same level of VDSO
functionality.
- Make the NTP user space interface more robust by restricting the
TAI offset to prevent undefined behaviour. Includes a selftest.
- Validate user input in the compat settimeofday() syscall to catch
invalid values which would be turned into valid values by a
multiplication overflow
- Consolidate the time accessors
- Small fixes, improvements and cleanups all over the place
Drivers:
- Support for the NXP system counter, TI davinci timer
- Move the Microsoft HyperV clocksource/events code into the
drivers/clocksource directory so it can be shared between x86 and
ARM64.
- Overhaul of the Tegra driver
- Delay timer support for IXP4xx
- Small fixes, improvements and cleanups as usual"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
time: Validate user input in compat_settimeofday()
timer: Document TIMER_PINNED
clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
clocksource/drivers: Make Hyper-V clocksource ISA agnostic
MAINTAINERS: Fix Andy's surname and the directory entries of VDSO
hrtimer: Use a bullet for the returns bullet list
arm64: vdso: Fix compilation with clang older than 8
arm64: compat: Fix __arch_get_hw_counter() implementation
arm64: Fix __arch_get_hw_counter() implementation
lib/vdso: Make delta calculation work correctly
MAINTAINERS: Add entry for the generic VDSO library
arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system
arm64: vdso: Remove unnecessary asm-offsets.c definitions
vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h
clocksource/drivers/davinci: Add support for clocksource
clocksource/drivers/davinci: Add support for clockevents
clocksource/drivers/tegra: Set up maximum-ticks limit properly
clocksource/drivers/tegra: Cycles can't be 0
clocksource/drivers/tegra: Restore base address before cleanup
clocksource/drivers/tegra: Add verbose definition for 1MHz constant
...
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG
and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags'
introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl0eHqcACgkQa9axLQDI
XvFyNA/+L+bnkz8m3ncydlqqfXomQn4eJJVQ8Uksb0knJz+1+3CUxxbO4ry4jXZN
fMkbggYrDPRKpDbsUl0lsRipj7jW9bqan+N37c3SWqCkgb6HqDaHViwxdx6Ec/Uk
gHudozDSPh/8c7hxGcSyt/CFyuW6b+8eYIQU5rtIgz8aVY2BypBvS/7YtYCbIkx0
w4CFleRTK1zXD5mJQhrc6jyDx659sVkrAvdhf6YIymOY8nBTv40vwdNo3beJMYp8
Po/+0Ixu+VkHUNtmYYZQgP/AGH96xiTcRnUqd172JdtRPpCLqnLqwFokXeVIlUKT
KZFMDPzK+756Ayn4z4huEePPAOGlHbJje8JVNnFyreKhVVcCotW7YPY/oJR10bnc
eo7yD+DxABTn+93G2yP436bNVa8qO1UqjOBfInWBtnNFJfANIkZweij/MQ6MjaTA
o7KtviHnZFClefMPoiI7HDzwL8XSmsBDbeQ04s2Wxku1Y2xUHLx4iLmadwLQ1ZPb
lZMTZP3N/T1554MoURVA1afCjAwiqU3bt1xDUGjbBVjLfSPBAn/25IacsG9Li9AF
7Rp1M9VhrfLftjFFkB2HwpbhRASOxaOSx+EI3kzEfCtM2O9I1WHgP3rvCdc3l0HU
tbK0/IggQicNgz7GSZ8xDlWPwwSadXYGLys+xlMZEYd3pDIOiFc=
=0TDT
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
Recent commits added the explicit notion of "workaround not required" to
the state of the Spectre v2 (aka. BP_HARDENING) workaround, where we
just had "needed" and "unknown" before.
Export this knowledge to the rest of the kernel and enhance the existing
kvm_arm_harden_branch_predictor() to report this new state as well.
Export this new state to guests when they use KVM's firmware interface
emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error
as Unrecoverable. The kernel's architecture code already considers
Unrecoverable errors as fatal as without kernel-first support no
further error-handling is possible.
Now that KVM attributes SError to the host/guest more precisely
the host's architecture code will always handle host errors that
become pending during world-switch.
Errors misclassified by this errata that affected the guest will be
re-injected to the guest as an implementation-defined SError, which can
be uncontained.
Until kernel-first support is implemented, no workaround is needed
for this issue.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- Fix module allocation when running with KASLR enabled
- Fix broken build due to bug in LLVM linker (ld.lld)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl0Z9bIACgkQt6xw3ITB
YzS70gf/Trw6+Yy1dHSyz5f2W9OtedFFv+rEGcvUkF6kYFffw7taNj30K6otjkK7
CYPp9kWYpFhGgE7VwAfQ9NGyAwZ62IvGhQDYdAG72Y39zX7yQ4OHWKdr8K53KYN8
CThcgXxEPoZw1pP7fwXkaBiiljW6JGF64Hv3ybA1vzGmjiv6wdjO3pQlbXkJu4kk
xlsLSLOZUDawcRuVNGWwPiToxopVTcAJ3lapYBVmO2dSO00QYv1jvJgV0tK6n68q
ZQMJbTdNHLIKMRdLcDBGQAwetWkkZ5LazwuiaHQcSQcRgp7IkKrIvEz8vzkdAvcR
jniDc7bbKYlvlJdiquIOH2l1ElEQyQ==
=Pp2j
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Fix a build failure with the LLVM linker and a module allocation
failure when KASLR is active:
- Fix module allocation when running with KASLR enabled
- Fix broken build due to bug in LLVM linker (ld.lld)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly
arm64: kaslr: keep modules inside module region when KASAN is enabled
In traps.c, only __die calls dump_instr.
However, this function has sub-function as __dump_instr.
dump_kernel_instr can replace those functions.
By using aarch64_insn_read, it does not have to change fs to KERNEL_DS.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: jinho lim <jordan.lim@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
After r363059 and r363928 in LLVM, a build using ld.lld as the linker
with CONFIG_RANDOMIZE_BASE enabled fails like so:
ld.lld: error: relocation R_AARCH64_ABS32 cannot be used against symbol
__efistub_stext_offset; recompile with -fPIC
Fangrui and Peter figured out that ld.lld is incorrectly considering
__efistub_stext_offset as a relative symbol because of the order in
which symbols are evaluated. _text is treated as an absolute symbol
and stext is a relative symbol, making __efistub_stext_offset a
relative symbol.
Adding ABSOLUTE will force ld.lld to evalute this expression in the
right context and does not change ld.bfd's behavior. ld.lld will
need to be fixed but the developers do not see a quick or simple fix
without some research (see the linked issue for further explanation).
Add this simple workaround so that ld.lld can continue to link kernels.
Link: https://github.com/ClangBuiltLinux/linux/issues/561
Link: 025a815d75
Link: 249fde8583
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Debugged-by: Fangrui Song <maskray@google.com>
Debugged-by: Peter Smith <peter.smith@linaro.org>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
[will: add comment]
Signed-off-by: Will Deacon <will@kernel.org>
When KASLR and KASAN are both enabled, we keep the modules where they
are, and randomize the placement of the kernel so it is within 2 GB
of the module region. The reason for this is that putting modules in
the vmalloc region (like we normally do when KASLR is enabled) is not
possible in this case, given that the entire vmalloc region is already
backed by KASAN zero shadow pages, and so allocating dedicated KASAN
shadow space as required by loaded modules is not possible.
The default module allocation window is set to [_etext - 128MB, _etext]
in kaslr.c, which is appropriate for KASLR kernels booted without a
seed or with 'nokaslr' on the command line. However, as it turns out,
it is not quite correct for the KASAN case, since it still intersects
the vmalloc region at the top, where attempts to allocate shadow pages
will collide with the KASAN zero shadow pages, causing a WARN() and all
kinds of other trouble. So cap the top end to MODULES_END explicitly
when running with KASAN.
Cc: <stable@vger.kernel.org> # 4.9+
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Since the VDSO code has moved to C from assembly, there is no need to
define and maintain the corresponding asm offsets.
Fixes: 28b1a824a4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Shijith Thotton <sthotton@marvell.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: https://lkml.kernel.org/r/20190624135812.GC29120@arrakis.emea.arm.com
Currently arm64 uses the default implementation of panic_smp_self_stop()
where the CPU runs in a cpu_relax() loop unable to receive IPIs anymore.
As a result, when two CPUs panic() simultaneously we get "SMP: failed to
stop secondary CPUs" warnings and extra delays before a reset, because
smp_send_stop() still tries to stop the other paniced CPU.
Provide an implementation of panic_smp_self_stop() that is identical to
the IPI CPU stop handler, so that the online status of stopped CPUs gets
properly updated.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The current code puts the stopped cpus in an 'yield' instruction loop.
Using a busy loop here is unnecessary, we can use the cpu_park_loop()
function here to do a wfi/wfe.
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARMv8.5 introduces the FRINT series of instructions for rounding floating
point numbers to integers. Provide a capability to userspace in order to
allow applications to determine if the system supports these instructions.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARMv8.5 adds new instructions XAFLAG and AXFLAG to translate the
representation of the results of floating point comparisons between the
native ARM format and an alternative format used by some software. Add
a hwcap allowing userspace to determine if they are present, since we
referred to earlier CondM extensions as FLAGM call these extensions
FLAGM2.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to avoid transient inconsistencies where freed code pages
are remapped writable while stale TLB entries still exist on other
cores, mark the kprobes text pages with the VM_FLUSH_RESET_PERMS
attribute. This instructs the core vmalloc code not to defer the
TLB flush when this region is unmapped and returned to the page
allocator.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the core code manages the executable permissions of code
regions of modules explicitly, it is no longer necessary to create
the module vmalloc regions with RWX permissions, and we can create
them with RW- permissions instead, which is preferred from a
security perspective.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some Qualcomm Snapdragon based laptops built to run Microsoft Windows
are clearly ACPI 5.1 based, given that that is the first ACPI revision
that supports ARM, and introduced the FADT 'arm_boot_flags' field,
which has a non-zero field on those systems.
So in these cases, infer from the ARM boot flags that the FADT must be
5.1 or later, and treat it as 5.1.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Graeme Gregory <graeme.gregory@linaro.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When the compat vDSO is enabled, the sigreturn trampolines are not
anymore available through [sigpage] but through [vdso].
Add the relevant code the enable the feature.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-15-vincenzo.frascino@arm.com
Most of the code for initializing the vDSOs in arm64 and compat will be
shared, hence refactoring of the current code is required to avoid
duplication and to simplify maintainability.
No functional change.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-12-vincenzo.frascino@arm.com
Provide the arm64 compat (AArch32) vDSO in kernel/vdso32 in a similar
way to what happens in kernel/vdso.
The compat vDSO leverages on an adaptation of the arm architecture code
with few changes:
- Use of lib/vdso for gettimeofday
- Implement a syscall based fallback
- Introduce clock_getres() for the compat library
- Implement trampolines
- Implement elf note
To build the compat vDSO a 32 bit compiler is required and needs to be
specified via CONFIG_CROSS_COMPILE_COMPAT_VDSO.
The code is not yet enabled as other prerequisites are missing.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-11-vincenzo.frascino@arm.com
Update asm-offsets for arm64 to generate the correct offsets for
compat signals.
They will be useful for the implementation of the compat sigreturn
trampolines in vDSO context.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-9-vincenzo.frascino@arm.com
The compat signal data structures are required as part of the compat
vDSO implementation in order to provide the unwinding information for
the sigreturn trampolines.
Expose these data structures as part of signal32.h.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-8-vincenzo.frascino@arm.com
The vDSO needs to be built with x18 reserved in order to accommodate
userspace platform ABIs built on top of Linux that use the register
to carry inter-procedural state, as provided for by the AAPCS.
An example of such a platform ABI is the one that will be used by an
upcoming version of Android.
Although this change is currently a no-op due to the fact that the vDSO
is currently implemented in pure assembly on arm64, it is necessary
in order to prepare for using the generic C implementation of the vDSO.
[ tglx: Massaged changelog ]
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Mark Salyzyn <salyzyn@google.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-6-vincenzo.frascino@arm.com
To take advantage of the commonly defined vdso interface for gettimeofday()
the architectural code requires an adaptation.
Re-implement the gettimeofday VDSO in C in order to use lib/vdso.
With the new implementation arm64 gains support for CLOCK_BOOTTIME
and CLOCK_TAI.
[ tglx: Reformatted the function line breaks ]
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-5-vincenzo.frascino@arm.com
If we must preserve the firmware resource assignments, claim the existing
resources rather than reassigning everything.
Link: https://lore.kernel.org/r/20190615002359.29577-4-benh@kernel.crashing.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Call pci_assign_unassigned_root_bus_resources() instead of the simpler:
pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);
pci_assign_unassigned_root_bus_resources() calls:
__pci_bus_size_bridges(bus, add_list);
__pci_bus_assign_resources(bus, add_list, &fail_head);
so this should be equivalent as long as we're able to assign everything.
If we were unable to assign something, previously we did nothing and left
it unassigned, but after this patch, we will attempt to do some
reallocation.
Once we start honoring FW resource allocations, this will bring up the
"reallocation" feature which can help making room for SR-IOV when
necessary.
Link: https://lore.kernel.org/r/20190615002359.29577-1-benh@kernel.crashing.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Another round of SPDX updates for 5.2-rc6
Here is what I am guessing is going to be the last "big" SPDX update for
5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that
were "easy" to determine by pattern matching. The ones after this are
going to be a bit more difficult and the people on the spdx list will be
discussing them on a case-by-case basis now.
Another 5000+ files are fixed up, so our overall totals are:
Files checked: 64545
Files with SPDX: 45529
Compared to the 5.1 kernel which was:
Files checked: 63848
Files with SPDX: 22576
This is a huge improvement.
Also, we deleted another 20000 lines of boilerplate license crud, always
nice to see in a diffstat.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXQyQYA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymnGQCghETUBotn1p3hTjY56VEs6dGzpHMAnRT0m+lv
kbsjBGEJpLbMRB2krnaU
=RMcT
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull still more SPDX updates from Greg KH:
"Another round of SPDX updates for 5.2-rc6
Here is what I am guessing is going to be the last "big" SPDX update
for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates
that were "easy" to determine by pattern matching. The ones after this
are going to be a bit more difficult and the people on the spdx list
will be discussing them on a case-by-case basis now.
Another 5000+ files are fixed up, so our overall totals are:
Files checked: 64545
Files with SPDX: 45529
Compared to the 5.1 kernel which was:
Files checked: 63848
Files with SPDX: 22576
This is a huge improvement.
Also, we deleted another 20000 lines of boilerplate license crud,
always nice to see in a diffstat"
* tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
...
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired location due to the
state of PSR.I upon flag saving [1].
In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_IGNORE_PMR) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence of the status of interrupt masking via PMR.
GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value> as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
requires PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.
[1] https://www.spinics.net/lists/arm-kernel/msg716956.html
Fixes: 4a503217ce ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Cc: <stable@vger.kernel.org> # 5.1.x-
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wei Li <liwei391@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In the presence of any form of instrumentation, nmi_enter() should be
done before calling any traceable code and any instrumentation code.
Currently, nmi_enter() is done in handle_domain_nmi(), which is much
too late as instrumentation code might get called before. Move the
nmi_enter/exit() calls to the arch IRQ vector handler.
On arm64, it is not possible to know if the IRQ vector handler was
called because of an NMI before acknowledging the interrupt. However, It
is possible to know whether normal interrupts could be taken in the
interrupted context (i.e. if taking an NMI in that context could
introduce a potential race condition).
When interrupting a context with IRQs disabled, call nmi_enter() as soon
as possible. In contexts with IRQs enabled, defer this to the interrupt
controller, which is in a better position to know if an interrupt taken
is an NMI.
Fixes: bc3c03ccb4 ("arm64: Enable the support of pseudo-NMIs")
Cc: <stable@vger.kernel.org> # 5.1.x-
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For el0_dbg and el0_error, DAIF bits get explicitly cleared before
calling ct_user_exit.
When context tracking is disabled, DAIF gets set (almost) immediately
after. When context tracking is enabled, among the first things done
is disabling IRQs.
What is actually needed is:
- PSR.D = 0 so the system can be debugged (should be already the case)
- PSR.A = 0 so async error can be handled during context tracking
Do not clear PSR.I in those two locations.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fix use of #include in UAPI headers for compatability with musl libc
- Update email addresses in MAINTAINERS
- Fix initialisation of pgd_cache due to name collision with weak symbol
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl0LgWIACgkQt6xw3ITB
YzTyYgf7BByaUUDxHTBkUA2fBrZ66L9sHsBzunF6SqIzZqQfC5JdIqq2Iz+eiw8a
0DUARr1jxeC7xsAjkmhIUzpnQjsZab4Gn/T0syTKD0dR4zxoK/g6hrScmSnoTw6t
0AW9UnwMB98aol+yKBwiPYtG9HUzXnMet77LgcQdCby5xiRyJ4xv3vNr0lSmXjSO
+ANC5IFHZz+oyy2n9UZRYbkLwth8uoc1pZJTKLbykDp4ApGXFtayctR0l4Q5L29v
pqxivQgNsQ8QaxCeJ1+UICOG8hnVr6adH5xoWzcev+3sXlX9IoNu78hfrKO7u0J4
+rWacwopqq0fGgo7anzUEx9nznXaDg==
=yyJV
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"This is mainly a couple of email address updates to MAINTAINERS, but
we've also fixed a UAPI build issue with musl libc and an accidental
double-initialisation of our pgd_cache due to a naming conflict with a
weak symbol.
There are a couple of outstanding issues that have been reported, but
it doesn't look like they're new and we're still a long way off from
fully debugging them.
Summary:
- Fix use of #include in UAPI headers for compatability with musl libc
- Update email addresses in MAINTAINERS
- Fix initialisation of pgd_cache due to name collision with weak symbol"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/mm: don't initialize pgd_cache twice
MAINTAINERS: Update my email address
arm64/sve: <uapi/asm/ptrace.h> should not depend on <uapi/linux/prctl.h>
arm64: ssbd: explicitly depend on <linux/prctl.h>
MAINTAINERS: Update my email address to use @kernel.org
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software void you can redistribute it and or
modify it under the terms of the gnu general public license version
2 as published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http void www gnu
org licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081201.003433009@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix ssbd.c which depends implicitly on asm/ptrace.h including
linux/prctl.h (through for example linux/compat.h, then linux/time.h,
linux/seqlock.h, linux/spinlock.h and linux/irqflags.h), and uses
PR_SPEC* defines.
This is an issue since we'll soon be removing the include from
asm/ptrace.h.
Fixes: 9cdc0108ba ("arm64: ssbd: Add prctl interface for per-thread mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Anisse Astier <aastier@freebox.fr>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If the cache line size is greater than ARCH_DMA_MINALIGN (128),
the warning shows and it's tainted as TAINT_CPU_OUT_OF_SPEC.
However, it's not good because as discussed in the thread [1], the cpu
cache line size will be problem only on non-coherent devices.
Since the coherent flag is already introduced to struct device,
show the warning only if the device is non-coherent device and
ARCH_DMA_MINALIGN is smaller than the cpu cache size.
[1] https://lore.kernel.org/linux-arm-kernel/20180514145703.celnlobzn3uh5tc2@localhost/
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Tested-by: Zhang Lei <zhang.lei@jp.fujitsu.com>
[catalin.marinas@arm.com: removed 'if' block for WARN_TAINT]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The documentation is in a format that is very close to ReST format.
The conversion is actually:
- add blank lines in order to identify paragraphs;
- fixing tables markups;
- adding some lists markups;
- marking literal blocks;
- adjust some title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlz8fAYeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1asH/3ySguxqtqL1MCBa
4/SZ37PHeWKMerfX6ZyJdgEqK3B+PWlmuLiOMNK5h2bPLzeQQQAmHU/mfKmpXqgB
dHwUbG9yNnyUtTfsfRqAnCA6vpuw9Yb1oIzTCVQrgJLSWD0j7scBBvmzYqguOkto
ThwigLUq3AILr8EfR4rh+GM+5Dn9OTEFAxwil9fPHQo7QoczwZxpURhScT6Co9TB
DqLA3fvXbBvLs/CZy/S5vKM9hKzC+p39ApFTURvFPrelUVnythAM0dPDJg3pIn5u
g+/+gDxDFa+7ANxvxO2ng1sJPDqJMeY/xmjJYlYyLpA33B7zLNk2vDHhAP06VTtr
XCMhQ9s=
=cb80
-----END PGP SIGNATURE-----
Merge tag 'v5.2-rc4' into mauro
We need to pick up post-rc1 changes to various document files so they don't
get lost in Mauro's massive RST conversion push.
- Fix broken SVE ptrace API when running in a big-endian configuration
- Fix performance regression due to off-by-one in TLBI range checking
- Fix build regression when using Clang
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl0DsZ0ACgkQt6xw3ITB
YzRE3wf9GJibmSnm9q5gmtHcKMrH+atXrI93nyzhBZxYaAYAKSiz7RCMSpc7iudI
bGMruaaqn/2xrdOie3vOOfSqFfzrfcFOuh/0id9R2IyiFSg08BrI369buejNRtm+
BUhdUQCe5p5afJ7PYFa7CYD+tSC1WiHXfOhH6sRYllerwaMiR9y/eqf3Gh5zB26Q
ca/+2Jh59DxXIpSWP9nTzPyV9xKOJ1B8JdMR5BMIUnOgUXQhMwNeuivRrZnEG9yT
PZDGbk5WxKci+LHPOt7stFFuo7hZn3SCKJ0mZ20VUs0w7ETMJuI0Ss4TlE2mgYag
TASmsypuLdRz5mxIeyY5QYXppSyYiA==
=AhWz
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are some arm64 fixes for -rc5.
The only non-trivial change (in terms of the diffstat) is fixing our
SVE ptrace API for big-endian machines, but the majority of this is
actually the addition of much-needed comments and updates to the
documentation to try to avoid this mess biting us again in future.
There are still a couple of small things on the horizon, but nothing
major at this point.
Summary:
- Fix broken SVE ptrace API when running in a big-endian configuration
- Fix performance regression due to off-by-one in TLBI range checking
- Fix build regression when using Clang"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/sve: Fix missing SVE/FPSIMD endianness conversions
arm64: tlbflush: Ensure start/end of address range are aligned to stride
arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
The in-memory representation of SVE and FPSIMD registers is
different: the FPSIMD V-registers are stored as single 128-bit
host-endian values, whereas SVE registers are stored in an
endianness-invariant byte order.
This means that the two representations differ when running on a
big-endian host. But we blindly copy data from one representation
to another when converting between the two, resulting in the
register contents being unintentionally byteswapped in certain
situations. Currently this can be triggered by the first SVE
instruction after a syscall, for example (though the potential
trigger points may vary in future).
So, fix the conversion functions fpsimd_to_sve(), sve_to_fpsimd()
and sve_sync_from_fpsimd_zeropad() to swab where appropriate.
There is no common swahl128() or swab128() that we could use here.
Maybe it would be worth making this generic, but for now add a
simple local hack.
Since the byte order differences are exposed in ABI, also clarify
the documentation.
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alan Hayward <alan.hayward@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Fixes: bc0ee47603 ("arm64/sve: Core task context handling")
Fixes: 8cd969d28f ("arm64/sve: Signal handling support")
Fixes: 43d4da2c45 ("arm64/sve: ptrace and ELF coredump support")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[will: Fix typos in comments and docs spotted by Julien]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
sxVEiFZo8ZU9v1IoRb1I
=qU++
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
"Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through"
* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
...
Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
- Fix boot crash on platforms with SVE2 due to missing register encoding
- Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y
- Move cpu_logical_map into smp.h for use by upcoming irqchip drivers
- Trivial typo fix in comment
- Disable some useless, noisy warnings from GCC 9
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlz6QlIACgkQt6xw3ITB
YzQJRAf/e2PwVOHLwUIbPg//rKKNlJmBMnIUROEiAPjY7atTfjuYd3/65pIq7ZuO
RjMIUT1A2kg+pMOFzmXObLICq3Xl1/7LUUPIQ1iDvEeWIRb7HKQXoJkg9lvUEy89
T4sR1EBkK7uYh0w+/L7k1LESGgl4+VFY/ZY+4NmwsXEK4jfty3by7zE7Vy35MvQ6
XuEbYuMNjfskgBGbZPVqV8qHUlRurfhWWXjdgdAe9E7+fsHPuOrIr4+uXwAyVtRR
1/4tLySNW2GE8o0Ftun82ZasFTTgPc18ozASBYX1FMlQRaXznNREmSHYaIiartJE
VorEdRe25ztdI54fVM2dL0KKmTZi1g==
=ErME
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Another round of mostly-benign fixes, the exception being a boot crash
on SVE2-capable CPUs (although I don't know where you'd find such a
thing, so maybe it's benign too).
We're in the process of resolving some big-endian ptrace breakage, so
I'll probably have some more for you next week.
Summary:
- Fix boot crash on platforms with SVE2 due to missing register
encoding
- Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y
- Move cpu_logical_map into smp.h for use by upcoming irqchip drivers
- Trivial typo fix in comment
- Disable some useless, noisy warnings from GCC 9"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Silence gcc warnings about arch ABI drift
ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix
arm64: arch_timer: mark functions as __always_inline
arm64: smp: Moved cpu_logical_map[] to smp.h
arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()
Add PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP support on arm64.
We don't need any special handling for PTRACE_SYSEMU_SINGLESTEP.
It's quite difficult to generalize handling PTRACE_SYSEMU cross
architectures and avoid calls to tracehook_report_syscall_entry twice.
Different architecture have different mechanism to indicate NO_SYSCALL
and trying to generalise adds more code for no gain.
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed as is without any warranty of any kind whether express
or implied without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 2 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141332.617181045@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 06a916feca ("arm64: Expose SVE2 features for
userspace"), new hwcaps are added that are detected via fields in
the SVE-specific ID register ID_AA64ZFR0_EL1.
In order to check compatibility of secondary cpus with the hwcaps
established at boot, the cpufeatures code uses
__read_sysreg_by_encoding() to read this ID register based on the
sys_reg field of the arm64_elf_hwcaps[] table.
This leads to a kernel splat if an hwcap uses an ID register that
__read_sysreg_by_encoding() doesn't explicitly handle, as now
happens when exercising cpu hotplug on an SVE2-capable platform.
So fix it by adding the required case in there.
Fixes: 06a916feca ("arm64: Expose SVE2 features for userspace")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
cpu_enable_ssbs() is called via stop_machine() as part of the cpu_enable
callback. A spin lock is used to ensure the hook is registered before
the rest of the callback is executed.
On -RT spin_lock() may sleep. However, all the callees in stop_machine()
are expected to not sleep. Therefore a raw_spin_lock() is required here.
Given this is already done under stop_machine() and the work done under
the lock is quite small, the latency should not increase too much.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
cache_line_size is derived from CTR_EL0.CWG field and is called mostly
for I/O device drivers. For some platforms like the HiSilicon Kunpeng920
server SoC, cache line sizes are different between L1/2 cache and L3
cache while L1 cache line size is 64-byte and L3 is 128-byte, but
CTR_EL0.CWG is misreporting using L1 cache line size.
We shall correct the right value which is important for I/O performance.
Let's update the cache line size if it is detected from DT or PPTT
information.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
Reported-by: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
the kernel may be able to use FPSIMD/SVE. This is for instance the case
for crypto code.
Any use of FPSIMD/SVE in the kernel are clearly marked by using the
function kernel_neon_{begin, end}. Furthermore, this can only be used
when may_use_simd() returns true.
The current implementation of may_use_simd() allows softirq to use
FPSIMD/SVE unless it is currently in use (i.e kernel_neon_busy is true).
When in use, softirqs usually fall back to a software method.
At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
when touching the FPSIMD/SVE context. This has the drawback to disable
all softirqs even if they are not using FPSIMD/SVE.
Since a softirq is supposed to check may_use_simd() anyway before
attempting to use FPSIMD/SVE, there is limited reason to keep softirq
disabled when touching the FPSIMD/SVE context. Instead, we can simply
disable preemption and mark the FPSIMD/SVE context as in use by setting
CPU's fpsimd_context_busy flag.
Two new helpers {get, put}_cpu_fpsimd_context are introduced to mark
the area using FPSIMD/SVE context and they are used to replace
local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
also re-implemented to use the new helpers.
Additionally, double-underscored versions of the helpers are provided to
called when preemption is already disabled. These are only relevant on
paths where irqs are disabled anyway, so they are not needed for
correctness in the current code. Let's use them anyway though: this
marks critical sections clearly and will help to avoid mistakes during
future maintenance.
The change has been benchmarked on Linux 5.1-rc4 with defconfig.
On Juno2:
* hackbench 100 process 1000 (10 times)
* .7% quicker
On ThunderX 2:
* hackbench 1000 process 1000 (20 times)
* 3.4% quicker
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The only external user of fpsimd_save() and fpsimd_flush_cpu_state() is
the KVM FPSIMD code.
A following patch will introduce a mechanism to acquire owernship of the
FPSIMD/SVE context for performing context management operations. Rather
than having to export the new helpers to get/put the context, we can just
introduce a new function to combine fpsimd_save() and
fpsimd_flush_cpu_state().
This has also the advantage to remove any external call of fpsimd_save()
and fpsimd_flush_cpu_state(), so they can be turned static.
Lastly, the new function can also be used in the PM notifier.
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Here is another set of reviewed patches that adds SPDX tags to different
kernel files, based on a set of rules that are being used to parse the
comments to try to determine that the license of the file is
"GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
these matches are included here, a number of "non-obvious" variants of
text have been found but those have been postponed for later review and
analysis.
There is also a patch in here to add the proper SPDX header to a bunch
of Kbuild files that we have missed in the past due to new files being
added and forgetting that Kbuild uses two different file names for
Makefiles. This issue was reported by the Kbuild maintainer.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on the
patches are reviewers.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPCHLg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykxyACgql6ktH+Tv8Ho1747kKPiFca1Jq0AoK5HORXI
yB0DSTXYNjMtH41ypnsZ
=x2f8
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
"Here is another set of reviewed patches that adds SPDX tags to
different kernel files, based on a set of rules that are being used to
parse the comments to try to determine that the license of the file is
"GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
these matches are included here, a number of "non-obvious" variants of
text have been found but those have been postponed for later review
and analysis.
There is also a patch in here to add the proper SPDX header to a bunch
of Kbuild files that we have missed in the past due to new files being
added and forgetting that Kbuild uses two different file names for
Makefiles. This issue was reported by the Kbuild maintainer.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on
the patches are reviewers"
* tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits)
treewide: Add SPDX license identifier - Kbuild
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201
...
- Fix implementation of our set_personality() system call, which wasn't
being wrapped properly
- Fix system call function types to keep CFI happy
- Fix siginfo layout when delivering SIGKILL after a kernel fault
- Really fix module relocation range checking
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzvv3EACgkQt6xw3ITB
YzQviwf9Gw3VrBZpS9nwz0MQCf9W7+Vpy8XBsY7HJyUNQ4+8ZNR5HoZ3BcJX2HWk
WKwSw721MllzLfJaRMqNV2+C7lm+EypcZApKFpPo7Vs9g78WcUdNZ4YM4XfAX45T
cVPxeSGOj2aswyOn2Xa3UjKZj8deP8nAC/JgJY7t9L6qKObwUldmxBPRnZdclclw
S8sQSMvLc9Q43jmEysPLixExZ6jzmq1i8xxPcyqFUz8DHYPf1irLxtpS7DYA+nk5
nwQ/lnz6Tu8TBXcvgvXayKL8aa8SIsl0cOii2FWsZMkFXz3OZ08hdujvMYsPSSHO
q3rMub7F/0znm00sBGXgTGRjy++v+A==
=pyp4
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The fixes are still trickling in for arm64, but the only really
significant one here is actually fixing a regression in the botched
module relocation range checking merged for -rc2.
Hopefully we've nailed it this time.
- Fix implementation of our set_personality() system call, which
wasn't being wrapped properly
- Fix system call function types to keep CFI happy
- Fix siginfo layout when delivering SIGKILL after a kernel fault
- Really fix module relocation range checking"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: use the correct function type for __arm64_sys_ni_syscall
arm64: use the correct function type in SYSCALL_DEFINE0
arm64: fix syscall_fn_t type
signal/arm64: Use force_sig not force_sig_fault for SIGKILL
arm64/module: revert to unsigned interpretation of ABS16/32 relocations
arm64: Fix the arm64_personality() syscall wrapper redirection
Based on 1 normalized pattern(s):
license terms gnu general public license gpl version 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 161 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.447718015@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 655 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.
The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.
The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:
force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Calling sys_ni_syscall through a syscall_fn_t pointer trips indirect
call Control-Flow Integrity checking due to a function type
mismatch. Use SYSCALL_DEFINE0 for __arm64_sys_ni_syscall instead and
remove the now unnecessary casts.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two know places where synchronous SIGKILL are generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable@vger.kernel.org
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: af40ff687b ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 1cf24a2cc3
("arm64/module: deal with ambiguity in PRELxx relocation ranges")
updated the overflow checking logic in the relocation handling code to
ensure that PREL16/32 relocations don't overflow signed quantities.
However, the same code path is used for absolute relocations, where the
interpretation is the opposite: the only current use case for absolute
relocations operating on non-native word size quantities is the CRC32
handling in the CONFIG_MODVERSIONS code, and these CRCs are unsigned
32-bit quantities, which are now being rejected by the module loader
if bit 31 happens to be set.
So let's use different ranges for quanties subject to absolute vs.
relative relocations:
- ABS16/32 relocations should be in the range [0, Uxx_MAX)
- PREL16/32 relocations should be in the range [Sxx_MIN, Sxx_MAX)
- otherwise, print an error since no other 16 or 32 bit wide data
relocations are currently supported.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Following commit 4378a7d4be ("arm64: implement syscall wrappers"), the
syscall function names gained the '__arm64_' prefix. Ensure that we
have the correct #define for redirecting a default syscall through a
wrapper.
Fixes: 4378a7d4be ("arm64: implement syscall wrappers")
Cc: <stable@vger.kernel.org> # 4.19.x-
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
All of the callers pass current into force_sig_mceer so remove the
task parameter to make this obvious.
This also makes it clear that force_sig_mceerr passes current
into force_sig_info.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
All of the remaining callers pass current into force_sig so
remove the task parameter to make this obvious and to make
misuse more difficult in the future.
This also makes it clear force_sig passes current into force_sig_info.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two know places where synchronous SIGKILL are generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable@vger.kernel.org
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Fixes: af40ff687b ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
- Fix incorrect LDADD instruction encoding in our disassembly macros
- Disable the broken ARM64_PSEUDO_NMI support for now
- Add workaround for Cortex-A76 CPU erratum #1463225
- Handle Cortex-A76/Neoverse-N1 erratum #1418040 w/ existing workaround
- Fix IORT build failure if IOMMU_SUPPORT=n
- Fix place-relative module relocation range checking and its
interaction with KASLR
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzoC8MACgkQt6xw3ITB
YzQfiAf+MXFzrAd3o7v40CnZu6ELw+ldedPh34oBjD7h6we3hroxi5Fss2nbwH0o
BmAm4Nv1/Njk5+hA7Mlp3/mRn0vcd3NDP+FyH3inLjUU7owc41thp0SKlCOfFdZk
K8sVCOeCWt7GEEPcnFsPO0nU+7f3ZKDDNBo0L+qJPxrMOTDcbQ3cIjW/ua7vQRHv
pIDGF+iJAhHeNoc1Wjq08F8Q+Dq7dYvhtokeyDivSn4NulmRvdL+z581gMmj7ExT
ARB6WtHGoOo+8UdjBJIDnXRKhJLfGexQaoAojk+IogaV0ACDtz6CuqsSIh1e5SFC
oPqRSP5ITTbXEDS5uaUW1pYlwmGTaw==
=ynUz
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 fixes from Will Deacon:
- Fix incorrect LDADD instruction encoding in our disassembly macros
- Disable the broken ARM64_PSEUDO_NMI support for now
- Add workaround for Cortex-A76 CPU erratum #1463225
- Handle Cortex-A76/Neoverse-N1 erratum #1418040 w/ existing workaround
- Fix IORT build failure if IOMMU_SUPPORT=n
- Fix place-relative module relocation range checking and its
interaction with KASLR
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: insn: Add BUILD_BUG_ON() for invalid masks
arm64: insn: Fix ldadd instruction encoding
arm64: Kconfig: Make ARM64_PSEUDO_NMI depend on BROKEN for now
arm64: Handle erratum 1418040 as a superset of erratum 1188873
arm64/module: deal with ambiguity in PRELxx relocation ranges
ACPI/IORT: Fix build error when IOMMU_SUPPORT is disabled
arm64/kernel: kaslr: reduce module randomization range to 2 GB
arm64: errata: Add workaround for Cortex-A76 erratum #1463225
arm64: Remove useless message during oops
We already mitigate erratum 1188873 affecting Cortex-A76 and
Neoverse-N1 r0p0 to r2p0. It turns out that revisions r0p0 to
r3p1 of the same cores are affected by erratum 1418040, which
has the same workaround as 1188873.
Let's expand the range of affected revisions to match 1418040,
and repaint all occurences of 1188873 to 1418040. Whilst we're
there, do a bit of reformating in silicon-errata.txt and drop
a now unnecessary dependency on ARM_ARCH_TIMER_OOL_WORKAROUND.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The R_AARCH64_PREL16 and R_AARCH64_PREL32 relocations are
documented as permitting a range of [-2^15 .. 2^16), resp.
[-2^31 .. 2^32). It is also documented that this means we
cannot detect overflow in some cases, which is bad.
Since we always interpret the targets of these relocations as
signed quantities (e.g., in the ksymtab handling code), let's
tighten the overflow checks so that targets that are out of
range for our signed interpretation of the relocated quantity
get flagged.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The following commit
7290d58095 ("module: use relative references for __ksymtab entries")
updated the ksymtab handling of some KASLR capable architectures
so that ksymtab entries are emitted as pairs of 32-bit relative
references. This reduces the size of the entries, but more
importantly, it gets rid of statically assigned absolute
addresses, which require fixing up at boot time if the kernel
is self relocating (which takes a 24 byte RELA entry for each
member of the ksymtab struct).
Since ksymtab entries are always part of the same module as the
symbol they export, it was assumed at the time that a 32-bit
relative reference is always sufficient to capture the offset
between a ksymtab entry and its target symbol.
Unfortunately, this is not always true: in the case of per-CPU
variables, a per-CPU variable's base address (which usually differs
from the actual address of any of its per-CPU copies) is allocated
in the vicinity of the ..data.percpu section in the core kernel
(i.e., in the per-CPU reserved region which follows the section
containing the core kernel's statically allocated per-CPU variables).
Since we randomize the module space over a 4 GB window covering
the core kernel (based on the -/+ 4 GB range of an ADRP/ADD pair),
we may end up putting the core kernel out of the -/+ 2 GB range of
32-bit relative references of module ksymtab entries that refer to
per-CPU variables.
So reduce the module randomization range a bit further. We lose
1 bit of randomization this way, but this is something we can
tolerate.
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum
that can prevent interrupts from being taken when single-stepping.
This patch implements a software workaround to prevent userspace from
effectively being able to disable interrupts.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
During an oops, we print the name of the current task and its pid twice.
We also helpfully advertise its stack limit as "0x(____ptrval____)".
Drop these useless messages.
Signed-off-by: Will Deacon <will.deacon@arm.com>
- Fix SPE probe failure when backing auxbuf with high-order pages
- Fix handling of DMA allocations from outside of the vmalloc area
- Fix generation of build-id ELF section for vDSO object
- Disable huge I/O mappings if kernel page table dumping is enabled
- A few other minor fixes (comments, kconfig etc)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzlRT0ACgkQt6xw3ITB
YzRGOwgArUDryBedDdkxAvjx7fk8O+qjtWctAhdPtyuXIvVLOc3tpiKlayCguF/a
clqr4qAfxswoDLHRMwhh7xdv955A2vraHQWlzvGUj2O2M4mG8RdbVJLm3NxpA09m
dufjSuFcwxcou2c4rXbSXSB4AYJXPmQJiad04VsWj68+TVehy0P45zaPcjHsPNPI
D9sTa9XhBlNa0qpJG7tP9T8FS/QP/hpWHn8v0z/DQ4QetKRTstkpwD5kmJox8WmM
Bw593bvQQ2+5q9g+z0FM3M/7yHwTJw2RLnnIb29YsW8MxM3rUeqt+FMA2OALBgbi
0m7WoTZwO9hDQuPU1DDvZUtw3iOpeg==
=buiS
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix SPE probe failure when backing auxbuf with high-order pages
- Fix handling of DMA allocations from outside of the vmalloc area
- Fix generation of build-id ELF section for vDSO object
- Disable huge I/O mappings if kernel page table dumping is enabled
- A few other minor fixes (comments, kconfig etc)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vdso: Explicitly add build-id option
arm64/mm: Inhibit huge-vmap with ptdump
arm64: Print physical address of page table base in show_pte()
arm64: don't trash config with compat symbol if COMPAT is disabled
arm64: assembler: Update comment above cond_yield_neon() macro
drivers/perf: arm_spe: Don't error on high-order pages for aux buf
arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable
* POWER: support for direct access to the POWER9 XIVE interrupt controller,
memory and performance optimizations.
* x86: support for accessing memory not backed by struct page, fixes and refactoring
* Generic: dirty page tracking improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJc3qV/AAoJEL/70l94x66Dn3QH/jX1Bn0P/RZAIt4w0SySklSg
PqxUKDyBQqB9vN9Qeb9jWXAKPH2CtM3+up/rz7oRnBWp7qA6vXcC/R/QJYAvzdXE
nklsR/oYCsflR1KdlVYuDvvPCPP2fLBU5zfN83OsaBQ8fNRkm3gN+N5XQ2SbXbLy
Mo9tybS4otY201UAC96e8N0ipwwyCRpDneQpLcl+F5nH3RBt63cVbs04O+70MXn7
eT4I+8K3+Go7LATzT8hglD21D/7uvE31qQb6yr5L33IfhU4GB51RZzBXTNaAdY8n
hT1rMrRkAMAFWYZPQDfoMadjWU3i5DIfstKjDxOr9oTfuOEp5Z+GvJwvVnUDg1I=
=D0+p
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for SVE and Pointer Authentication in guests
- PMU improvements
POWER:
- support for direct access to the POWER9 XIVE interrupt controller
- memory and performance optimizations
x86:
- support for accessing memory not backed by struct page
- fixes and refactoring
Generic:
- dirty page tracking improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
kvm: fix compilation on aarch64
Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
kvm: x86: Fix L1TF mitigation for shadow MMU
KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
tests: kvm: Add tests for KVM_SET_NESTED_STATE
KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
tests: kvm: Add tests to .gitignore
KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
KVM: Fix the bitmap range to copy during clear dirty
KVM: arm64: Fix ptrauth ID register masking logic
KVM: x86: use direct accessors for RIP and RSP
KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
KVM: x86: Omit caching logic for always-available GPRs
kvm, x86: Properly check whether a pfn is an MMIO or not
...
Commit 691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC) to
link VDSO") switched to using LD explicitly. The --build-id option
needs to be passed explicitly, similar to x86. Add this option.
Fixes: 691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
[will: drop redundant use of 'call ld-option' as requested by Masahiro]
Signed-off-by: Will Deacon <will.deacon@arm.com>
- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlzMM9kVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDEp8P/iqZvvZlLdlnWQwluWh237c28kAo
zELO0L7Wl+OJ66v2hzM+NPBi5kv/9pSv7AoKNLv3398YmKFt0n7yUB+MHi0BC9xi
ZEp4etCOiVcqcWWeDiAXLdR9OQlb7IDBDc56s4V9HQgK3sEb4u8aEJIy/nDBVniv
GVLMh1EOsrviIYso6UVxI1X7lPQevpCS0kv9/llhhzEj8QDxnQThjDuW3wrAyhQi
F9XNVjAMW8rft7vvok9cxT4v+TR1HgUajquoSrjXuonWHgKnC9tSH/dHILNK8Zij
5OApojGlZQrXIa5Sk3JOhGahVVY9Y+ewsw58J5bJxd0/xrKXnWk/Lann7NE+UcBf
RJMHfanIO/+JJRzHhagejK7pqnYXD1PWBwF8z3Hefs1IVw4eBvPBGuhIULJ6+eSP
+3JCwiOiwshG43gZlGmHcgvhPdeX4r/BlopWV9+0X/gAjcU1+3+ZG6J3jeAcC1Kx
i481dSzlZ7Ar7VWDCk7WgcmDvUwHXtxq0HbqzQjPBO04kkakjdPZZrZIX3+Qhlem
GpkPVb2z5h5KTk9Fx03ZXxPVdiOQh1UmNC8jlsYZPWcJVTLkySs7HWXZJe+WTs4Z
NLuen/eA4/NCon+UA6XdIG5Ddn/J39UuF1lCApHPHn576rwz+HmqpcN59XiU6y4h
XHIxzajFcXNpn802
=fjph
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.2
- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests
Conflicts:
include/uapi/linux/kvm.h
Pull networking updates from David Miller:
"Highlights:
1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.
2) Add fib_sync_mem to control the amount of dirty memory we allow to
queue up between synchronize RCU calls, from David Ahern.
3) Make flow classifier more lockless, from Vlad Buslov.
4) Add PHY downshift support to aquantia driver, from Heiner
Kallweit.
5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
contention on SLAB spinlocks in heavy RPC workloads.
6) Partial GSO offload support in XFRM, from Boris Pismenny.
7) Add fast link down support to ethtool, from Heiner Kallweit.
8) Use siphash for IP ID generator, from Eric Dumazet.
9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
entries, from David Ahern.
10) Move skb->xmit_more into a per-cpu variable, from Florian
Westphal.
11) Improve eBPF verifier speed and increase maximum program size,
from Alexei Starovoitov.
12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
spinlocks. From Neil Brown.
13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.
14) Improve link partner cap detection in generic PHY code, from
Heiner Kallweit.
15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
Maguire.
16) Remove SKB list implementation assumptions in SCTP, your's truly.
17) Various cleanups, optimizations, and simplifications in r8169
driver. From Heiner Kallweit.
18) Add memory accounting on TX and RX path of SCTP, from Xin Long.
19) Switch PHY drivers over to use dynamic featue detection, from
Heiner Kallweit.
20) Support flow steering without masking in dpaa2-eth, from Ioana
Ciocoi.
21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
Pirko.
22) Increase the strict parsing of current and future netlink
attributes, also export such policies to userspace. From Johannes
Berg.
23) Allow DSA tag drivers to be modular, from Andrew Lunn.
24) Remove legacy DSA probing support, also from Andrew Lunn.
25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
Haabendal.
26) Add a generic tracepoint for TX queue timeouts to ease debugging,
from Cong Wang.
27) More indirect call optimizations, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
cxgb4: Fix error path in cxgb4_init_module
net: phy: improve pause mode reporting in phy_print_status
dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
net: macb: Change interrupt and napi enable order in open
net: ll_temac: Improve error message on error IRQ
net/sched: remove block pointer from common offload structure
net: ethernet: support of_get_mac_address new ERR_PTR error
net: usb: smsc: fix warning reported by kbuild test robot
staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
net: dsa: support of_get_mac_address new ERR_PTR error
net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
vrf: sit mtu should not be updated when vrf netdev is the link
net: dsa: Fix error cleanup path in dsa_init_module
l2tp: Fix possible NULL pointer dereference
taprio: add null check on sched_nest to avoid potential null pointer dereference
net: mvpp2: cls: fix less than zero check on a u32 variable
net_sched: sch_fq: handle non connected flows
net_sched: sch_fq: do not assume EDT packets are ordered
net: hns3: use devm_kcalloc when allocating desc_cb
net: hns3: some cleanup for struct hns3_enet_ring
...
Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said they
should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here, due
to some changes to the kobject core code. Those too have all been acked
by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXNHDbw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynDAgCfbb4LBR6I50wFXb8JM/R6cAS7qrsAn1unshKV
8XCYcif2RxjtdJWXbjdm
=/rLh
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core/kobject updates from Greg KH:
"Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said
they should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here,
due to some changes to the kobject core code. Those too have all been
acked by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
kobject: clean up the kobject add documentation a bit more
kobject: Fix kernel-doc comment first line
kobject: Remove docstring reference to kset
firmware_loader: Fix a typo ("syfs" -> "sysfs")
kobject: fix dereference before null check on kobj
Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
init/config: Do not select BUILD_BIN2C for IKCONFIG
Provide in-kernel headers to make extending kernel easier
kobject: Improve doc clarity kobject_init_and_add()
kobject: Improve docs for kobject_add/del
driver core: platform: Fix the usage of platform device name(pdev->name)
livepatch: Replace klp_ktype_patch's default_attrs with groups
cpufreq: schedutil: Replace default_attrs field with groups
padata: Replace padata_attr_type default_attrs field with groups
irqdesc: Replace irq_kobj_type's default_attrs field with groups
net-sysfs: Replace ktype default_attrs field with groups
block: Replace all ktype default_attrs with groups
samples/kobject: Replace foo_ktype's default_attrs field with groups
kobject: Add support for default attribute groups to kobj_type
driver core: Postpone DMA tear-down until after devres release for probe failure
...
Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user handlers
- Non-critical fixes and cleanup
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFGgACgkQt6xw3ITB
YzTicAf/TX1h1+ecbx4WJAa4qeiOCPoNpG9efldQumqJhKL44MR5bkhuShna5mwE
ptm5qUXkZCxLTjzssZKnbdbgwa3t+emW8Of3D91IfI9akiZbMoDx5FGgcNbqjazb
RLrhOFHwgontA38yppZN+DrL+sXbvif/CVELdHahkEx6KepSGaS2lmPXRmz/W56v
4yIRy/zxc3Dhjgfm3wKh72nBwoZdLiIc4mchd5pthNlR9E2idrYkQegG1C+gA00r
o8uZRVOWgoh7H+QJE+xLUc8PaNCg8xqRRXOuZYg9GOz6hh7zSWhm+f1nRz9S2tIR
gIgsCHNqoO2I3E1uJpAQXDGtt2kFhA==
=ulpJ
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via
sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user
handlers
- Non-critical fixes and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
Documentation: Add ARM64 to kernel-parameters.rst
arm64/speculation: Support 'mitigations=' cmdline option
arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
arm64: enable generic CPU vulnerabilites support
arm64: add sysfs vulnerability show for speculative store bypass
arm64: Fix size of __early_cpu_boot_status
clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters
clocksource/arm_arch_timer: Remove use of workaround static key
clocksource/arm_arch_timer: Drop use of static key in arch_timer_reg_read_stable
clocksource/arm_arch_timer: Direcly assign set_next_event workaround
arm64: Use arch_timer_read_counter instead of arch_counter_get_cntvct
watchdog/sbsa: Use arch_timer_read_counter instead of arch_counter_get_cntvct
ARM: vdso: Remove dependency with the arch_timer driver internals
arm64: Apply ARM64_ERRATUM_1188873 to Neoverse-N1
arm64: Add part number for Neoverse N1
arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT
arm64: Restrict ARM64_ERRATUM_1188873 mitigation to AArch32
arm64: mm: Remove pte_unmap_nested()
arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable
arm64: compat: Reduce address limit for 64K pages
...
Pull stack trace updates from Ingo Molnar:
"So Thomas looked at the stacktrace code recently and noticed a few
weirdnesses, and we all know how such stories of crummy kernel code
meeting German engineering perfection end: a 45-patch series to clean
it all up! :-)
Here's the changes in Thomas's words:
'Struct stack_trace is a sinkhole for input and output parameters
which is largely pointless for most usage sites. In fact if embedded
into other data structures it creates indirections and extra storage
overhead for no benefit.
Looking at all usage sites makes it clear that they just require an
interface which is based on a storage array. That array is either on
stack, global or embedded into some other data structure.
Some of the stack depot usage sites are outright wrong, but
fortunately the wrongness just causes more stack being used for
nothing and does not have functional impact.
Another oddity is the inconsistent termination of the stack trace
with ULONG_MAX. It's pointless as the number of entries is what
determines the length of the stored trace. In fact quite some call
sites remove the ULONG_MAX marker afterwards with or without nasty
comments about it. Not all architectures do that and those which do,
do it inconsistenly either conditional on nr_entries == 0 or
unconditionally.
The following series cleans that up by:
1) Removing the ULONG_MAX termination in the architecture code
2) Removing the ULONG_MAX fixups at the call sites
3) Providing plain storage array based interfaces for stacktrace
and stackdepot.
4) Cleaning up the mess at the callsites including some related
cleanups.
5) Removing the struct stack_trace based interfaces
This is not changing the struct stack_trace interfaces at the
architecture level, but it removes the exposure to the generic
code'"
* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/stacktrace: Use common infrastructure
stacktrace: Provide common infrastructure
lib/stackdepot: Remove obsolete functions
stacktrace: Remove obsolete functions
livepatch: Simplify stack trace retrieval
tracing: Remove the last struct stack_trace usage
tracing: Simplify stack trace retrieval
tracing: Make ftrace_trace_userstack() static and conditional
tracing: Use percpu stack trace buffer more intelligently
tracing: Simplify stacktrace retrieval in histograms
lockdep: Simplify stack trace handling
lockdep: Remove save argument from check_prev_add()
lockdep: Remove unused trace argument from print_circular_bug()
drm: Simplify stacktrace handling
dm persistent data: Simplify stack trace handling
dm bufio: Simplify stack trace retrieval
btrfs: ref-verify: Simplify stack trace retrieval
dma/debug: Simplify stracktrace retrieval
fault-inject: Simplify stacktrace retrieval
mm/page_owner: Simplify stack trace handling
...
Configure arm64 runtime CPU speculation bug mitigations in accordance
with the 'mitigations=' cmdline option. This affects Meltdown, Spectre
v2, and Speculative Store Bypass.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
[will: reorder checks so KASLR implies KPTI and SSBS is affected by cmdline]
Signed-off-by: Will Deacon <will.deacon@arm.com>
SSBS provides a relatively cheap mitigation for SSB, but it is still a
mitigation and its presence does not indicate that the CPU is unaffected
by the vulnerability.
Tweak the mitigation logic so that we report the correct string in sysfs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Return status based on ssbd_state and __ssb_safe. If the
mitigation is disabled, or the firmware isn't responding then
return the expected machine state based on a whitelist of known
good cores.
Given a heterogeneous machine, the overall machine vulnerability
defaults to safe but is reset to unsafe when we miss the whitelist
and the firmware doesn't explicitly tell us the core is safe.
In order to make that work we delay transitioning to vulnerable
until we know the firmware isn't responding to avoid a case
where we miss the whitelist, but the firmware goes ahead and
reports the core is not vulnerable. If all the cores in the
machine have SSBS, then __ssb_safe will remain true.
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
__early_cpu_boot_status is of type long. Use quad
assembler directive to allocate proper size.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arun KS <arunks@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Only arch_timer_read_counter will guarantee that workarounds are
applied. So let's use this one instead of arch_counter_get_cntvct.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Neoverse-N1 is also affected by ARM64_ERRATUM_1188873, so let's
add it to the list of affected CPUs.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: Update silicon-errata.txt]
Signed-off-by: Will Deacon <will.deacon@arm.com>
We currently deal with ARM64_ERRATUM_1188873 by always trapping EL0
accesses for both instruction sets. Although nothing wrong comes out
of that, people trying to squeeze the last drop of performance from
buggy HW find this over the top. Oh well.
Let's change the mitigation by flipping the counter enable bit
on return to userspace. Non-broken HW gets an extra branch on
the fast path, which is hopefully not the end of the world.
The arch timer workaround is also removed.
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When executing clock_gettime(), either in the vDSO or via a system call,
we need to ensure that the read of the counter register occurs within
the seqlock reader critical section. This ensures that updates to the
clocksource parameters (e.g. the multiplier) are consistent with the
counter value and therefore avoids the situation where time appears to
go backwards across multiple reads.
Extend the vDSO logic so that the seqlock critical section covers the
read of the counter register as well as accesses to the data page. Since
reads of the counter system registers are not ordered by memory barrier
instructions, introduce dependency ordering from the counter read to a
subsequent memory access so that the seqlock memory barriers apply to
the counter access in both the vDSO and the system call paths.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The file offset argument to the arm64 sys_mmap() implementation is
scaled from bytes to pages by shifting right by PAGE_SHIFT.
Unfortunately, the offset is passed in as a signed 'off_t' type and
therefore large offsets (i.e. with the top bit set) are incorrectly
sign-extended by the shift. This has been observed to cause false mmap()
failures when mapping GPU doorbells on an arm64 server part.
Change the type of the file offset argument to sys_mmap() from 'off_t'
to 'unsigned long' so that the shifting scales the value as expected.
Cc: <stable@vger.kernel.org>
Signed-off-by: Boyang Zhou <zhouby_cn@126.com>
[will: rewrote commit message]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
lets add support for STADD and use that in favor of LDXR / STXR loop for
the XADD mapping if available. STADD is encoded as an alias for LDADD with
XZR as the destination register, therefore add LDADD to the instruction
encoder along with STADD as special case and use it in the JIT for CPUs
that advertise LSE atomics in CPUID register. If immediate offset in the
BPF XADD insn is 0, then use dst register directly instead of temporary
one.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ensure we are always able to detect whether or not the CPU is affected
by SSB, so that we can later advertise this to userspace.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
[will: Use IS_ENABLED instead of #ifdef]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Track whether all the cores in the machine are vulnerable to Spectre-v2,
and whether all the vulnerable cores have been mitigated. We then expose
this information to userspace via sysfs.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ensure we are always able to detect whether or not the CPU is affected
by Spectre-v2, so that we can later advertise this to userspace.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SMCCC ARCH_WORKAROUND_1 service can indicate that although the
firmware knows about the Spectre-v2 mitigation, this particular
CPU is not vulnerable, and it is thus not necessary to call
the firmware on this CPU.
Let's use this information to our benefit.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We currently have a list of CPUs affected by Spectre-v2, for which
we check that the firmware implements ARCH_WORKAROUND_1. It turns
out that not all firmwares do implement the required mitigation,
and that we fail to let the user know about it.
Instead, let's slightly revamp our checks, and rely on a whitelist
of cores that are known to be non-vulnerable, and let the user know
the status of the mitigation in the kernel log.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We implement page table isolation as a mitigation for meltdown.
Report this to userspace via sysfs.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
spectre-v1 has been mitigated and the mitigation is always active.
Report this to userspace via sysfs
Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There are various reasons, such as benchmarking, to disable spectrev2
mitigation on a machine. Provide a command-line option to do so.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
With VHE different exception levels are used between the host (EL2) and
guest (EL1) with a shared exception level for userpace (EL0). We can take
advantage of this and use the PMU's exception level filtering to avoid
enabling/disabling counters in the world-switch code. Instead we just
modify the counter type to include or exclude EL0 at vcpu_{load,put} time.
We also ensure that trapped PMU system register writes do not re-enable
EL0 when reconfiguring the backing perf events.
This approach completely avoids blackout windows seen with !VHE.
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add support for the :G and :H attributes in perf by handling the
exclude_host/exclude_guest event attributes.
We notify KVM of counters that we wish to be enabled or disabled on
guest entry/exit and thus defer from starting or stopping events based
on their event attributes.
With !VHE we switch the counters between host/guest at EL2. We are able
to eliminate counters counting host events on the boundaries of guest
entry/exit when using :G by filtering out EL2 for exclude_host. When
using !exclude_hv there is a small blackout window at the guest
entry/exit where host events are not captured.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
a typedef to kvm_cpu_context and is used to store host cpu context. The
kvm_cpu_context structure is also used elsewhere to hold vcpu context.
In order to use the percpu to hold additional future host information we
encapsulate kvm_cpu_context in a new structure and rename the typedef and
percpu to match.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The armv8pmu_enable_event_counter function issues an isb instruction
after enabling a pair of counters - this doesn't provide any value
and is inconsistent with the armv8pmu_disable_event_counter.
In any case armv8pmu_enable_event_counter is always called with the
PMU stopped. Starting the PMU with armv8pmu_start results in an isb
instruction being issued prior to writing to PMCR_EL0.
Let's remove the unnecessary isb instruction.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.
This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch provides support for reporting the presence of SVE2 and
its optional features to userspace.
This will also enable visibility of SVE2 for guests, when KVM
support for SVE-enabled guests is available.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When kuser helpers are enabled the kernel maps the relative code at
a fixed address (0xffff0000). Making configurable the option to disable
them means that the kernel can remove this mapping and any access to
this memory area results in a sigfault.
Add a KUSER_HELPERS config option that can be used to disable the
mapping when it is turned off.
This option can be turned off if and only if the applications are
designed specifically for the platform and they do not make use of the
kuser helpers code.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[will: Use IS_ENABLED() instead of #ifdef]
Signed-off-by: Will Deacon <will.deacon@arm.com>
aarch32_alloc_vdso_pages() needs to be refactored to make it
easier to disable kuser helpers.
Divide the function in aarch32_alloc_kuser_vdso_page() and
aarch32_alloc_sigreturn_vdso_page().
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[will: Inlined sigpage allocation to simplify error paths]
Signed-off-by: Will Deacon <will.deacon@arm.com>
To make it possible to disable kuser helpers in aarch32 we need to
divide the kuser and the sigreturn functionalities.
Split the current version of kuser32 in kuser32 (for kuser helpers)
and sigreturn32 (for sigreturn helpers).
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For AArch32 tasks, we install a special "[vectors]" page that contains
the sigreturn trampolines and kuser helpers, which is mapped at a fixed
address specified by the kuser helpers ABI.
Having the sigreturn trampolines in the same page as the kuser helpers
makes it impossible to disable the kuser helpers independently.
Follow the Arm implementation, by moving the signal trampolines out of
the "[vectors]" page and into their own "[sigpage]".
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[will: tweaked comments and fixed sparse warning]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Another bodge for the ftrace PLT code: plt_entries_equal() now takes
the place relative nature of the ADRP/ADD based PLT entries into
account, which means that a struct trampoline instance on the stack
is no longer equal to the same set of opcodes in the module struct,
given that they don't point to the same place in memory anymore.
Work around this by using memcmp() in the ftrace PLT handling code.
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the meanings of sve_vq_map and the ancillary helpers
__bit_to_vq() and __vq_to_bit() are not clearly explained.
This patch makes the explanatory comment clearer, and removes the
duplicate comment from fpsimd.h.
The WARN_ON() currently present in __bit_to_vq() confuses the
intended use of this helper. Since these are low-level helpers not
intended for general-purpose use anyway, it is better not to make
guesses about how these functions will be used: rather, this patch
removes the WARN_ON() and relies on callers to use the helpers
sensibly.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
clock_getres() in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
where 'hrtimer_resolution' depends on whether or not high resolution
timers are enabled, which is a runtime decision.
The vDSO incorrectly returns the constant CLOCK_REALTIME_RES. Fix this
by exposing 'hrtimer_resolution' in the vDSO datapage and returning that
instead.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[will: Use WRITE_ONCE(), move adr off COARSE path, renumber labels, use 'w' reg]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Advertise ARM64_HAS_DCPODP when both DC CVAP and DC CVADP are supported.
Even though we don't use this feature now, we provide it for consistency
with DCPOP and anticipate it being used in the future.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
ARMv8.5 builds upon the ARMv8.2 DC CVAP instruction by introducing a DC
CVADP instruction which cleans the data cache to the point of deep
persistence. Let's expose this support via the arm64 ELF hwcaps.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARMv8.5 DC CVADP instruction may be trapped to EL1 via
SCTLR_EL1.UCI therefore let's provide a handler for it.
Just like the CVAP instruction we use a 'sys' instruction instead of
the 'dc' alias to avoid build issues with older toolchains.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The introduction of AT_HWCAP2 introduced accessors which ensure that
hwcap features are set and tested appropriately.
Let's now mandate access to elf_hwcap via these accessors by making
elf_hwcap static within cpufeature.c.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As we will exhaust the first 32 bits of AT_HWCAP let's start
exposing AT_HWCAP2 to userspace to give us up to 64 caps.
Whilst it's possible to use the remaining 32 bits of AT_HWCAP, we
prefer to expand into AT_HWCAP2 in order to provide a consistent
view to userspace between ILP32 and LP64. However internal to the
kernel we prefer to continue to use the full space of elf_hwcap.
To reduce complexity and allow for future expansion, we now
represent hwcaps in the kernel as ordinals and use a
KERNEL_HWCAP_ prefix. This allows us to support automatic feature
based module loading for all our hwcaps.
We introduce cpu_set_feature to set hwcaps which complements the
existing cpu_have_feature helper. These helpers allow us to clean
up existing direct uses of elf_hwcap and reduce any future effort
required to move beyond 64 caps.
For convenience we also introduce cpu_{have,set}_named_feature which
makes use of the cpu_feature macro to allow providing a hwcap name
without a {KERNEL_}HWCAP_ prefix.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
[will: use const_ilog2() and tweak documentation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Terminating the last trace entry with ULONG_MAX is a completely pointless
exercise and none of the consumers can rely on it because it's
inconsistently implemented across architectures. In fact quite some of the
callers remove the entry and adjust stack_trace.nr_entries afterwards.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/20190410103644.220247845@linutronix.de
We use $(LD) to link vmlinux, modules, decompressors, etc.
VDSO is the only exceptional case where $(CC) is used as the linker
driver, but I do not know why we need to do so. VDSO uses a special
linker script, and does not link standard libraries at all.
I changed the Makefile to use $(LD) rather than $(CC). I tested this,
and VDSO worked for me.
Users will be able to use their favorite linker (e.g. lld instead of
of bfd) by passing LD= from the command line.
My plan is to rewrite all VDSO Makefiles to use $(LD), then delete
cc-ldoption.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The functions armv8pmu_read_counter() and armv8pmu_write_counter()
are `static inline` while they are only referenced when assigned
to a function pointer field in a `struct arm_pmu` instance.
The inline keyword is thus counter intuitive and shouldn't be used.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Raphael Gault <raphael.gault@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some firmwares may reboot CPUs with OS Double Lock set. Make sure that
it is unlocked, in order to use debug exceptions.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
brk_handler() now looks pretty strange and can be refactored to drop its
funny 'handler_found' local variable altogether.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
kprobes and uprobes reserve some BRK immediates for installing their
probes. Define these along with the other reservations in brk-imm.h
and rename the ESR definitions to be consistent with the others that we
already have.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the debug hook dispatching code takes the triggering exception
level into account, there's no need for the hooks themselves to poke
around with user_mode(regs).
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Kprobes bypasses our debug hook registration code so that it doesn't
get tangled up with recursive debug exceptions from things like lockdep:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/324385.html
However, since then, (a) the hook list has become RCU protected and (b)
the kprobes hooks were found not to filter out exceptions from userspace
correctly. On top of that, the step handler is invoked directly from
single_step_handler(), which *does* use the debug hook list, so it's
clearly not the end of the world.
For now, have kprobes use the debug hook registration API like everybody
else. We can revisit this in the future if this is found to limit
coverage significantly.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mixing kernel and user debug hooks together is highly error-prone as it
relies on all of the hooks to figure out whether the exception came from
kernel or user, and then to act accordingly.
Make our debug hook code a little more robust by maintaining separate
hook lists for user and kernel, with separate registration functions
to force callers to be explicit about the exception levels that they
care about.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The comment next to the definition of our 'break_hook' list head is
at best wrong but mainly just meaningless. Rip it out.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since the 'addr' parameter contains an UNKNOWN value for non-watchpoint
debug exceptions, rename it to 'unused' for those hooks so we don't get
tempted to use it in the future.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for arm64 supporting ftrace built on other compiler
options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE)
flags, whatever these may be, rather than assuming '-pg'.
There should be no functional change as a result of this patch.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Calling dump_backtrace() with a pt_regs argument corresponding to
userspace doesn't make any sense and our unwinder will simply print
"Call trace:" before unwinding the stack looking for user frames.
Rather than go through this song and dance, just return early if we're
passed a user register state.
Cc: <stable@vger.kernel.org>
Fixes: 1149aad10b ("arm64: Add dump_backtrace() in show_regs")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ftrace trampoline code (which deals with modules loaded out of
BL range of the core kernel) uses plt_entries_equal() to check whether
the per-module trampoline equals a zero buffer, to decide whether the
trampoline has already been initialized.
This triggers a BUG() in the opcode manipulation code, since we end
up checking the ADRP offset of a 0x0 opcode, which is not an ADRP
instruction.
So instead, add a helper to check whether a PLT is initialized, and
call that from the frace code.
Cc: <stable@vger.kernel.org> # v5.0
Fixes: bdb85cd1d2 ("arm64/module: switch to ADRP/ADD sequences for PLT entries")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Following assembly code is not trivial; make it slightly easier to read by
replacing some of the magic numbers with the defines which are already
present in sysreg.h.
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Parsing entries in an ACPI table had assumed a generic header
structure. There is no standard ACPI header, though, so less common
layouts with different field sizes required custom parsers to go through
their subtable entry list.
Create the infrastructure for adding different table types so parsing
the entries array may be more reused for all ACPI system tables and
the common code doesn't need to be duplicated.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When doing unwind_frame() in the context of pseudo nmi (need enable
CONFIG_ARM64_PSEUDO_NMI), reaching the bottom of the stack (fp == 0,
pc != 0), function on_sdei_stack() will return true while the sdei acpi
table is not inited in fact. This will cause a "NULL pointer dereference"
oops when going on.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- $(call if_changed,...) must have FORCE as a prerequisite
- vdso.lds is a generated file, so it should be prefixed with
$(obj)/ instead of $(src)/.
- cmd_vdsosym is a one-liner rule, so the assignment with '='
is simpler.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The call to of_get_next_child returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./arch/arm64/kernel/cpu_ops.c:102:1-7: ERROR: missing of_node_put;
acquired a node pointer with refcount incremented on line 69, but
without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit ad67b74d24 ("printk: hash addresses printed with %p"),
two obfuscated kernel pointer are printed at every boot:
vdso: 2 pages (1 code @ (____ptrval____), 1 data @ (____ptrval____))
Remove the the print completely, as it's useless without the addresses.
Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
KVM will need to interrogate the set of SVE vector lengths
available on the system.
This patch exposes the relevant bits to the kernel, along with a
sve_vq_available() helper to check whether a particular vector
length is supported.
__vq_to_bit() and __bit_to_vq() are not intended for use outside
these functions: now that these are exposed outside fpsimd.c, they
are prefixed with __ in order to provide an extra hint that they
are not intended for general-purpose use.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The current FPSIMD/SVE context handling support for non-task (i.e.,
KVM vcpu) contexts does not take SVE into account. This means that
only task contexts can safely use SVE at present.
In preparation for enabling KVM guests to use SVE, it is necessary
to keep track of SVE state for non-task contexts too.
This patch adds the necessary support, removing assumptions from
the context switch code about the location of the SVE context
storage.
When binding a vcpu context, its vector length is arbitrarily
specified as SVE_VL_MIN for now. In any case, because TIF_SVE is
presently cleared at vcpu context bind time, the specified vector
length will not be used for anything yet. In later patches TIF_SVE
will be set here as appropriate, and the appropriate maximum vector
length for the vcpu will be passed when binding.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Due to the way the effective SVE vector length is controlled and
trapped at different exception levels, certain mismatches in the
sets of vector lengths supported by different physical CPUs in the
system may prevent straightforward virtualisation of SVE at parity
with the host.
This patch analyses the extent to which SVE can be virtualised
safely without interfering with migration of vcpus between physical
CPUs, and rejects late secondary CPUs that would erode the
situation further.
It is left up to KVM to decide what to do with this information.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The roles of sve_init_vq_map(), sve_update_vq_map() and
sve_verify_vq_map() are highly non-obvious to anyone who has not dug
through cpufeatures.c in detail.
Since the way these functions interact with each other is more
important here than a full understanding of the cpufeatures code, this
patch adds comments to make the functions' roles clearer.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch updates fpsimd_flush_task_state() to mirror the new
semantics of fpsimd_flush_cpu_state() introduced by commit
d8ad71fa38 ("arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after
invalidating cpu regs"). Both functions now implicitly set
TIF_FOREIGN_FPSTATE to indicate that the task's FPSIMD state is not
loaded into the cpu.
As a side-effect, fpsimd_flush_task_state() now sets
TIF_FOREIGN_FPSTATE even for non-running tasks. In the case of
non-running tasks this is not useful but also harmless, because the
flag is live only while the corresponding task is running. This
function is not called from fast paths, so special-casing this for
the task == current case is not really worth it.
Compiler barriers previously present in restore_sve_fpsimd_context()
are pulled into fpsimd_flush_task_state() so that it can be safely
called with preemption enabled if necessary.
Explicit calls to set TIF_FOREIGN_FPSTATE that accompany
fpsimd_flush_task_state() calls and are now redundant are removed
as appropriate.
fpsimd_flush_task_state() is used to get exclusive access to the
representation of the task's state via task_struct, for the purpose
of replacing the state. Thus, the call to this function should
happen before manipulating fpsimd_state or sve_state etc. in
task_struct. Anomalous cases are reordered appropriately in order
to make the code more consistent, although there should be no
functional difference since these cases are protected by
local_bh_disable() anyway.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
HiSilicon Taishan v110 CPUs didn't implement CSV3 field of the
ID_AA64PFR0_EL1 and are not susceptible to Meltdown, so whitelist
the MIDR in kpti_safe_list[] table.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Zhangshaokun <zhangshaokun@hisilicon.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ARM64 implements the save_stack_trace_regs function, but it is
unusable for any diagnostic tooling compiled as a kernel module due
the missing EXPORT_SYMBOL_GPL for the function. Export
save_stack_trace_regs() to align with other architectures such as
s390, openrisc, and powerpc. This is similar to the ARM64 export of
save_stack_trace_tsk() added in git commit e27c7fa015.
Signed-off-by: William Cohen <wcohen@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use arch_populate_kprobe_blacklist() instead of
arch_within_kprobe_blacklist() so that we can see the full
blacklisted symbols under the debugfs.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[catalin.marinas@arm.com: Add arch_populate_kprobe_blacklist() comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Move exception/irqentry text address check in blacklist,
since those are symbol based rejection.
If we prohibit probing on the symbols in exception_text,
those should be blacklisted.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Remove unneeded RODATA check from arch_prepare_kprobe().
Since check_kprobe_address_safe() already ensured that
the probe address is in kernel text, we don't need to
check whether the address in RODATA or not. That must
be always false.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Move extable address check into arch_prepare_kprobe() from
arch_within_kprobe_blacklist().
The blacklist is exposed via debugfs as a list of symbols.
The extable entries are smaller, so must be filtered out
by arch_prepare_kprobe().
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add check for the return value of memblock_alloc*() functions and call
panic() in case of error. The panic message repeats the one used by
panicing memblock allocators with adjustment of parameters to include
only relevant ones.
The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.
@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+ panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
[anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
[rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
[rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
[akpm@linux-foundation.org: fix xtensa printk warning]
Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Reviewed-by: Juergen Gross <jgross@suse.com> [Xen]
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by the
riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception level
and to avoid the propagation of the UNKNOWN FAR value into the si_code
for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlyCl0cACgkQa9axLQDI
XvEyKxAAiogBZLbyhcy8bTUHVzVoJE0FyAkdO2wWnnaff2Ohkhy1Y/npv33IeK2q
RknxqDIx2DUUVPJNRZGoI/WwBtTZdKaAnW4rIKG84yC1eAkFcd96WQasaZzcp1qY
HmvbJiYXM0bh+0J7i3Wgry/QzOkrltJFJW2kp6Wd5aFE+R1WyWyxT6d+Fp0J3vlA
bT70jlpBK6LXEOmmBS+04Ml02+8MvaGxIl8EInBHSfDLRLErj5E8n41rRHKUiSWz
maWI+kVoLYwOE68xiZlDftUBEeQpUSWgg2nxeK+640QSl1wJmVcRcY9nm6TZeMG2
AiZTR9a7cP5rrdSN5suUmb7d4AMMVlVMisGDlwb+9oCxeTRDzg0uwACaVgHfPqQr
UeBdHbL9nStN7uBH23H8L9mKk+tqpFmk0sgzdrKejOwysAiqWV8aazb/Na3qnVRl
J1B5opxMnGOsjXmHvtG/tiZl281Uwz5ZmzfLmIY3gUZgUgdA3511Egp0ry5y1dzJ
SkYC4Hmzb2ybQvXGIDDa3OzCwXXiqyqKsO+O8Egg1k4OIwbp3w+NHE7gKeA+dMgD
gjN7zEalCUi46Q28xiCPEb+88BpQ18czIWGQLb9mAnmYeZPjqqenXKXuRHr4lgVe
jPURJ/vqvFEglZJN1RDuQHKzHEcm5f2XE566sMZYdSoeiUCb0QM=
=2U56
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by
the riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception
level and to avoid the propagation of the UNKNOWN FAR value into the
si_code for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
arm64: mmu: drop paging_init comments
arm64: debug: Ensure debug handlers check triggering exception level
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
Revert "arm64: uaccess: Implement unsafe accessors"
arm64: avoid clang warning about self-assignment
arm64: Kconfig.platforms: fix warning unmet direct dependencies
lib/raid6: arm: optimize away a mask operation in NEON recovery routine
lib/raid6: use vdupq_n_u8 to avoid endianness warnings
arm64: io: Hook up __io_par() for inX() ordering
riscv: io: Update __io_[p]ar() macros to take an argument
asm-generic/io: Pass result of I/O accessor to __io_[p]ar()
arm64: Add workaround for Fujitsu A64FX erratum 010001
arm64: Rename get_thread_info()
arm64: Remove documentation about TIF_USEDFPU
arm64: irqflags: Fix clang build warnings
arm64: Enable the support of pseudo-NMIs
arm64: Skip irqflags tracing for NMI in IRQs disabled context
arm64: Skip preemption when exiting an NMI
arm64: Handle serror in NMI context
irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
...
- Update the ACPICA code in the kernel to upstream revision 20190215
including ACPI 6.3 support and more:
* New predefined methods: _NBS, _NCH, _NIC, _NIH, and _NIG (Erik
Schmauss).
* Update of the PCC Identifier structure in PDTT (Erik Schmauss).
* Support for new Generic Affinity Structure subtable in SRAT
(Erik Schmauss).
* New PCC operation region support (Erik Schmauss).
* Support for GICC statistical profiling for MADT (Erik Schmauss).
* New Error Disconnect Recover notification support (Erik Schmauss).
* New PPTT Processor Structure Flags fields support (Erik Schmauss).
* ACPI 6.3 HMAT updates (Erik Schmauss).
* GTDT Revision 3 support (Erik Schmauss).
* Legacy module-level code (MLC) support removal (Erik Schmauss).
* Update/clarification of messages for control method failures
(Bob Moore).
* Warning on creation of a zero-length opregion (Bob Moore).
* acpiexec option to dump extra info for memory leaks (Bob Moore).
* More ACPI error to firmware error conversions (Bob Moore).
* Debugger fix (Bob Moore).
* Copyrights update (Bob Moore).
- Clean up sleep states support code in ACPICA (Christoph Hellwig).
- Rework in_nmi() handling in the APEI code and add suppor for the
ARM Software Delegated Exception Interface (SDEI) to it (James
Morse).
- Fix possible out-of-bounds accesses in BERT-related core (Ross
Lagerwall).
- Fix the APEI code parsing HEST that includes a Deferred Machine
Check subtable (Yazen Ghannam).
- Use DEFINE_DEBUGFS_ATTRIBUTE for APEI-related debugfs files
(YueHaibing).
- Switch the APEI ERST code to the new generic UUID API (Andy
Shevchenko).
- Update the MAINTAINERS entry for APEI (Borislav Petkov).
- Fix and clean up the ACPI EC driver (Rafael Wysocki, Zhang Rui).
- Fix DMI checks handling in the ACPI backlight driver and add the
"Lunch Box" chassis-type check to it (Hans de Goede).
- Add support for using ACPI table overrides included in built-in
initrd images (Shunyong Yang).
- Update ACPI device enumeration to treat the PWM2 device as "always
present" on Lenovo Yoga Book (Yauhen Kharuzhy).
- Fix up the enumeration of device objects with the PRP0001 device
ID (Andy Shevchenko).
- Clean up PPTT parsing error messages (John Garry).
- Clean up debugfs files creation handling (Greg Kroah-Hartman,
Rafael Wysocki).
- Clean up the ACPI DPTF Makefile (Masahiro Yamada).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcfSIaAAoJEILEb/54YlRxvL8P/2oiG+u3tm3JahQ2tk9iiX3S
4yjYMB5Gmhua3w/t6tnRHHhy3pjjgI6xH5S7WB0VPTMp57E91EQihcbLJNFiJ1Jf
zjeZtWSmoxvcVwHAXq0DZHFMRK9Xgc/1ckzWNH/pwVlBSgaYazuLr6bwtZhtorci
eNWi82abWfAp6kAXjzJkcFbEp9+H6JzseewKcT8VAKn63KZizCEzxT0PuE9c54km
QnILVB9we0aGD2i0w2BRpbz99Wse0vnoUkBcrDw0LFHCaEQjfyAa94YFVQVrkE1Q
ynH26+yQanyzH00q/HWuH7N7YdcYMYT1CgZoIKR5XtJ+CbTc63VQez4csLOgOFMM
VEwmuv5SdRQ+tLCNFn71dxRheAttKI/nGBAZWMRTLQkp412IrQP4BtWw4wFM8SHZ
3G7eReR/bBeS4u1T5KR8CVVxchinDdwnTvqQII1uEniX80AmsHsQZxtU+JdPDp+w
N6gUE+lPF8e4iT+YsrWFMoNsJ9/MoXbSPQK1oYIcL0f5+PjFMxjTbA53wDiMHAhS
9AqVW1fdSPX0ImV3DuDqHph3ekAt26QHKxIA2xj5WTRWKf+29ijO2+5zU8isT7kI
RfGzpvsSYdvPyIRLUqc/Q3d5u/ElacAaaKJNT+6gUT4AkINAZJKQRiw2dWO1g82O
HVuSc5hRfnAJ5ALfCdIG
=r6fU
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These are ACPICA updates including ACPI 6.3 support among other
things, APEI updates including the ARM Software Delegated Exception
Interface (SDEI) support, ACPI EC driver fixes and cleanups and other
assorted improvements.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20190215
including ACPI 6.3 support and more:
* New predefined methods: _NBS, _NCH, _NIC, _NIH, and _NIG (Erik
Schmauss).
* Update of the PCC Identifier structure in PDTT (Erik Schmauss).
* Support for new Generic Affinity Structure subtable in SRAT
(Erik Schmauss).
* New PCC operation region support (Erik Schmauss).
* Support for GICC statistical profiling for MADT (Erik Schmauss).
* New Error Disconnect Recover notification support (Erik
Schmauss).
* New PPTT Processor Structure Flags fields support (Erik
Schmauss).
* ACPI 6.3 HMAT updates (Erik Schmauss).
* GTDT Revision 3 support (Erik Schmauss).
* Legacy module-level code (MLC) support removal (Erik Schmauss).
* Update/clarification of messages for control method failures
(Bob Moore).
* Warning on creation of a zero-length opregion (Bob Moore).
* acpiexec option to dump extra info for memory leaks (Bob Moore).
* More ACPI error to firmware error conversions (Bob Moore).
* Debugger fix (Bob Moore).
* Copyrights update (Bob Moore)
- Clean up sleep states support code in ACPICA (Christoph Hellwig)
- Rework in_nmi() handling in the APEI code and add suppor for the
ARM Software Delegated Exception Interface (SDEI) to it (James
Morse)
- Fix possible out-of-bounds accesses in BERT-related core (Ross
Lagerwall)
- Fix the APEI code parsing HEST that includes a Deferred Machine
Check subtable (Yazen Ghannam)
- Use DEFINE_DEBUGFS_ATTRIBUTE for APEI-related debugfs files
(YueHaibing)
- Switch the APEI ERST code to the new generic UUID API (Andy
Shevchenko)
- Update the MAINTAINERS entry for APEI (Borislav Petkov)
- Fix and clean up the ACPI EC driver (Rafael Wysocki, Zhang Rui)
- Fix DMI checks handling in the ACPI backlight driver and add the
"Lunch Box" chassis-type check to it (Hans de Goede)
- Add support for using ACPI table overrides included in built-in
initrd images (Shunyong Yang)
- Update ACPI device enumeration to treat the PWM2 device as "always
present" on Lenovo Yoga Book (Yauhen Kharuzhy)
- Fix up the enumeration of device objects with the PRP0001 device ID
(Andy Shevchenko)
- Clean up PPTT parsing error messages (John Garry)
- Clean up debugfs files creation handling (Greg Kroah-Hartman,
Rafael Wysocki)
- Clean up the ACPI DPTF Makefile (Masahiro Yamada)"
* tag 'acpi-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
ACPI / bus: Respect PRP0001 when retrieving device match data
ACPICA: Update version to 20190215
ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting
ACPICA: ACPI 6.3: add GTDT Revision 3 support
ACPICA: ACPI 6.3: HMAT updates
ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags
ACPICA: ACPI 6.3: add Error Disconnect Recover Notification value
ACPICA: ACPI 6.3: MADT: add support for statistical profiling in GICC
ACPICA: ACPI 6.3: add PCC operation region support for AML interpreter
efi: cper: Fix possible out-of-bounds access
ACPI: APEI: Fix possible out-of-bounds access to BERT region
ACPICA: ACPI 6.3: SRAT: add Generic Affinity Structure subtable
ACPICA: ACPI 6.3: Add Trigger order to PCC Identifier structure in PDTT
ACPICA: ACPI 6.3: Adding predefined methods _NBS, _NCH, _NIC, _NIH, and _NIG
ACPICA: Update/clarify messages for control method failures
ACPICA: Debugger: Fix possible fault with the "test objects" command
ACPICA: Interpreter: Emit warning for creation of a zero-length op region
ACPICA: Remove legacy module-level code support
ACPI / x86: Make PWM2 device always present at Lenovo Yoga Book
ACPI / video: Extend chassis-type detection with a "Lunch Box" check
..
The crashkernel is reserved via memblock_reserve(). memblock_free_all()
will call free_low_memory_core_early(), which will go over all reserved
memblocks, marking the pages as PG_reserved.
So manually marking pages as PG_reserved is not necessary, they are
already in the desired state (otherwise they would have been handed over
to the buddy as free pages and bad things would happen).
Link: http://lkml.kernel.org/r/20190114125903.24845-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Greg Hackmann <ghackmann@android.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: CHANDAN VN <chandan.vn@samsung.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will be done by free_reserved_page().
Link: http://lkml.kernel.org/r/20190114125903.24845-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: James Morse <james.morse@arm.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* acpi-apei: (29 commits)
efi: cper: Fix possible out-of-bounds access
ACPI: APEI: Fix possible out-of-bounds access to BERT region
MAINTAINERS: Add James Morse to the list of APEI reviewers
ACPI / APEI: Add support for the SDEI GHES Notification type
firmware: arm_sdei: Add ACPI GHES registration helper
ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications
ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry()
ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length
ACPI / APEI: Make GHES estatus header validation more user friendly
ACPI / APEI: Pass ghes and estatus separately to avoid a later copy
ACPI / APEI: Let the notification helper specify the fixmap slot
ACPI / APEI: Move locking to the notification helper
arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI
ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors
ACPI / APEI: Generalise the estatus queue's notify code
ACPI / APEI: Don't update struct ghes' flags in read/clear estatus
ACPI / APEI: Remove spurious GHES_TO_CLEAR check
...
Debug exception handlers may be called for exceptions generated both by
user and kernel code. In many cases, this is checked explicitly, but
in other cases things either happen to work by happy accident or they
go slightly wrong. For example, executing 'brk #4' from userspace will
enter the kprobes code and be ignored, but the instruction will be
retried forever in userspace instead of delivering a SIGTRAP.
Fix this issue in the most stable-friendly fashion by simply adding
explicit checks of the triggering exception level to all of our debug
exception handlers.
Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The assembly macro get_thread_info() actually returns a task_struct and is
analogous to the current/get_current macro/function.
While it could be argued that thread_info sits at the start of
task_struct and the intention could have been to return a thread_info,
instances of loads from/stores to the address obtained from
get_thread_info() use offsets that are generated with
offsetof(struct task_struct, [...]).
Rename get_thread_info() to state it returns a task_struct.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fix handling of PSTATE.SSBS bit in sigreturn()
- Fix version checking of the GIC during early boot
- Fix clang builds failing due to use of NEON in the crypto code
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlxtk30ACgkQt6xw3ITB
YzSGygf/U9P//TbPwGP2ZrCLHelJ8okYMZgJB3R/MLpGrV/MPWbN39JNMydXUPNT
Kn21TzQeYnGp4blIC5S8RQMJnrqbR03L4ch6DVrFWGJZVkfI3WFefASHUe+Hg/WB
d8GCkiqIMO+qX9+o6e+kPts3bgGsGvYEQF0vvdX6DbNXVkqusJ6TSIEAgEYUQH97
NzIqEfu6xYgmjultmMemfstaWaHI5Mfwx0fSdFhVfCDYKfoAj3U7LDRT9NFEAkTc
cxxZ3Z/BnJYPOhoqboIceZz499g++1SnISTEIIrGHKa51tK/Nuunaarsr1ZbB4gz
BLQhrJxbxxtbdkfDGA7u2mSQupCRbA==
=ujCB
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull late arm64 fixes from Will Deacon:
"Three small arm64 fixes for 5.0.
They fix a build breakage with clang introduced in 4.20, an oversight
in our sigframe restoration relating to the SSBS bit and a boot fix
for systems with newer revisions of our interrupt controller.
Summary:
- Fix handling of PSTATE.SSBS bit in sigreturn()
- Fix version checking of the GIC during early boot
- Fix clang builds failing due to use of NEON in the crypto code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Relax GIC version check during early boot
arm64/neon: Disable -Wincompatible-pointer-types when building with Clang
arm64: fix SSBS sanitization
There are two issues with assigning random percpu seeds right now:
1. We use for_each_possible_cpu() to iterate over cpus, but cpumask is
not set up yet at the moment of kasan_init(), and thus we only set
the seed for cpu #0.
2. A call to get_random_u32() always returns the same number and produces
a message in dmesg, since the random subsystem is not yet initialized.
Fix 1 by calling kasan_init_tags() after cpumask is set up.
Fix 2 by using get_cycles() instead of get_random_u32(). This gives us
lower quality random numbers, but it's good enough, as KASAN is meant to
be used as a debugging tool and not a mitigation.
Link: http://lkml.kernel.org/r/1f815cc914b61f3516ed4cc9bfd9eeca9bd5d9de.1550677973.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updates to the GIC architecture allow ID_AA64PFR0_EL1.GIC to have
values other than 0 or 1. At the moment, Linux is quite strict in the
way it handles this field at early boot stage (cpufeature is fine) and
will refuse to use the system register CPU interface if it doesn't
find the value 1.
Fixes: 021f653791 ("irqchip: gic-v3: Initial support for GICv3")
Reported-by: Chase Conklin <Chase.Conklin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In valid_user_regs() we treat SSBS as a RES0 bit, and consequently it is
unexpectedly cleared when we restore a sigframe or fiddle with GPRs via
ptrace.
This patch fixes valid_user_regs() to account for this, updating the
function to refer to the latest ARM ARM (ARM DDI 0487D.a). For AArch32
tasks, SSBS appears in bit 23 of SPSR_EL1, matching its position in the
AArch32-native PSR format, and we don't need to translate it as we have
to for DIT.
There are no other bit assignments that we need to account for today.
As the recent documentation describes the DIT bit, we can drop our
comment regarding DIT.
While removing SSBS from the RES0 masks, existing inconsistent
whitespace is corrected.
Fixes: d71be2b6c0 ("arm64: cpufeature: Detect SSBS and advertise to userspace")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This reverts commit eff8962888, which
deferred the processing of persistent memory reservations to a point
where the memory may have already been allocated and overwritten,
defeating the purpose.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190215123333.21209-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
perf: xgene: Remove set but not used variable 'config'
arm64: perf: remove misleading comment
dt-bindings: arm: Convert PMU binding to json-schema
To split up APEIs in_nmi() path, the caller needs to always be
in_nmi(). Add a helper to do the work and claim the notification.
When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. A future kernel-first mechanism may be queried
afterwards, and claim the notification, otherwise we fall through
to the existing default behaviour.
The NOTIFY_SEA code was merged before considering multiple, possibly
interacting, NMI-like notifications and the need to consider kernel
first in the future. Make the 'claiming' behaviour explicit.
Restructuring the APEI code to allow multiple NMI-like notifications
means any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This will
allow APEI to use in_nmi() to use the right fixmap entries.
Mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a build option and a command line parameter to build and enable the
support of pseudo-NMIs.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When an NMI is raised while interrupts where disabled, the IRQ tracing
already is in the correct state (i.e. hardirqs_off) and should be left
as such when returning to the interrupted context.
Check whether PMR was masking interrupts when the NMI was raised and
skip IRQ tracing if necessary.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Handling of an NMI should not set any TIF flags. For NMIs received from
EL0 the current exit path is safe to use.
However, an NMI received at EL1 could have interrupted some task context
that has set the TIF_NEED_RESCHED flag. Preempting a task should not
happen as a result of an NMI.
Skip preemption after handling an NMI from EL1.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Per definition of the daifflags, Serrors can occur during any interrupt
context, that includes NMI contexts. Trying to nmi_enter in an nmi context
will crash.
Skip nmi_enter/nmi_exit when serror occurred during an NMI.
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Once the boot CPU has been prepared or a new secondary CPU has been
brought up, use ICC_PMR_EL1 to mask interrupts on that CPU and clear
PSR.I bit.
Since ICC_PMR_EL1 is initialized at CPU bringup, avoid overwriting
it in the GICv3 driver.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently alternatives are applied very late in the boot process (and
a long time after we enable scheduling). Some alternative sequences,
such as those that alter the way CPU context is stored, must be applied
much earlier in the boot sequence.
Introduce apply_boot_alternatives() to allow some alternatives to be
applied immediately after we detect the CPU features of the boot CPU.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
[julien.thierry@arm.com: rename to fit new cpufeature framework better,
apply BOOT_SCOPE feature early in boot]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for the application of alternatives at different points
during the boot process, provide the possibility to check whether
alternatives for a feature of interest was already applied instead of
having a global boolean for all alternatives.
Make VHE enablement code check for the VHE feature instead of considering
all alternatives.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: Christoffer Dall <Christoffer.Dall@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CPU does not received signals for interrupts with a priority masked by
ICC_PMR_EL1. This means the CPU might not come back from a WFI
instruction.
Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
Since the logic of cpu_do_idle is becoming a bit more complex than just
two instructions, lets turn it from ASM to C.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to replace PSR.I interrupt disabling/enabling with ICC_PMR_EL1
interrupt masking, ICC_PMR_EL1 needs to be saved/restored when
taking/returning from an exception. This mimics the way hardware saves
and restores PSR.I bit in spsr_el1 for exceptions and ERET.
Add PMR to the registers to save in the pt_regs struct upon kernel entry,
and restore it before ERET. Also, initialize it to a sane value when
creating new tasks.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add a cpufeature indicating whether a cpu supports masking interrupts
by priority.
The feature will be properly enabled in a later patch.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It is not supported to have some CPUs using GICv3 sysreg CPU interface
while some others do not.
Once ICC_SRE_EL1.SRE is set on a CPU, the bit cannot be cleared. Since
matching this feature require setting ICC_SRE_EL1.SRE, it cannot be
turned off if found on a CPU.
Set the feature as STRICT_BOOT, if boot CPU has it, all other CPUs are
required to have it.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When using VHE, the host needs to clear HCR_EL2.TGE bit in order
to interact with guest TLBs, switching from EL2&0 translation regime
to EL1&0.
However, some non-maskable asynchronous event could happen while TGE is
cleared like SDEI. Because of this address translation operations
relying on EL2&0 translation regime could fail (tlb invalidation,
userspace access, ...).
Fix this by properly setting HCR_EL2.TGE when entering NMI context and
clear it if necessary when returning to the interrupted context.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Calling strlen() on cmdline == NULL produces a kernel oops. Since having
a NULL cmdline is valid, handle this case explicitly.
Fixes: 52b2a8af74 ("arm64: kexec_file: load initrd and device-tree")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since the enabling and disabling of IRQs within preempt_schedule_irq()
is contained in a need_resched() loop, we don't need the outer arch
code loop.
Reported-by: Julien Thierry <julien.thierry@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
During resume hibernate restores all physical memory. Any memory
that is accessed with the MMU disabled needs to be cleaned to the
PoC.
KVMs __hyp_text was previously ommitted as it runs with the MMU
enabled, but now that the hyp-stub is located in this section,
we must clean __hyp_text too.
This ensures secondary CPUs that come online after hibernate
has finished resuming, and load KVM via the freshly written
hyp-stub see the correct instructions.
Signed-off-by: James Morse <james.morse@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
The hyp-stub is loaded by the kernel's early startup code at EL2
during boot, before KVM takes ownership later. The hyp-stub's
text is part of the regular kernel text, meaning it can be kprobed.
A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid.
Add it to the __hyp_text.
Signed-off-by: James Morse <james.morse@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
On systems with VHE the kernel and KVM's world-switch code run at the
same exception level. Code that is only used on a VHE system does not
need to be annotated as __hyp_text as it can reside anywhere in the
kernel text.
__hyp_text was also used to prevent kprobes from patching breakpoint
instructions into this region, as this code runs at a different
exception level. While this is no longer true with VHE, KVM still
switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
world-switch code will cause a hyp-panic.
Move the __hyp_text check in the kprobes blacklist so it applies on
VHE systems too, to cover the common code and guest enter/exit
assembly.
Fixes: 888b3c8720 ("arm64: Treat all entry code as non-kprobe-able")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 1598ecda7b ("arm64: kaslr: ensure randomized quantities are
clean to the PoC") added cache maintenance to ensure that global
variables set by the kaslr init routine are not wiped clean due to
cache invalidation occurring during the second round of page table
creation.
However, if kaslr_early_init() exits early with no randomization
being applied (either due to the lack of a seed, or because the user
has disabled kaslr explicitly), no cache maintenance is performed,
leading to the same issue we attempted to fix earlier, as far as the
module_alloc_base variable is concerned.
Note that module_alloc_base cannot be initialized statically, because
that would cause it to be subject to a R_AARCH64_RELATIVE relocation,
causing it to be overwritten by the second round of KASLR relocation
processing.
Fixes: f80fb3a3d5 ("arm64: add support for kernel ASLR")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add two new ptrace regsets, which can be used to request and change the
pointer authentication keys of a thread. NT_ARM_PACA_KEYS gives access
to the instruction/data address keys, and NT_ARM_PACG_KEYS to the
generic authentication key. The keys are also part of the core dump file
of the process.
The regsets are only exposed if the kernel is compiled with
CONFIG_CHECKPOINT_RESTORE=y, as the only intended use case is
checkpointing and restoring processes that are using pointer
authentication. (This can be changed later if there are other use
cases.)
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The arm64 asm/memblock.h header exists only to provide a function
prototype for arm64_memblock_init(), which is called only from
setup_arch().
Move the declaration into mmu.h, where it can live alongside other
init functions such as paging_init() and bootmem_init() without the
need for its own special header file.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are a number of offsets defined in asm-offsets.c which no longer
have any users. Let's clean this up by removing them.
All the remaining offsets are in use.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The comment for the armv8pmu_set_event_filter function suggests that
it only works for PMUv2 PMUs - this is incorrect.
Let's remove the incorrect comment.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
kaslr_early_init() is called with the kernel mapped at its
link time offset, and if it returns with a non-zero offset,
the kernel is unmapped and remapped again at the randomized
offset.
During its execution, kaslr_early_init() also randomizes the
base of the module region and of the linear mapping of DRAM,
and sets two variables accordingly. However, since these
variables are assigned with the caches on, they may get lost
during the cache maintenance that occurs when unmapping and
remapping the kernel, so ensure that these values are cleaned
to the PoC.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: f80fb3a3d5 ("arm64: add support for kernel ASLR")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In kexec_file_load, kaslr-seed property of the current dtb will be deleted
any way before setting a new value if possible. It doesn't matter whether
it exists in the current dtb.
So "ret" should be reset to 0 here.
Fixes: commit 884143f60c ("arm64: kexec_file: add kaslr support")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A side effect of commit c55191e96c ("arm64: mm: apply r/o permissions
of VM areas to its linear alias as well") is that the linear map is
created with page granularity, which means that transitioning the early
page table from global to non-global mappings when enabling kpti can
take a significant amount of time during boot.
Given that most CPU implementations do not require kpti, this mainly
impacts KASLR builds where kpti is forcefully enabled. However, in these
situations we know early on that non-global mappings are required and
can avoid the use of global mappings from the beginning. The only gotcha
is Cavium erratum #27456, which we must detect based on the MIDR value
of the boot CPU.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
- Prevent KASLR from mapping the top page of the virtual address space
- Fix device-tree probing of SDEI driver
- Fix incorrect register offset definition in Hisilicon DDRC PMU driver
- Fix compilation issue with older binutils not liking unsigned immediates
- Fix uapi headers so that libc can provide its own sigcontext definition
- Fix handling of private compat syscalls
- Hook up compat io_pgetevents() syscall for 32-bit tasks
- Cleanup to arm64 Makefile (including now to avoid silly conflicts)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJcL3d+AAoJELescNyEwWM0PNcIAIdjWQeBQYMBc8C/A2dBqL2s
tWBI+ormmZO72eAOVuGr1ZBqPhIpqXPQQquchnPDEzL+vZiq5Y6HP6ND8a+ISN2c
0NmWH2aURR+SZG5Mfpa9PffUlDu1LVbssbzt3Vk89BmOEFwBbr5w9FEO96c8drJC
MJ5NICtHnTvuI9jRs9zQoJOk+LKAL1Ei3v7EEyJGKVlRahtaYGZIkfx9t1BmFXzB
SFCA7Zf8kHQItKAwfGWsocd7CP7hQZcmpFcn/GfjXML2FQ+sa9Slys+u+8mvSziQ
EiU5os5krKPUpXXmyOeWXzEukZSJMRm2f9FBR2YquYm5RJ7Y0xQH1pB4aLsCR0g=
=LvTk
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'm safely chained back up to my desk, so please pull these arm64
fixes for -rc1 that address some issues that cropped up during the
merge window:
- Prevent KASLR from mapping the top page of the virtual address
space
- Fix device-tree probing of SDEI driver
- Fix incorrect register offset definition in Hisilicon DDRC PMU
driver
- Fix compilation issue with older binutils not liking unsigned
immediates
- Fix uapi headers so that libc can provide its own sigcontext
definition
- Fix handling of private compat syscalls
- Hook up compat io_pgetevents() syscall for 32-bit tasks
- Cleanup to arm64 Makefile (including now to avoid silly conflicts)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Hook up io_pgetevents() for 32-bit tasks
arm64: compat: Don't pull syscall number from regs in arm_compat_syscall
arm64: compat: Avoid sending SIGILL for unallocated syscall numbers
arm64/sve: Disentangle <uapi/asm/ptrace.h> from <uapi/asm/sigcontext.h>
arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
drivers/perf: hisi: Fixup one DDRC PMU register offset
arm64: replace arm64-obj-* in Makefile with obj-*
arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region
firmware: arm_sdei: Fix DT platform device creation
firmware: arm_sdei: fix wrong of_node_put() in init function
arm64: entry: remove unused register aliases
arm64: smp: Fix compilation error
The syscall number may have been changed by a tracer, so we should pass
the actual number in from the caller instead of pulling it from the
saved r7 value directly.
Cc: <stable@vger.kernel.org>
Cc: Pi-Hsun Shih <pihsun@chromium.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM Linux kernel handles the EABI syscall numbers as follows:
0 - NR_SYSCALLS-1 : Invoke syscall via syscall table
NR_SYSCALLS - 0xeffff : -ENOSYS (to be allocated in future)
0xf0000 - 0xf07ff : Private syscall or -ENOSYS if not allocated
> 0xf07ff : SIGILL
Our compat code gets this wrong and ends up sending SIGILL in response
to all syscalls greater than NR_SYSCALLS which have a value greater
than 0x7ff in the bottom 16 bits.
Fix this by defining the end of the ARM private syscall region and
checking the syscall number against that directly. Update the comment
while we're at it.
Cc: <stable@vger.kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Reported-by: Pi-Hsun Shih <pihsun@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Use the standard obj-$(CONFIG_...) syntex. The behavior is still the
same.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit:
3b7142752e ("arm64: convert native/compat syscall entry to C")
... we moved the syscall invocation code from assembly to C, but left
behind a number of register aliases which are now unused.
Let's remove them before they confuse someone.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mostly clean ups although whilst Doug's was chasing down a odd
lockdep warning he also did some work to improved debugger resilience
when some CPUs fail to respond to the round up request.
The main changes are:
* Fixing a lockdep warning on architectures that cannot use an NMI for
the round up plus related changes to make CPU round up and all CPU
backtrace more resilient.
* Constify the arch ops tables
* A couple of other small clean ups
Two of the three patchsets here include changes that spill over into
arch/. Changes in the arch space are relatively narrow in scope
(and directly related to kgdb). Didn't get comprehensive acks but
all impacted maintainers were Cc:ed in good time.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAlwonoUbHGRhbmllbC50
aG9tcHNvbkBsaW5hcm8ub3JnAAoJEHzjJV0594ihmooP/1uzSMGQIoQMB8XeU/jT
Da2iILybi6hGp7ILA27d0yN3tsJBxWGWs8wzNdzMo3NQ3J0o4foAUnS/R0Vjkg9w
uphe5EA4HDsIrH05OouNb984BeEgNaC9HSqtyr9fXuh024NboULFKIm7REYm+QHT
C5SrBtmonL1xE7FmAhudLWjl7ZlvxM6DJeoVViH4kKq0raTiILt6VJaGl9JfcAdL
m9GEf9r/nh0sCq3GNgyc0y4BvHed+Kxzy1fsIi3jE6t8elaYYR72gNRQ5LaFxcnQ
F04/UtH75qB4rqYsqqV1q0rFi+tj+p9wYTmxixaGWsVDX4Gb5KXuLWJhaRb5IvwC
bdq/0IAXRr4vUL3y0tFWfCj7pHGaVc/gfXi8aieRXLGAZG+tdfuu99NCiulIZTfc
QqZz12Z+99/qi6dK7dBQtaN8SyPeB1QXKWefeGo2Bt5QqiBmcKHxsQYMUo3nkf3J
UXHpj4LG6Ldsi/w8VZfvXmM0/vbO/jrus9m+X2v+4tJyisjrsyv0FRnREI4avfbC
l09P1ajv7RrAaxtab0smV9krqWZ/mSn0zcgcaD6RdKe0+SwsiP/CEx1z1Wb1MH9c
wjEiClXjdVB39YVT0YVfG2Ho7qH8WRErxVyNb/f4QKHMXL1Mu91hFWhBBpUOGUj2
7Jrq2zK1uWramtt7GBDpHYYH
=Aqlc
-----END PGP SIGNATURE-----
Merge tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Mostly clean ups although while Doug's was chasing down a odd lockdep
warning he also did some work to improved debugger resilience when
some CPUs fail to respond to the round up request.
The main changes are:
- Fixing a lockdep warning on architectures that cannot use an NMI
for the round up plus related changes to make CPU round up and all
CPU backtrace more resilient.
- Constify the arch ops tables
- A couple of other small clean ups
Two of the three patchsets here include changes that spill over into
arch/. Changes in the arch space are relatively narrow in scope (and
directly related to kgdb). Didn't get comprehensive acks but all
impacted maintainers were Cc:ed in good time"
* tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops
mips/kgdb: prepare arch_kgdb_ops for constness
kdb: use bool for binary state indicators
kdb: Don't back trace on a cpu that didn't round up
kgdb: Don't round up a CPU that failed rounding up before
kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
kgdb: Remove irq flags from roundup
- Rework of the kprobe/uprobe and synthetic events to consolidate all
the dynamic event code. This will make changes in the future easier.
- Partial rewrite of the function graph tracing infrastructure.
This will allow for multiple users of hooking onto functions
to get the callback (return) of the function. This is the ground
work for having kprobes and function graph tracer using one code base.
- Clean up of the histogram code that will facilitate adding more
features to the histograms in the future.
- Addition of str_has_prefix() and a few use cases. There currently
is a similar function strstart() that is used in a few places, but
only returns a bool and not a length. These instances will be
removed in the future to use str_has_prefix() instead.
- A few other various clean ups as well.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXCawlBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhbcAQCFeT0fWWTUxofBQz5jqsHaRnVg21+9
X4sTldYRYEn4YgEAmWOyiwq7zvrsAu4ZwkNBMeqxn3tVymYHiGOGe3Y4BAw=
=u96o
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Rework of the kprobe/uprobe and synthetic events to consolidate all
the dynamic event code. This will make changes in the future easier.
- Partial rewrite of the function graph tracing infrastructure. This
will allow for multiple users of hooking onto functions to get the
callback (return) of the function. This is the ground work for having
kprobes and function graph tracer using one code base.
- Clean up of the histogram code that will facilitate adding more
features to the histograms in the future.
- Addition of str_has_prefix() and a few use cases. There currently is
a similar function strstart() that is used in a few places, but only
returns a bool and not a length. These instances will be removed in
the future to use str_has_prefix() instead.
- A few other various clean ups as well.
* tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
tracing: Use the return of str_has_prefix() to remove open coded numbers
tracing: Have the historgram use the result of str_has_prefix() for len of prefix
tracing: Use str_has_prefix() instead of using fixed sizes
tracing: Use str_has_prefix() helper for histogram code
string.h: Add str_has_prefix() helper function
tracing: Make function ‘ftrace_exports’ static
tracing: Simplify printf'ing in seq_print_sym
tracing: Avoid -Wformat-nonliteral warning
tracing: Merge seq_print_sym_short() and seq_print_sym_offset()
tracing: Add hist trigger comments for variable-related fields
tracing: Remove hist trigger synth_var_refs
tracing: Use hist trigger's var_ref array to destroy var_refs
tracing: Remove open-coding of hist trigger var_ref management
tracing: Use var_refs[] for hist trigger reference checking
tracing: Change strlen to sizeof for hist trigger static strings
tracing: Remove unnecessary hist trigger struct field
tracing: Fix ftrace_graph_get_ret_stack() to use task and not current
seq_buf: Use size_t for len in seq_buf_puts()
seq_buf: Make seq_buf_puts() null-terminate the buffer
arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack
...
checkpatch.pl reports the following:
WARNING: struct kgdb_arch should normally be const
#28: FILE: arch/mips/kernel/kgdb.c:397:
+struct kgdb_arch arch_kgdb_ops = {
This report makes sense, as all other ops struct, this
one should also be const. This patch does the change.
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system. Specifically it hit:
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
Specifically it looked like this:
sysrq: SysRq : DEBUG
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
pstate: 604003c9 (nZCv DAIF +PAN -UAO)
pc : lockdep_hardirqs_on+0xf0/0x160
...
Call trace:
lockdep_hardirqs_on+0xf0/0x160
trace_hardirqs_on+0x188/0x1ac
kgdb_roundup_cpus+0x14/0x3c
kgdb_cpu_enter+0x53c/0x5cc
kgdb_handle_exception+0x180/0x1d4
kgdb_compiled_brk_fn+0x30/0x3c
brk_handler+0x134/0x178
do_debug_exception+0xfc/0x178
el1_dbg+0x18/0x78
kgdb_breakpoint+0x34/0x58
sysrq_handle_dbg+0x54/0x5c
__handle_sysrq+0x114/0x21c
handle_sysrq+0x30/0x3c
qcom_geni_serial_isr+0x2dc/0x30c
...
...
irq event stamp: ...45
hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4
hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34
softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
---[ end trace adf21f830c46e638 ]---
Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.
Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs. Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.
In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.
Looking at all the people who previously had copies of this code,
there were a few variants. I've attempted to keep the variants
working like they used to. Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
get_irq_regs().
* For arch/mips there was a bit of extra code around
kgdb_nmicallback()
NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before. We'll try to round it up
again and potentially hang when we try to grab the csd lock. That's
not new behavior but we'll still try to do better in a future patch.
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:
> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().
Nobody used those flags. Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them. So we can definitely remove the flags.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Tag-based KASAN inline instrumentation mode (which embeds checks of shadow
memory into the generated code, instead of inserting a callback) generates
a brk instruction when a tag mismatch is detected.
This commit adds a tag-based KASAN specific brk handler, that decodes the
immediate value passed to the brk instructions (to extract information
about the memory access that triggered the mismatch), reads the register
values (x0 contains the guilty address) and reports the bug.
Link: http://lkml.kernel.org/r/c91fe7684070e34dc34b419e6b69498f4dcacc2d.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the end, we ended up with quite a lot more than I expected:
- Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
kernel-side support to come later)
- Support for per-thread stack canaries, pending an update to GCC that
is currently undergoing review
- Support for kexec_file_load(), which permits secure boot of a kexec
payload but also happens to improve the performance of kexec
dramatically because we can avoid the sucky purgatory code from
userspace. Kdump will come later (requires updates to libfdt).
- Optimisation of our dynamic CPU feature framework, so that all
detected features are enabled via a single stop_machine() invocation
- KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
they can benefit from global TLB entries when KASLR is not in use
- 52-bit virtual addressing for userspace (kernel remains 48-bit)
- Patch in LSE atomics for per-cpu atomic operations
- Custom preempt.h implementation to avoid unconditional calls to
preempt_schedule() from preempt_enable()
- Support for the new 'SB' Speculation Barrier instruction
- Vectorised implementation of XOR checksumming and CRC32 optimisations
- Workaround for Cortex-A76 erratum #1165522
- Improved compatibility with Clang/LLD
- Support for TX2 system PMUS for profiling the L3 cache and DMC
- Reflect read-only permissions in the linear map by default
- Ensure MMIO reads are ordered with subsequent calls to Xdelay()
- Initial support for memory hotplug
- Tweak the threshold when we invalidate the TLB by-ASID, so that
mremap() performance is improved for ranges spanning multiple PMDs.
- Minor refactoring and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJcE4TmAAoJELescNyEwWM0Nr0H/iaU7/wQSzHyNXtZoImyKTul
Blu2ga4/EqUrTU7AVVfmkl/3NBILWlgQVpY6tH6EfXQuvnxqD7CizbHyLdyO+z0S
B5PsFUH2GLMNAi48AUNqGqkgb2knFbg+T+9IimijDBkKg1G/KhQnRg6bXX32mLJv
Une8oshUPBVJMsHN1AcQknzKariuoE3u0SgJ+eOZ9yA2ZwKxP4yy1SkDt3xQrtI0
lojeRjxcyjTP1oGRNZC+BWUtGOT35p7y6cGTnBd/4TlqBGz5wVAJUcdoxnZ6JYVR
O8+ob9zU+4I0+SKt80s7pTLqQiL9rxkKZ5joWK1pr1g9e0s5N5yoETXKFHgJYP8=
=sYdt
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 festive updates from Will Deacon:
"In the end, we ended up with quite a lot more than I expected:
- Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
kernel-side support to come later)
- Support for per-thread stack canaries, pending an update to GCC
that is currently undergoing review
- Support for kexec_file_load(), which permits secure boot of a kexec
payload but also happens to improve the performance of kexec
dramatically because we can avoid the sucky purgatory code from
userspace. Kdump will come later (requires updates to libfdt).
- Optimisation of our dynamic CPU feature framework, so that all
detected features are enabled via a single stop_machine()
invocation
- KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
they can benefit from global TLB entries when KASLR is not in use
- 52-bit virtual addressing for userspace (kernel remains 48-bit)
- Patch in LSE atomics for per-cpu atomic operations
- Custom preempt.h implementation to avoid unconditional calls to
preempt_schedule() from preempt_enable()
- Support for the new 'SB' Speculation Barrier instruction
- Vectorised implementation of XOR checksumming and CRC32
optimisations
- Workaround for Cortex-A76 erratum #1165522
- Improved compatibility with Clang/LLD
- Support for TX2 system PMUS for profiling the L3 cache and DMC
- Reflect read-only permissions in the linear map by default
- Ensure MMIO reads are ordered with subsequent calls to Xdelay()
- Initial support for memory hotplug
- Tweak the threshold when we invalidate the TLB by-ASID, so that
mremap() performance is improved for ranges spanning multiple PMDs.
- Minor refactoring and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits)
arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset()
arm64: sysreg: Use _BITUL() when defining register bits
arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4
arm64: docs: document pointer authentication
arm64: ptr auth: Move per-thread keys from thread_info to thread_struct
arm64: enable pointer authentication
arm64: add prctl control for resetting ptrauth keys
arm64: perf: strip PAC when unwinding userspace
arm64: expose user PAC bit positions via ptrace
arm64: add basic pointer authentication support
arm64/cpufeature: detect pointer authentication
arm64: Don't trap host pointer auth use to EL2
arm64/kvm: hide ptrauth from guests
arm64/kvm: consistently handle host HCR_EL2 flags
arm64: add pointer authentication register bits
arm64: add comments about EC exception levels
arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned
arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
arm64: enable per-task stack canaries
...
The structure of the ret_stack array on the task struct is going to
change, and accessing it directly via the curr_ret_stack index will no
longer give the ret_stack entry that holds the return address. To access
that, architectures must now use ftrace_graph_get_ret_stack() to get the
associated ret_stack that matches the saved return address.
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When debug with kaslr, it is sometimes necessary to have PHYS_OFFSET to
perform linear virtual address to physical address translation.
Sometimes we're debugging with only few information such as a kernel log
and a symbol file, print PHYS_OFFSET in dump_kernel_offset() for that case.
Tested by:
echo c > /proc/sysrq-trigger
[ 11.996161] SMP: stopping secondary CPUs
[ 11.996732] Kernel Offset: 0x2522200000 from 0xffffff8008000000
[ 11.996881] PHYS_OFFSET: 0xffffffeb40000000
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Open-coding the pointer-auth HWCAPs is a mess and can be avoided by
reusing the multi-cap logic from the CPU errata framework.
Move the multi_entry_cap_matches code to cpufeature.h and reuse it for
the pointer auth HWCAPs.
Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can easily avoid defining the two meta-capabilities for the address
and generic keys, so remove them and instead just check both of the
architected and impdef capabilities when determining the level of system
support.
Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't need to get at the per-thread keys from assembly at all, so
they can live alongside the rest of the per-thread register state in
thread_struct instead of thread_info.
This will also allow straighforward whitelisting of the keys for
hardened usercopy should we expose them via a ptrace request later on.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add an arm64-specific prctl to allow a thread to reinitialize its
pointer authentication keys to random values. This can be useful when
exec() is not used for starting new processes, to ensure that different
processes still have different keys.
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When the kernel is unwinding userspace callchains, we can't expect that
the userspace consumer of these callchains has the data necessary to
strip the PAC from the stored LR.
This patch has the kernel strip the PAC from user stackframes when the
in-kernel unwinder is used. This only affects the LR value, and not the
FP.
This only affects the in-kernel unwinder. When userspace performs
unwinding, it is up to userspace to strip PACs as necessary (which can
be determined from DWARF information).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When pointer authentication is in use, data/instruction pointers have a
number of PAC bits inserted into them. The number and position of these
bits depends on the configured TCR_ELx.TxSZ and whether tagging is
enabled. ARMv8.3 allows tagging to differ for instruction and data
pointers.
For userspace debuggers to unwind the stack and/or to follow pointer
chains, they need to be able to remove the PAC bits before attempting to
use a pointer.
This patch adds a new structure with masks describing the location of
the PAC bits in userspace instruction and data pointers (i.e. those
addressable via TTBR0), which userspace can query via PTRACE_GETREGSET.
By clearing these bits from pointers (and replacing them with the value
of bit 55), userspace can acquire the PAC-less versions.
This new regset is exposed when the kernel is built with (user) pointer
authentication support, and the address authentication feature is
enabled. Otherwise, the regset is hidden.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: Fix to use vabits_user instead of VA_BITS and rename macro]
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds basic support for pointer authentication, allowing
userspace to make use of APIAKey, APIBKey, APDAKey, APDBKey, and
APGAKey. The kernel maintains key values for each process (shared by all
threads within), which are initialised to random values at exec() time.
The ID_AA64ISAR1_EL1.{APA,API,GPA,GPI} fields are exposed to userspace,
to describe that pointer authentication instructions are available and
that the kernel is managing the keys. Two new hwcaps are added for the
same reason: PACA (for address authentication) and PACG (for generic
authentication).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: Fix sizeof() usage and unroll address key initialisation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
So that we can dynamically handle the presence of pointer authentication
functionality, wire up probing code in cpufeature.c.
From ARMv8.3 onwards, ID_AA64ISAR1 is no longer entirely RES0, and now
has four fields describing the presence of pointer authentication
functionality:
* APA - address authentication present, using an architected algorithm
* API - address authentication present, using an IMP DEF algorithm
* GPA - generic authentication present, using an architected algorithm
* GPI - generic authentication present, using an IMP DEF algorithm
This patch checks for both address and generic authentication,
separately. It is assumed that if all CPUs support an IMP DEF algorithm,
the same algorithm is used across all CPUs.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In KVM we define the configuration of HCR_EL2 for a VHE HOST in
HCR_HOST_VHE_FLAGS, but we don't have a similar definition for the
non-VHE host flags, and open-code HCR_RW. Further, in head.S we
open-code the flags for VHE and non-VHE configurations.
In future, we're going to want to configure more flags for the host, so
lets add a HCR_HOST_NVHE_FLAGS defintion, and consistently use both
HCR_HOST_VHE_FLAGS and HCR_HOST_NVHE_FLAGS in the kvm code and head.S.
We now use mov_q to generate the HCR_EL2 value, as we use when
configuring other registers in head.S.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>
While the CSV3 field of the ID_AA64_PFR0 CPU ID register can be checked
to see if a CPU is susceptible to Meltdown and therefore requires kpti
to be enabled, existing CPUs do not implement this field.
We therefore whitelist all unaffected Cortex-A CPUs that do not implement
the CSV3 field.
Signed-off-by: Will Deacon <will.deacon@arm.com>
This enables the use of per-task stack canary values if GCC has
support for emitting the stack canary reference relative to the
value of sp_el0, which holds the task struct pointer in the arm64
kernel.
The $(eval) extends KBUILD_CFLAGS at the moment the make rule is
applied, which means asm-offsets.o (which we rely on for the offset
value) is built without the arguments, and everything built afterwards
has the options set.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 3962446922 ("arm64: preempt: Provide our own implementation of
asm/preempt.h") extended the preempt count field in struct thread_info
to 64 bits, so that it consists of a 32-bit count plus a 32-bit flag
indicating whether or not the current task needs rescheduling.
Whilst the asm-offsets definition of TSK_TI_PREEMPT was updated to point
to this new field, the assembly usage was left untouched meaning that a
32-bit load from TSK_TI_PREEMPT on a big-endian machine actually returns
the reschedule flag instead of the count.
Whilst we could fix this by pointing TSK_TI_PREEMPT at the count field,
we're actually better off reworking the two assembly users so that they
operate on the whole 64-bit value in favour of inspecting the thread
flags separately in order to determine whether a reschedule is needed.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is needed for compilation in some configurations that don't
include it implicitly:
arch/arm64/kernel/machine_kexec_file.c: In function 'arch_kimage_file_post_load_cleanup':
arch/arm64/kernel/machine_kexec_file.c:37:2: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
Fixes: 52b2a8af74 ("arm64: kexec_file: load initrd and device-tree")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The __cpu_up() routine ignores the errors reported by the firmware
for a CPU bringup operation and looks for the error status set by the
booting CPU. If the CPU never entered the kernel, we could end up
in assuming stale error status, which otherwise would have been
set/cleared appropriately by the booting CPU.
Reported-by: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Rather than add additional variables to detect specific early feature
mismatches with secondary CPUs, we can instead dedicate the upper bits
of the CPU boot status word to flag specific mismatches.
This allows us to communicate both granule and VA-size mismatches back
to the primary CPU without the need for additional book-keeping.
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Enabling 52-bit VAs for userspace is pretty confusing, since it requires
you to select "48-bit" virtual addressing in the Kconfig.
Rework the logic so that 52-bit user virtual addressing is advertised in
the "Virtual address space size" choice, along with some help text to
describe its interaction with Pointer Authentication. The EXPERT-only
option to force all user mappings to the 52-bit range is then made
available immediately below the VA size selection.
Signed-off-by: Will Deacon <will.deacon@arm.com>
On arm64 there is optional support for a 52-bit virtual address space.
To exploit this one has to be running with a 64KB page size and be
running on hardware that supports this.
For an arm64 kernel supporting a 48 bit VA with a 64KB page size,
some changes are needed to support a 52-bit userspace:
* TCR_EL1.T0SZ needs to be 12 instead of 16,
* TASK_SIZE needs to reflect the new size.
This patch implements the above when the support for 52-bit VAs is
detected at early boot time.
On arm64 userspace addresses translation is controlled by TTBR0_EL1. As
well as userspace, TTBR0_EL1 controls:
* The identity mapping,
* EFI runtime code.
It is possible to run a kernel with an identity mapping that has a
larger VA size than userspace (and for this case __cpu_set_tcr_t0sz()
would set TCR_EL1.T0SZ as appropriate). However, when the conditions for
52-bit userspace are met; it is possible to keep TCR_EL1.T0SZ fixed at
12. Thus in this patch, the TCR_EL1.T0SZ size changing logic is
disabled.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For cases where there is a mismatch in ARMv8.2-LVA support between CPUs
we have to be careful in allowing secondary CPUs to boot if 52-bit
virtual addresses have already been enabled on the boot CPU.
This patch adds code to the secondary startup path. If the boot CPU has
enabled 52-bit VAs then ID_AA64MMFR2_EL1 is checked to see if the
secondary can also enable 52-bit support. If not, the secondary is
prevented from booting and an error message is displayed indicating why.
Technically this patch could be implemented using the cpufeature code
when considering 52-bit userspace support. However, we employ low level
checks here as the cpufeature code won't be able to run if we have
mismatched 52-bit kernel va support.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Enabling 52-bit VAs on arm64 requires that the PGD table expands from 64
entries (for the 48-bit case) to 1024 entries. This quantity,
PTRS_PER_PGD is used as follows to compute which PGD entry corresponds
to a given virtual address, addr:
pgd_index(addr) -> (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)
Userspace addresses are prefixed by 0's, so for a 48-bit userspace
address, uva, the following is true:
(uva >> PGDIR_SHIFT) & (1024 - 1) == (uva >> PGDIR_SHIFT) & (64 - 1)
In other words, a 48-bit userspace address will have the same pgd_index
when using PTRS_PER_PGD = 64 and 1024.
Kernel addresses are prefixed by 1's so, given a 48-bit kernel address,
kva, we have the following inequality:
(kva >> PGDIR_SHIFT) & (1024 - 1) != (kva >> PGDIR_SHIFT) & (64 - 1)
In other words a 48-bit kernel virtual address will have a different
pgd_index when using PTRS_PER_PGD = 64 and 1024.
If, however, we note that:
kva = 0xFFFF << 48 + lower (where lower[63:48] == 0b)
and, PGDIR_SHIFT = 42 (as we are dealing with 64KB PAGE_SIZE)
We can consider:
(kva >> PGDIR_SHIFT) & (1024 - 1) - (kva >> PGDIR_SHIFT) & (64 - 1)
= (0xFFFF << 6) & 0x3FF - (0xFFFF << 6) & 0x3F // "lower" cancels out
= 0x3C0
In other words, one can switch PTRS_PER_PGD to the 52-bit value globally
provided that they increment ttbr1_el1 by 0x3C0 * 8 = 0x1E00 bytes when
running with 48-bit kernel VAs (TCR_EL1.T1SZ = 16).
For kernel configuration where 52-bit userspace VAs are possible, this
patch offsets ttbr1_el1 and sets PTRS_PER_PGD corresponding to the
52-bit value.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
[will: added comment to TTBR1_BADDR_4852_OFFSET calculation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
It has been reported that ftrace_replace_code() which is called by
ftrace_modify_all_code() can cause a soft lockup warning for an
allmodconfig kernel. This is because all the debug options enabled
causes the loop in ftrace_replace_code() (which loops over all the
functions being enabled where there can be 10s of thousands), is too
slow, and never schedules out.
To solve this, setting FTRACE_MAY_SLEEP to the command passed into
ftrace_replace_code() will make it call cond_resched() in the loop,
which prevents the soft lockup warning from triggering.
Link: http://lkml.kernel.org/r/20181204192903.8193-1-anders.roxell@linaro.org
Link: http://lkml.kernel.org/r/20181205183304.000714627@goodmis.org
Acked-by: Will Deacon <will.deacon@arm.com>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to easily mitigate ARM erratum 1165522, we need to force
affected CPUs to run in VHE mode if using KVM.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that arm64ksyms.c has been reduced to a stub, let's remove it
entirely. New exports should be associated with their function
definition.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the ftrace exports
to the assembly files the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the string routine
exports to the assembly files the functions are defined in. Routines
which should only be exported for !KASAN builds are exported using the
EXPORT_SYMBOL_NOKASAN() helper.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the uaccess exports
to the assembly files the functions are defined in. As we have to
include <asm/assembler.h>, the existing includes are fixed to follow the
usual ordering conventions.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the copy_page and
clear_page exports to the assembly files the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the SMCCC exports to
the assembly file the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the tishift exports
to the assembly file the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since we define memstart_addr in a C file, we can have the export
immediately after the definition of the symbol, as we do elsewhere.
As a step towards removing arm64ksyms.c, move the export of
memstart_addr to init.c, where the symbol is defined.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the arm64 bitops are inlines built atop of the regular atomics,
we don't need to export anything.
Remove the redundant exports.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Functions in the set_graph_notrace no longer subtract FTRACE_NOTRACE_DEPTH
from curr_ret_stack, as that is now implemented via the trace_recursion
flags. Access to curr_ret_stack no longer needs to worry about checking for
this. curr_ret_stack is still initialized to -1, when there's not a shadow
stack allocated.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since commit 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the
I-cache for kernel mappings"), a call to flush_icache_range() will use
an IPI to cross-call other online CPUs so that any stale instructions
are flushed from their pipelines. This triggers a WARN during the
hibernation resume path, where flush_icache_range() is called with
interrupts disabled and is therefore prone to deadlock:
| Disabling non-boot CPUs ...
| CPU1: shutdown
| psci: CPU1 killed.
| CPU2: shutdown
| psci: CPU2 killed.
| CPU3: shutdown
| psci: CPU3 killed.
| WARNING: CPU: 0 PID: 1 at ../kernel/smp.c:416 smp_call_function_many+0xd4/0x350
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc4 #1
Since all secondary CPUs have been taken offline prior to invalidating
the I-cache, there's actually no need for an IPI and we can simply call
__flush_icache_range() instead.
Cc: <stable@vger.kernel.org>
Fixes: 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Reported-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that kexec_walk_memblock() can do the crash-kernel placement itself
architectures that don't support kdump via kexe_file_load() need to
explicitly forbid it.
We don't support this on arm64 until the kernel can add the elfcorehdr
and usable-memory-range fields to the DT. Without these the crash-kernel
overwrites the previous kernel's memory during startup.
Add a check to refuse crash image loading.
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being
context-synchronizing is confusing and misplaced with kpti. Given that
this is already documented under Documentation/ (see arch-support.txt
for membarrier), remove the comment altogether.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some CPUs can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.
This patch emits an SB sequence after each ERET so that speculation is
held up on exception return.
Signed-off-by: Will Deacon <will.deacon@arm.com>
We currently use a DSB; ISB sequence to inhibit speculation in set_fs().
Whilst this works for current CPUs, future CPUs may implement a new SB
barrier instruction which acts as an architected speculation barrier.
On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB
sequence and advertise the presence of the new instruction to userspace.
Signed-off-by: Will Deacon <will.deacon@arm.com>
setup_dtb() is a little difficult to read. This is largely because it
duplicates the FDT -> Linux errno conversion for every intermediate
return value, but also because of silly cosmetic things like naming
and formatting.
Given that this is all brand new, refactor the function to get us off on
the right foot.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Adding "kaslr-seed" to dtb enables triggering kaslr, or kernel virtual
address randomization, at secondary kernel boot. We always do this as
it will have no harm on kaslr-incapable kernel.
We don't have any "switch" to turn off this feature directly, but still
can suppress it by passing "nokaslr" as a kernel boot argument.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: Use rng_is_initialized()]
Signed-off-by: Will Deacon <will.deacon@arm.com>
With this patch, kernel verification can be done without IMA security
subsystem enabled. Turn on CONFIG_KEXEC_VERIFY_SIG instead.
On x86, a signature is embedded into a PE file (Microsoft's format) header
of binary. Since arm64's "Image" can also be seen as a PE file as far as
CONFIG_EFI is enabled, we adopt this format for kernel signing.
You can create a signed kernel image with:
$ sbsign --key ${KEY} --cert ${CERT} Image
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
[will: removed useless pr_debug()]
Signed-off-by: Will Deacon <will.deacon@arm.com>
We use a stop_machine call for each available capability to
enable it on all the CPUs available at boot time. Instead
we could batch the cpu_enable callbacks to a single stop_machine()
call to save us some time.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Use the sorted list of capability entries for the detection and
verification.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Make use of the sorted capability list to access the capability
entry in this_cpu_has_cap() to avoid iterating over the two
tables.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We maintain two separate tables of capabilities, errata and features,
which decide the system capabilities. We iterate over each of these
tables for various operations (e.g, detection, verification etc.).
We do not have a way to map a system "capability" to its entry,
(i.e, cap -> struct arm64_cpu_capabilities) which is needed for
this_cpu_has_cap(). So we iterate over the table one by one to
find the entry and then do the operation. Also, this prevents
us from optimizing the way we "enable" the capabilities on the
CPUs, where we now issue a stop_machine() for each available
capability.
One solution is to merge the two tables into a single table,
sorted by the capability. But this is has the following
disadvantages:
- We loose the "classification" of an errata vs. feature
- It is quite easy to make a mistake when adding an entry,
unless we sort the table at runtime.
So we maintain a list of pointers to the capability entry, sorted
by the "cap number" in a separate array, initialized at boot time.
The only restriction is that we can have one "entry" per capability.
While at it, remove the duplicate declaration of arm64_errata table.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
On arm64, purgatory would do almost nothing. So just invoke secondary
kernel directly by jumping into its entry code.
While, in this case, cpu_soft_restart() must be called with dtb address
in the fifth argument, the behavior still stays compatible with kexec_load
case as long as the argument is null.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch provides kexec_file_ops for "Image"-format kernel. In this
implementation, a binary is always loaded with a fixed offset identified
in text_offset field of its header.
Regarding signature verification for trusted boot, this patch doesn't
contains CONFIG_KEXEC_VERIFY_SIG support, which is to be added later
in this series, but file-attribute-based verification is still a viable
option by enabling IMA security subsystem.
You can sign(label) a to-be-kexec'ed kernel image on target file system
with:
$ evmctl ima_sign --key /path/to/private_key.pem Image
On live system, you must have IMA enforced with, at least, the following
security policy:
"appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig"
See more details about IMA here:
https://sourceforge.net/p/linux-ima/wiki/Home/
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
load_other_segments() is expected to allocate and place all the necessary
memory segments other than kernel, including initrd and device-tree
blob (and elf core header for crash).
While most of the code was borrowed from kexec-tools' counterpart,
users may not be allowed to specify dtb explicitly, instead, the dtb
presented by the original boot loader is reused.
arch_kimage_kernel_post_load_cleanup() is responsible for freeing arm64-
specific data allocated in load_other_segments().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Those image head's flags will be used later by kexec_file loader.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Remove duplicate entries for Qualcomm erratum 1003. Since the entries
are not purely based on generic MIDR checks, use the multi_cap_entry
type to merge the entries.
Cc: Christopher Covington <cov@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Merge duplicate entries for a single capability using the midr
range list for Cavium errata 30115 and 27456.
Cc: Andrew Pinski <apinski@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :
1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
2) ARM Errata 819472 on A53 r0p[01]
Both have the same work around. Merge these entries to avoid
duplicate entries for a single capability. Add a new Kconfig
entry to control the "capability" entry to make it easier
to handle combinations of the CONFIGs.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
readelf complains about the section layout of vmlinux when building
with CONFIG_RELOCATABLE=y (for KASLR):
readelf: Warning: [21]: Link field (0) should index a symtab section.
readelf: Warning: [21]: Info field (0) should index a relocatable section.
Also, it seems that our use of '-pie -shared' is contradictory, and
thus ambiguous. In general, the way KASLR is wired up at the moment
is highly tailored to how ld.bfd happens to implement (and conflate)
PIE executables and shared libraries, so given the current effort to
support other toolchains, let's fix some of these issues as well.
- Drop the -pie linker argument and just leave -shared. In ld.bfd,
the differences between them are unclear (except for the ELF type
of the produced image [0]) but lld chokes on seeing both at the
same time.
- Rename the .rela output section to .rela.dyn, as is customary for
shared libraries and PIE executables, so that it is not misidentified
by readelf as a static relocation section (producing the warnings
above).
- Pass the -z notext and -z norelro options to explicitly instruct the
linker to permit text relocations, and to omit the RELRO program
header (which requires a certain section layout that we don't adhere
to in the kernel). These are the defaults for current versions of
ld.bfd.
- Discard .eh_frame and .gnu.hash sections to avoid them from being
emitted between .head.text and .text, screwing up the section layout.
These changes only affect the ELF image, and produce the same binary
image.
[0] b9dce7f1ba ("arm64: kernel: force ET_DYN ELF type for ...")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Smith <peter.smith@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
was introduced by a patch that tried to fix one bug, but by doing so created
another bug. As both bugs corrupt the output (but they do not crash the
kernel), I decided to fix the design such that it could have both bugs
fixed. The original fix, fixed time reporting of the function graph tracer
when doing a max_depth of one. This was code that can test how much the
kernel interferes with userspace. But in doing so, it could corrupt the time
keeping of the function profiler.
The issue is that the curr_ret_stack variable was being used for two
different meanings. One was to keep track of the stack pointer on the
ret_stack (shadow stack used by the function graph tracer), and the other
use case was the graph call depth. Although, the two may be closely
related, where they got updated was the issue that lead to the two different
bugs that required the two use cases to be updated differently.
The big issue with this fix is that it requires changing each architecture.
The good news is, I was able to remove a lot of code that was duplicated
within the architectures and place it into a single location. Then I could
make the fix in one place.
I pushed this code into linux-next to let it settle over a week, and before
doing so, I cross compiled all the affected architectures to make sure that
they built fine.
In the mean time, I also pulled in a patch that fixes the sched_switch
previous tasks state output, that was not actually correct.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW/4NPhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnWAAQCyUIRLgYImr81eTl52lxNRsULk+aiI
U29kRFWWU0c40AEA1X9sDF0MgOItbRGfZtnHTZEousXRDaDf4Fge2kF7Egg=
=liQ0
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"While rewriting the function graph tracer, I discovered a design flaw
that was introduced by a patch that tried to fix one bug, but by doing
so created another bug.
As both bugs corrupt the output (but they do not crash the kernel), I
decided to fix the design such that it could have both bugs fixed. The
original fix, fixed time reporting of the function graph tracer when
doing a max_depth of one. This was code that can test how much the
kernel interferes with userspace. But in doing so, it could corrupt
the time keeping of the function profiler.
The issue is that the curr_ret_stack variable was being used for two
different meanings. One was to keep track of the stack pointer on the
ret_stack (shadow stack used by the function graph tracer), and the
other use case was the graph call depth. Although, the two may be
closely related, where they got updated was the issue that lead to the
two different bugs that required the two use cases to be updated
differently.
The big issue with this fix is that it requires changing each
architecture. The good news is, I was able to remove a lot of code
that was duplicated within the architectures and place it into a
single location. Then I could make the fix in one place.
I pushed this code into linux-next to let it settle over a week, and
before doing so, I cross compiled all the affected architectures to
make sure that they built fine.
In the mean time, I also pulled in a patch that fixes the sched_switch
previous tasks state output, that was not actually correct"
* tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
sched, trace: Fix prev_state output in sched_switch tracepoint
function_graph: Have profiler use curr_ret_stack and not depth
function_graph: Reverse the order of pushing the ret_stack and the callback
function_graph: Move return callback before update of curr_ret_stack
function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
function_graph: Make ftrace_push_return_trace() static
sparc/function_graph: Simplify with function_graph_enter()
sh/function_graph: Simplify with function_graph_enter()
s390/function_graph: Simplify with function_graph_enter()
riscv/function_graph: Simplify with function_graph_enter()
powerpc/function_graph: Simplify with function_graph_enter()
parisc: function_graph: Simplify with function_graph_enter()
nds32: function_graph: Simplify with function_graph_enter()
MIPS: function_graph: Simplify with function_graph_enter()
microblaze: function_graph: Simplify with function_graph_enter()
arm64: function_graph: Simplify with function_graph_enter()
ARM: function_graph: Simplify with function_graph_enter()
x86/function_graph: Simplify with function_graph_enter()
function_graph: Create function_graph_enter() to consolidate architecture code
The core ftrace hooks take the instrumented PC in x0, but for some
reason arm64's prepare_ftrace_return() takes this in x1.
For consistency, let's flip the argument order and always pass the
instrumented PC in x0.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The save_return_regs and restore_return_regs macros are only used by
return_to_handler, and having them defined out-of-line only serves to
obscure the logic.
Before we complicate, let's clean this up and fold the logic directly
into return_to_handler, saving a few lines of macro boilerplate in the
process. At the same time, a missing trailing space is added to the
comments, fixing a code style violation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The core ftrace code requires that when it is handed the PC of an
instrumented function, this PC is the address of the instrumented
instruction. This is necessary so that the core ftrace code can identify
the specific instrumentation site. Since the instrumented function will
be a BL, the address of the instrumented function is LR - 4 at entry to
the ftrace code.
This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers,
which acquire the PC of the instrumented function.
The mcount_get_lr helper is used to acquire the LR of the instrumented
function, whose value does not require this adjustment, and cannot be
adjusted to anything meaningful. No adjustment of this value is made on
other architectures, including arm. However, arm64 adjusts this value by
4.
This patch brings arm64 in line with other architectures and removes the
adjustment of the LR value.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The core frace code has an optional sanity check on the frame pointer
passed by ftrace_graph_caller and return_to_handler. This is cheap,
useful, and enabled unconditionally on x86, sparc, and riscv.
Let's do the same on arm64, so that we can catch any problems early.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The global exports of ftrace_call and ftrace_graph_call are somewhat
painful to read. Let's use the generic GLOBAL() macro to ameliorate
matters.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 1212f7a16a ("scripts/kallsyms: filter arm64's __efistub_
symbols") updated the kallsyms code to filter out symbols with
the __efistub_ prefix explicitly, so we no longer require the
hack in our linker script to emit them as absolute symbols.
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
On the affected Cortex-A76 cores (r0p0 to r3p0), if a virtual address
for a cacheable mapping of a location is being accessed by a core while
another core is remapping the virtual address to a new physical page
using the recommended break-before-make sequence, then under very rare
circumstances TLBI+DSB completes before a read using the translation
being invalidated has been observed by other observers. The workaround
repeats the TLBI+DSB operation and is shared with the Qualcomm Falkor
erratum 1009
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The function_graph_enter() function does the work of calling the function
graph hook function and the management of the shadow stack, simplifying the
work done in the architecture dependent prepare_ftrace_return().
Have arm64 use the new code, and remove the shadow stack management as well as
having to set up the trace structure.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Now that we have switched to the small code model entirely, and
reduced the extended KASLR range to 4 GB, we can be sure that the
targets of relative branches that are out of range are in range
for a ADRP/ADD pair, which is one instruction shorter than our
current MOVN/MOVK/MOVK sequence, and is more idiomatic and so it
is more likely to be implemented efficiently by micro-architectures.
So switch over the ordinary PLT code and the special handling of
the Cortex-A53 ADRP errata, as well as the ftrace trampline
handling.
Reviewed-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: Added a couple of comments in the plt equality check]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add support for emitting ADR and ADRP instructions so we can switch
over our PLT generation code in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
__install_bp_hardening_cb() is called via stop_machine() as part
of the cpu_enable callback. To force each CPU to take its turn
when allocating slots, they take a spinlock.
With the RT patches applied, the spinlock becomes a mutex,
and we get warnings about sleeping while in stop_machine():
| [ 0.319176] CPU features: detected: RAS Extension Support
| [ 0.319950] BUG: scheduling while atomic: migration/3/36/0x00000002
| [ 0.319955] Modules linked in:
| [ 0.319958] Preemption disabled at:
| [ 0.319969] [<ffff000008181ae4>] cpu_stopper_thread+0x7c/0x108
| [ 0.319973] CPU: 3 PID: 36 Comm: migration/3 Not tainted 4.19.1-rt3-00250-g330fc2c2a880 #2
| [ 0.319975] Hardware name: linux,dummy-virt (DT)
| [ 0.319976] Call trace:
| [ 0.319981] dump_backtrace+0x0/0x148
| [ 0.319983] show_stack+0x14/0x20
| [ 0.319987] dump_stack+0x80/0xa4
| [ 0.319989] __schedule_bug+0x94/0xb0
| [ 0.319991] __schedule+0x510/0x560
| [ 0.319992] schedule+0x38/0xe8
| [ 0.319994] rt_spin_lock_slowlock_locked+0xf0/0x278
| [ 0.319996] rt_spin_lock_slowlock+0x5c/0x90
| [ 0.319998] rt_spin_lock+0x54/0x58
| [ 0.320000] enable_smccc_arch_workaround_1+0xdc/0x260
| [ 0.320001] __enable_cpu_capability+0x10/0x20
| [ 0.320003] multi_cpu_stop+0x84/0x108
| [ 0.320004] cpu_stopper_thread+0x84/0x108
| [ 0.320008] smpboot_thread_fn+0x1e8/0x2b0
| [ 0.320009] kthread+0x124/0x128
| [ 0.320010] ret_from_fork+0x10/0x18
Switch this to a raw spinlock, as we know this is only called with
IRQs masked.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When merging support for SSBD and the CRC32 instructions, the conflict
resolution for the new capability entries in arm64_features[]
inadvertedly predicated the availability of the CRC32 instructions on
CONFIG_ARM64_SSBD, despite the functionality being entirely unrelated.
Move the #ifdef CONFIG_ARM64_SSBD down so that it only covers the SSBD
capability.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fix up one typos: Onl -> Only
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There have been some additional events added to the PMU architecture
since Armv8.0, so expose them via our sysfs infrastructure.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The PMU event numbers are split between perf_event.h and perf_event.c,
which makes it difficult to spot any gaps in the numbers which may be
allocated in the future.
This patch sorts the events numerically, adds some missing events and
moves the definitions into perf_event.h.
Signed-off-by: Will Deacon <will.deacon@arm.com>
We cannot distinguish reads from writes in our generic cache events, so
drop the WRITE entries and leave the READ entries pointing to the combined
read/write events, as is done by other CPUs and architectures.
Reported-by: Ganapatrao Kulkarni <Ganapatrao.Kulkarni@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Armv8.1 allocated the upper 32-bits of the PMCEID registers to describe
the common architectural and microarchitecture events beginning at 0x4000.
Add support for these registers to our probing code, so that we can
advertise the SPE events when they are supported by the CPU.
Signed-off-by: Will Deacon <will.deacon@arm.com>
As a hangover from when this code used a designated initialiser, we've
been using commas to terminate the arm_pmu field assignments. Whilst
harmless, it's also weird, so replace them with semicolons instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Instead of saving a pointer to the .plt and .init.plt sections to apply
plt-based relocations, save and use their section indices instead.
The mod->arch.{core,init}.plt pointers were problematic for livepatch
because they pointed within temporary section headers (provided by the
module loader via info->sechdrs) that would be freed after module load.
Since livepatch modules may need to apply relocations post-module-load
(for example, to patch a module that is loaded later), using section
indices to offset into the section headers (instead of accessing them
through a saved pointer) allows livepatch modules on arm64 to pass in
their own copy of the section headers to apply_relocate_add() to apply
delayed relocations.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The new memory EFI reservation feature we introduced to allow memory
reservations to persist across kexec may trigger an unbounded number
of calls to memblock_reserve(). The memblock subsystem can deal with
this fine, but not before memblock resizing is enabled, which we can
only do after paging_init(), when the memory we reallocate the array
into is actually mapped.
So break out the memreserve table processing into a separate routine
and call it after paging_init() on arm64. On ARM, because of limited
reviewing bandwidth of the maintainer, we cannot currently fix this,
so instead, disable the EFI persistent memreserve entirely on ARM so
we can fix it later.
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181114175544.12860-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix W+X page (mark RO) allocated by the arm64 kprobes code
- Makefile fix for .i files in out of tree modules
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlvdeMUACgkQa9axLQDI
XvGVAQ/8Dd6M1NwDsPqdemdWVOGJJi+c+Ou39c29dCvPkZV63ZgPBaQiOX5DJ18T
TUjC+Q2zW4Oag86q4N0REOQiafDfScNU/ZsDgnfHvagKNc6+V1lqzb8DsteeeCW9
YOPnbL/VV+dBKKXphjW23VQfsz5ryDU6HKoDHRgOOtHisnTOKJGj/HCXzn1LY6x4
us/Gl6U/kJyRs0/7F8lmfSatDK2o+bKo0/0X6OV7dNE3bo++rpWxCX8D/dcBiEwV
BZDkWu5noglnzYz/LwobYwIjshd6cNjSjJKgoudp3+6WtcFGiK347HDQyo6WqBSd
5hmo/R0My5SUWrwb3GVmxFQmDDxIwywneSkKdx00PNygoNBhu7VYOrf7C/8NOl2h
a0lMCl1Q9x+/2ZDWHhgcwZ6Nfkj/3hJ/3jQVtfqt7ldXgPmZQPrBcx0+CzjGAAiK
gmIpr7VH701KkQGMljV4W0AurWx4v/+YpewkSODBOcbEQTd6trl8I5+A0SA+o6eC
F479l8meU9H0vf9fMB1bkRxBipyaFRKNaTuabO3wHN45C4fzQCXQi5pjfvyAfC+f
zZbnTKeWVzAafnYGcS6Fml+hUD3QQdARnd3WDOyzwBC7EvZM3gGmFWnlkMxbgSuV
9c9+t7fMLChQiKUZ+bjNQpaXZ0YmA9+1fRqb9xCOP1/Ll7933A0=
=hH50
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
- fix W+X page (mark RO) allocated by the arm64 kprobes code
- Makefile fix for .i files in out of tree modules
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kprobe: make page to RO mode when allocate it
arm64: kdump: fix small typo
arm64: makefile fix build of .i file in external module case
This brings the kernel doc in line with the function signature.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Introduces the stackleak gcc plugin ported from grsecurity by Alexander
Popov, with x86 and arm64 support.
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlvQvn4WHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJpSfD/sErFreuPT1beSw994Lr9Zx4k9v
ERsuXxWBENaJOJXbOOHMfVEcEeG/1uhPSp7hlw/dpHfh0anATTrcYqm8RNKbfK+k
o06+JK14OJfpm5Ghq/7OizhdNLCMT8wMU3XZtWfy65VSJGjEFx8Y48vMeQtpWtUK
ylSzi9JV6j2iUBF9oibtiT53+yqsqAtX80X1G7HRCgv9kxuKMhZr+Q5oGV6+ViyQ
Azj8mNn06iRnhHKd17WxDJr0GjSibzz4weS/9XgP3t3EcNWJo1EgBlD2KV3tOfP5
nzmqfqTqrcjxs/tyjdh6vVCSlYucNtyCQGn63qyShQYSg6mZwclR2fY8YSTw6PWw
GfYWFOWru9z+qyQmwFkQ9bSQS2R+JIT0oBCj9VmtF9XmPCy7K2neJsQclzSPBiCW
wPgXVQS4IA4684O5CmDOVMwmDpGvhdBNUR6cqSzGLxQOHY1csyXubMNUsqU3g9xk
Ob4pEy/xrrIw4WpwHcLHSEW5gV1/OLhsT0fGRJJiC947L3cN5s9EZp7FLbIS0zlk
qzaXUcLmn6AgcfkYwg5cI3RMLaN2V0eDCMVTWZJ1wbrmUV9chAaOnTPTjNqLOTht
v3b1TTxXG4iCpMmOFf59F8pqgAwbBDlfyNSbySZ/Pq5QH69udz3Z9pIUlYQnSJHk
u6q++2ReDpJXF81rBw==
=Ks6B
-----END PGP SIGNATURE-----
Merge tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull stackleak gcc plugin from Kees Cook:
"Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin
was ported from grsecurity by Alexander Popov. It provides efficient
stack content poisoning at syscall exit. This creates a defense
against at least two classes of flaws:
- Uninitialized stack usage. (We continue to work on improving the
compiler to do this in other ways: e.g. unconditional zero init was
proposed to GCC and Clang, and more plugin work has started too).
- Stack content exposure. By greatly reducing the lifetime of valid
stack contents, exposures via either direct read bugs or unknown
cache side-channels become much more difficult to exploit. This
complements the existing buddy and heap poisoning options, but
provides the coverage for stacks.
The x86 hooks are included in this series (which have been reviewed by
Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already
been merged through the arm64 tree (written by Laura Abbott and
reviewed by Mark Rutland and Will Deacon).
With VLAs having been removed this release, there is no need for
alloca() protection, so it has been removed from the plugin"
* tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm64: Drop unneeded stackleak_check_alloca()
stackleak: Allow runtime disabling of kernel stack erasing
doc: self-protection: Add information about STACKLEAK feature
fs/proc: Show STACKLEAK metrics in the /proc file system
lkdtm: Add a test for STACKLEAK
gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.
Implicit alignment is done deep in the memblock allocator and it can
come as a surprise. Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.
Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.
For the case when memblock APIs are used via helper functions, e.g. like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.
The direct memblock APIs users were updated using the semantic patch below:
@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)
[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
hugetlbfs: dirty pages as they are added to pagecache
mm: export add_swap_extent()
mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
mm/gup: cache dev_pagemap while pinning pages
Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
mm: return zero_resv_unavail optimization
mm: zero remaining unavailable struct pages
tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
mm/gup_benchmark.c: add additional pinning methods
mm/gup_benchmark.c: time put_page()
mm: don't raise MEMCG_OOM event due to failed high-order allocation
mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
...
ARM64 has asm implementation of memchr(), memcmp(), str[r]chr(),
str[n]cmp(), str[n]len(). KASAN don't see memory accesses in asm code,
thus it can potentially miss many bugs.
Ifdef out __HAVE_ARCH_* defines of these functions when KASAN is enabled,
so the generic implementations from lib/string.c will be used.
We can't just remove the asm functions because efistub uses them. And we
can't have two non-weak functions either, so declare the asm functions as
weak.
Link: http://lkml.kernel.org/r/20180920135631.23833-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
- Work to get rid of direct accesses to struct device_node name and
type pointers in preparation for removing them. New helpers for
parsing DT cpu nodes and conversions to use the helpers. printk
conversions to %pOFn for printing DT node names. Most went thru
subystem trees, so this is the remainder.
- Fixes to DT child node lookups to actually be restricted to child
nodes instead of treewide.
- Refactoring of dtb targets out of arch code. This makes the support
more uniform and enables building all dtbs on c6x, microblaze, and
powerpc.
- Various DT binding updates for Renesas r8a7744 SoC
- Vendor prefixes for Facebook, OLPC
- Restructuring of some ARM binding docs moving some peripheral bindings
out of board/SoC binding files
- New "secure-chosen" binding for secure world settings on ARM
- Dual licensing of 2 DT IRQ binding headers
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlvTKWYQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw8J5EACMAnrTxWQmXfQXOZEVxztcFavH6LP8mh2e
7FZIZ38jzHXXvl81tAg1nBhzFUU/qtvqW8NDCZ9OBxKvp6PFDNhWu241ZodSB1Kw
MZWy2A9QC+qbHYCC+SB5gOT0+Py3v7LNCBa5/TxhbFd35THJM8X0FP7gmcCGX593
9Ml1rqawT4mK5XmCpczT0cXxyC4TgVtpfDWZH2KgJTR/kwXVQlOQOGZ8a1y/wrt7
8TLIe7Qy4SFRzjhwbSta1PUehyYfe4uTSsXIJ84kMvNMxinLXQtvd7t9TfsK8p/R
WjYUneJskVjtxVrMQfdV4MxyFL1YEt2mYcr0PMKIWxMCgGDAZsHPoUZmjyh/PrCI
uiZtEHn3fXpUZAV/xEHHNirJxYyQfHGiksAT+lPrUXYYLCcZ3ZmqiTEYhGoQAfH5
CQPMuxA6yXxp6bov6zJwZSTZtkXciju8aQRhUhlxIfHTqezmGYeql/bnWd+InNuR
upANLZBh6D2jTWzDyobconkCCLlVkSqDoqOx725mMl6hIcdH9d2jVX7hwRf077VI
5i3CyPSJOkSOLSdB8bAPYfBoaDtH2bthxieUrkkSbIjbwHO1H6a2lxPeG/zah0a3
ePMGhi7J84UM4VpJEi000cP+bhPumJtJrG7zxP7ldXdfAF436sQ6KRptlcpLpj5i
IwMhUQNH+g==
=335v
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree updates from Rob Herring:
"A bit bigger than normal as I've been busy this cycle.
There's a few things with dependencies and a few things subsystem
maintainers didn't pick up, so I'm taking them thru my tree.
The fixes from Johan didn't get into linux-next, but they've been
waiting for some time now and they are what's left of what subsystem
maintainers didn't pick up.
Summary:
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
- Work to get rid of direct accesses to struct device_node name and
type pointers in preparation for removing them. New helpers for
parsing DT cpu nodes and conversions to use the helpers. printk
conversions to %pOFn for printing DT node names. Most went thru
subystem trees, so this is the remainder.
- Fixes to DT child node lookups to actually be restricted to child
nodes instead of treewide.
- Refactoring of dtb targets out of arch code. This makes the support
more uniform and enables building all dtbs on c6x, microblaze, and
powerpc.
- Various DT binding updates for Renesas r8a7744 SoC
- Vendor prefixes for Facebook, OLPC
- Restructuring of some ARM binding docs moving some peripheral
bindings out of board/SoC binding files
- New "secure-chosen" binding for secure world settings on ARM
- Dual licensing of 2 DT IRQ binding headers"
* tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits)
ARM: dt: relicense two DT binding IRQ headers
power: supply: twl4030-charger: fix OF sibling-node lookup
NFC: nfcmrvl_uart: fix OF child-node lookup
net: stmmac: dwmac-sun8i: fix OF child-node lookup
net: bcmgenet: fix OF child-node lookup
drm/msm: fix OF child-node lookup
drm/mediatek: fix OF sibling-node lookup
of: Add missing exports of node name compare functions
dt-bindings: Add OLPC vendor prefix
dt-bindings: misc: bk4: Add device tree binding for Liebherr's BK4 SPI bus
dt-bindings: thermal: samsung: Add SPDX license identifier
dt-bindings: clock: samsung: Add SPDX license identifiers
dt-bindings: timer: ostm: Add R7S9210 support
dt-bindings: phy: rcar-gen2: Add r8a7744 support
dt-bindings: can: rcar_can: Add r8a7744 support
dt-bindings: timer: renesas, cmt: Document r8a7744 CMT support
dt-bindings: watchdog: renesas-wdt: Document r8a7744 support
dt-bindings: thermal: rcar: Add device tree support for r8a7744
Documentation: dt: Add binding for /secure-chosen/stdout-path
dt-bindings: arm: zte: Move sysctrl bindings to their own doc
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlvPV7IUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyaUg//WnCaRIu2oKOp8c/bplZJDW5eT10d
oYAN9qeyptU9RYrg4KBNbZL9UKGFTk3AoN5AUjrk8njxc/dY2ra/79esOvZyyYQy
qLXBvrXKg3yZnlNlnyBneGSnUVwv/kl2hZS+kmYby2YOa8AH/mhU0FIFvsnfRK2I
XvwABFm2ZYvXCqh3e5HXaHhOsR88NQ9In0AXVC7zHGqv1r/bMVn2YzPZHL/zzMrF
mS79tdBTH+shSvchH9zvfgIs+UEKvvjEJsG2liwMkcQaV41i5dZjSKTdJ3EaD/Y2
BreLxXRnRYGUkBqfcon16Yx+P6VCefDRLa+RhwYO3dxFF2N4ZpblbkIdBATwKLjL
npiGc6R8yFjTmZU0/7olMyMCm7igIBmDvWPcsKEE8R4PezwoQv6YKHBMwEaflIbl
Rv4IUqjJzmQPaA0KkRoAVgAKHxldaNqno/6G1FR2gwz+fr68p5WSYFlQ3axhvTjc
bBMJpB/fbp9WmpGJieTt6iMOI6V1pnCVjibM5ZON59WCFfytHGGpbYW05gtZEod4
d/3yRuU53JRSj3jQAQuF1B6qYhyxvv5YEtAQqIFeHaPZ67nL6agw09hE+TlXjWbE
rTQRShflQ+ydnzIfKicFgy6/53D5hq7iH2l7HwJVXbXRQ104T5DB/XHUUTr+UWQn
/Nkhov32/n6GjxQ=
=58I4
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- Fix ASPM link_state teardown on removal (Lukas Wunner)
- Fix misleading _OSC ASPM message (Sinan Kaya)
- Make _OSC optional for PCI (Sinan Kaya)
- Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set
(Patrick Talbert)
- Remove x86 and arm64 node-local allocation for host bridge structures
(Punit Agrawal)
- Pay attention to device-specific _PXM node values (Jonathan Cameron)
- Support new Immediate Readiness bit (Felipe Balbi)
- Differentiate between pciehp surprise and safe removal (Lukas Wunner)
- Remove unnecessary pciehp includes (Lukas Wunner)
- Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)
- Tolerate PCIe Slot Presence Detect being hardwired to zero to
workaround broken hardware, e.g., the Wilocity switch/wireless device
(Lukas Wunner)
- Unify pciehp controller & slot structs (Lukas Wunner)
- Constify hotplug_slot_ops (Lukas Wunner)
- Drop hotplug_slot_info (Lukas Wunner)
- Embed hotplug_slot struct into users instead of allocating it
separately (Lukas Wunner)
- Initialize PCIe port service drivers directly instead of relying on
initcall ordering (Keith Busch)
- Restore PCI config state after a slot reset (Keith Busch)
- Save/restore DPC config state along with other PCI config state
(Keith Busch)
- Reference count devices during AER handling to avoid race issue with
concurrent hot removal (Keith Busch)
- If an Upstream Port reports ERR_FATAL, don't try to read the Port's
config space because it is probably unreachable (Keith Busch)
- During error handling, use slot-specific reset instead of secondary
bus reset to avoid link up/down issues on hotplug ports (Keith Busch)
- Restore previous AER/DPC handling that does not remove and
re-enumerate devices on ERR_FATAL (Keith Busch)
- Notify all drivers that may be affected by error recovery resets
(Keith Busch)
- Always generate error recovery uevents, even if a driver doesn't have
error callbacks (Keith Busch)
- Make PCIe link active reporting detection generic (Keith Busch)
- Support D3cold in PCIe hierarchies during system sleep and runtime,
including hotplug and Thunderbolt ports (Mika Westerberg)
- Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
are empty or occupied (Jon Derrick)
- Remove duplicated include from pci/pcie/err.c and unused variable
from cpqphp (YueHaibing)
- Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
Pawandeep)
- Uninline PCI bus accessors for better ftracing (Keith Busch)
- Remove unused AER Root Port .error_resume method (Keith Busch)
- Use kfifo in AER instead of a local version (Keith Busch)
- Use threaded IRQ in AER bottom half (Keith Busch)
- Use managed resources in AER core (Keith Busch)
- Reuse pcie_port_find_device() for AER injection (Keith Busch)
- Abstract AER interrupt handling to disconnect error injection (Keith
Busch)
- Refactor AER injection callbacks to simplify future improvments
(Keith Busch)
- Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)
- Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)
- Add switch fall-through annotations (Gustavo A. R. Silva)
- Remove unused Switchtec quirk variable (Joshua Abraham)
- Fix pci.c kernel-doc warning (Randy Dunlap)
- Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)
- Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)
- Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid
useless dmesg errors (Logan Gunthorpe)
- Update Switchtec NTB documentation (Wesley Yung)
- Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)
- Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)
- Add PCI support for peer-to-peer DMA (Logan Gunthorpe)
- Add sysfs group for PCI peer-to-peer memory statistics (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
Gunthorpe)
- Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA driver writer's documentation (Logan
Gunthorpe)
- Add block layer flag to indicate driver support for PCI peer-to-peer
DMA (Logan Gunthorpe)
- Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
memory (Logan Gunthorpe)
- Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
Gunthorpe)
- Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
Gunthorpe)
- Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
Christoph Hellwig, Logan Gunthorpe)
- Cache VF config space size to optimize enumeration of many VFs
(KarimAllah Ahmed)
- Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
- Fix VMD AERSID quirk Device ID matching (Jon Derrick)
- Fix Cadence PHY handling during probe (Alan Douglas)
- Signal Cadence Endpoint interrupts via AXI region 0 instead of last
region (Alan Douglas)
- Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan
Douglas)
- Remove redundant controller tests for "device_type == pci" (Rob
Herring)
- Document R-Car E3 (R8A77990) bindings (Tho Vu)
- Add device tree support for R-Car r8a7744 (Biju Das)
- Drop unused mvebu PCIe capability code (Thomas Petazzoni)
- Add shared PCI bridge emulation code (Thomas Petazzoni)
- Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)
- Add aardvark Root Port emulation (Thomas Petazzoni)
- Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)
- Add initial power management for i.MX7 (Leonard Crestez)
- Add PME_Turn_Off support for i.MX7 (Leonard Crestez)
- Fix qcom runtime power management error handling (Bjorn Andersson)
- Update TI dra7xx unaligned access errata workaround for host mode as
well as endpoint mode (Vignesh R)
- Fix kirin section mismatch warning (Nathan Chancellor)
- Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)
- Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)
- Update Keystone to use MRRS quirk for host bridge instead of open
coding (Kishon Vijay Abraham I)
- Refactor Keystone link establishment (Kishon Vijay Abraham I)
- Simplify and speed up Keystone link training (Kishon Vijay Abraham I)
- Remove unused Keystone host_init argument (Kishon Vijay Abraham I)
- Merge Keystone driver files into one (Kishon Vijay Abraham I)
- Remove redundant Keystone platform_set_drvdata() (Kishon Vijay
Abraham I)
- Rename Keystone functions for uniformity (Kishon Vijay Abraham I)
- Add Keystone device control module DT binding (Kishon Vijay Abraham
I)
- Use SYSCON API to get Keystone control module device IDs (Kishon
Vijay Abraham I)
- Clean up Keystone PHY handling (Kishon Vijay Abraham I)
- Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)
- Clean up Keystone config space access checks (Kishon Vijay Abraham I)
- Get Keystone outbound window count from DT (Kishon Vijay Abraham I)
- Clean up Keystone outbound window configuration (Kishon Vijay Abraham
I)
- Clean up Keystone DBI setup (Kishon Vijay Abraham I)
- Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)
- Fix Keystone IRQ status checking (Kishon Vijay Abraham I)
- Add debug messages for all Keystone errors (Kishon Vijay Abraham I)
- Clean up Keystone includes and macros (Kishon Vijay Abraham I)
- Fix Mediatek unchecked return value from devm_pci_remap_iospace()
(Gustavo A. R. Silva)
- Fix Mediatek endpoint/port matching logic (Honghui Zhang)
- Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui
Zhang)
- Remove redundant Mediatek PM domain check (Honghui Zhang)
- Convert Mediatek to pci_host_probe() (Honghui Zhang)
- Fix Mediatek MSI enablement (Honghui Zhang)
- Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)
- Add Mediatek loadable module support (Honghui Zhang)
- Detach VMD resources after stopping root bus to prevent orphan
resources (Jon Derrick)
- Convert pcitest build process to that used by other tools (iio, perf,
etc) (Gustavo Pimentel)
* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI/AER: Refactor error injection fallbacks
PCI/AER: Abstract AER interrupt handling
PCI/AER: Reuse existing pcie_port_find_device() interface
PCI/AER: Use managed resource allocations
PCI: pcie: Remove redundant 'default n' from Kconfig
PCI: aardvark: Implement emulated root PCI bridge config space
PCI: mvebu: Convert to PCI emulated bridge config space
PCI: mvebu: Drop unused PCI express capability code
PCI: Introduce PCI bridge emulated config space common logic
PCI: vmd: Detach resources after stopping root bus
nvmet: Optionally use PCI P2P memory
nvmet: Introduce helper functions to allocate and free request SGLs
nvme-pci: Add support for P2P memory in requests
nvme-pci: Use PCI p2pmem subsystem to manage the CMB
IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
block: Add PCI P2P flag for request queue
PCI/P2PDMA: Add P2P DMA driver writer's documentation
docs-rst: Add a new directory for PCI documentation
PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
...
Pull security subsystem updates from James Morris:
"In this patchset, there are a couple of minor updates, as well as some
reworking of the LSM initialization code from Kees Cook (these prepare
the way for ordered stackable LSMs, but are a valuable cleanup on
their own)"
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: Don't ignore initialization failures
LSM: Provide init debugging infrastructure
LSM: Record LSM name in struct lsm_info
LSM: Convert security_initcall() into DEFINE_LSM()
vmlinux.lds.h: Move LSM_TABLE into INIT_DATA
LSM: Convert from initcall to struct lsm_info
LSM: Remove initcall tracing
LSM: Rename .security_initcall section to .lsm_info
vmlinux.lds.h: Avoid copy/paste of security_init section
LSM: Correctly announce start of LSM initialization
security: fix LSM description location
keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
seccomp: remove unnecessary unlikely()
security: tomoyo: Fix obsolete function
security/capabilities: remove check for -EINVAL
Pull siginfo updates from Eric Biederman:
"I have been slowly sorting out siginfo and this is the culmination of
that work.
The primary result is in several ways the signal infrastructure has
been made less error prone. The code has been updated so that manually
specifying SEND_SIG_FORCED is never necessary. The conversion to the
new siginfo sending functions is now complete, which makes it
difficult to send a signal without filling in the proper siginfo
fields.
At the tail end of the patchset comes the optimization of decreasing
the size of struct siginfo in the kernel from 128 bytes to about 48
bytes on 64bit. The fundamental observation that enables this is by
definition none of the known ways to use struct siginfo uses the extra
bytes.
This comes at the cost of a small user space observable difference.
For the rare case of siginfo being injected into the kernel only what
can be copied into kernel_siginfo is delivered to the destination, the
rest of the bytes are set to 0. For cases where the signal and the
si_code are known this is safe, because we know those bytes are not
used. For cases where the signal and si_code combination is unknown
the bits that won't fit into struct kernel_siginfo are tested to
verify they are zero, and the send fails if they are not.
I made an extensive search through userspace code and I could not find
anything that would break because of the above change. If it turns out
I did break something it will take just the revert of a single change
to restore kernel_siginfo to the same size as userspace siginfo.
Testing did reveal dependencies on preferring the signo passed to
sigqueueinfo over si->signo, so bit the bullet and added the
complexity necessary to handle that case.
Testing also revealed bad things can happen if a negative signal
number is passed into the system calls. Something no sane application
will do but something a malicious program or a fuzzer might do. So I
have fixed the code that performs the bounds checks to ensure negative
signal numbers are handled"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
signal: Guard against negative signal numbers in copy_siginfo_from_user32
signal: Guard against negative signal numbers in copy_siginfo_from_user
signal: In sigqueueinfo prefer sig not si_signo
signal: Use a smaller struct siginfo in the kernel
signal: Distinguish between kernel_siginfo and siginfo
signal: Introduce copy_siginfo_from_user and use it's return value
signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
signal: Fail sigqueueinfo if si_signo != sig
signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
signal/unicore32: Use force_sig_fault where appropriate
signal/unicore32: Generate siginfo in ucs32_notify_die
signal/unicore32: Use send_sig_fault where appropriate
signal/arc: Use force_sig_fault where appropriate
signal/arc: Push siginfo generation into unhandled_exception
signal/ia64: Use force_sig_fault where appropriate
signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
signal/ia64: Use the generic force_sigsegv in setup_frame
signal/arm/kvm: Use send_sig_mceerr
signal/arm: Use send_sig_fault where appropriate
signal/arm: Use force_sig_fault where appropriate
...
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
Pull locking and misc x86 updates from Ingo Molnar:
"Lots of changes in this cycle - in part because locking/core attracted
a number of related x86 low level work which was easier to handle in a
single tree:
- Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
McKenney, Andrea Parri)
- lockdep scalability improvements and micro-optimizations (Waiman
Long)
- rwsem improvements (Waiman Long)
- spinlock micro-optimization (Matthew Wilcox)
- qspinlocks: Provide a liveness guarantee (more fairness) on x86.
(Peter Zijlstra)
- Add support for relative references in jump tables on arm64, x86
and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)
- Be a lot less permissive on weird (kernel address) uaccess faults
on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
Horn)
- macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
Amit)
- ... and a handful of other smaller changes as well"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
locking/lockdep: Make global debug_locks* variables read-mostly
locking/lockdep: Fix debug_locks off performance problem
locking/pvqspinlock: Extend node size when pvqspinlock is configured
locking/qspinlock_stat: Count instances of nested lock slowpaths
locking/qspinlock, x86: Provide liveness guarantee
x86/asm: 'Simplify' GEN_*_RMWcc() macros
locking/qspinlock: Rework some comments
locking/qspinlock: Re-order code
locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
futex: Replace spin_is_locked() with lockdep
locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
x86/refcount: Work around GCC inlining bug
x86/objtool: Use asm macros to work around GCC inlining bugs
...
- Core mmu_gather changes which allow tracking the levels of page-table
being cleared together with the arm64 low-level flushing routines
- Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
mitigate Spectre-v4 dynamically without trapping to EL3 firmware
- Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
- Optimise emulation of MRS instructions to ID_* registers on ARMv8.4
- Support for Common Not Private (CnP) translations allowing threads of
the same CPU to share the TLB entries
- Accelerated crc32 routines
- Move swapper_pg_dir to the rodata section
- Trap WFI instruction executed in user space
- ARM erratum 1188874 workaround (arch_timer)
- Miscellaneous fixes and clean-ups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlvKGdEACgkQa9axLQDI
XvGSQBAAiOH6aQABL4TB7c5KIc7C+Unjm6QCFCoaeGWoHuemnM6cFJ7RQsi0GqnP
dVEX5V/FKfmeTWO5g24Ah+MbTm3Bt6+81gywAmi1rrHhmCaCIPjT7xDqy/WsLlvt
7WtgegSGvQ7DIMj2dbfFav6+ra67qAiYZTc46jvuynVl6DrE3BCiyTDbXAWt2nzP
Xf3un4AHRbg3UEMUZTLqU5q4z0tbM6rEAZru8O0UOTnD2q7uttUqW3Ab7fpuEkkj
lEVrMWD3h8SJg+Df9CbXmCNOjh4VhwBwDb5LgO8vA/AcyV/YLEF5b2OUAk/28qwo
0GBwjqRyI4+YQ9LPg41MhGzrlnta0HCdYoeNLgLQZiDcUkuSfGhoA+MNZNOR8B08
sCWF7F6f8UIQm8KMMBiYYdlVyUYgHLsWE/1+CyeLV0oIoWT5k3c+Xe3pho9KpVb0
Co04TqMlqalry0sbevHz5c55H7iWIjB1Tpo3SxM105dVJVibXRPXkz+WZ5iPO+xa
ex2j1kjNdA/AUzrSCZ5lh22zhg0WsfwD++E5meAaJMxieim8FeZDRga43rowJ0BA
zMbSNB/+NDFZ9EhC40VaUfKk8Tkgiug9J5swv0+v7hy1QLDyydHhbOecTuIueauM
6taiT2Iuov5yFng1eonYj4htvouVF4WOhPGthFPJMOcrB9mLMhs=
=3Mc8
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from some new arm64 features and clean-ups, this also contains
the core mmu_gather changes for tracking the levels of the page table
being cleared and a minor update to the generic
compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ.
Summary:
- Core mmu_gather changes which allow tracking the levels of
page-table being cleared together with the arm64 low-level flushing
routines
- Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
mitigate Spectre-v4 dynamically without trapping to EL3 firmware
- Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
- Optimise emulation of MRS instructions to ID_* registers on ARMv8.4
- Support for Common Not Private (CnP) translations allowing threads
of the same CPU to share the TLB entries
- Accelerated crc32 routines
- Move swapper_pg_dir to the rodata section
- Trap WFI instruction executed in user space
- ARM erratum 1188874 workaround (arch_timer)
- Miscellaneous fixes and clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work
arm64: cpufeature: Trap CTR_EL0 access only where it is necessary
arm64: cpufeature: Fix handling of CTR_EL0.IDC field
arm64: cpufeature: ctr: Fix cpu capability check for late CPUs
Documentation/arm64: HugeTLB page implementation
arm64: mm: Use __pa_symbol() for set_swapper_pgd()
arm64: Add silicon-errata.txt entry for ARM erratum 1188873
Revert "arm64: uaccess: implement unsafe accessors"
arm64: mm: Drop the unused cpu parameter
MAINTAINERS: fix bad sdei paths
arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines
arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c
arm64: xen: Use existing helper to check interrupt status
arm64: Use daifflag_restore after bp_hardening
arm64: daifflags: Use irqflags functions for daifflags
arm64: arch_timer: avoid unused function warning
arm64: Trap WFI executed in userspace
arm64: docs: Document SSBS HWCAP
arm64: docs: Fix typos in ELF hwcaps
arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe
...
enable_smccc_arch_workaround_1() passes NULL as the hyp_vecs start and
end if the HVC conduit is in use, and ARM_SMCCC_ARCH_WORKAROUND_1 is
detected.
If the guest kernel happened to be built with KVM_INDIRECT_VECTORS,
we go on to allocate a slot, memcpy() the empty workaround in and
do the appropriate cache maintenance.
This works as we always tell memcpy() the range is 0, so it never
accesses the NULL src pointer, but we still do the cache maintenance.
If hyp_vecs_start is NULL we know we're a guest, just update the fn
like the !KVM_INDIRECT_VECTORS version.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When there is a mismatch in the CTR_EL0 field, we trap
access to CTR from EL0 on all CPUs to expose the safe
value. However, we could skip trapping on a CPU which
matches the safe value.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CTR_EL0.IDC reports the data cache clean requirements for instruction
to data coherence. However, if the field is 0, we need to check the
CLIDR_EL1 fields to detect the status of the feature. Currently we
don't do this and generate a warning with tainting the kernel, when
there is a mismatch in the field among the CPUs. Also the userspace
doesn't have a reliable way to check the CLIDR_EL1 register to check
the status.
This patch fixes the problem by checking the CLIDR_EL1 fields, when
(CTR_EL0.IDC == 0) and updates the kernel's copy of the CTR_EL0 for
the CPU with the actual status of the feature. This would allow the
sanity check infrastructure to do the proper checking of the fields
and also allow the CTR_EL0 emulation code to supply the real status
of the feature.
Now, if a CPU has raw CTR_EL0.IDC == 0 and effective IDC == 1 (with
overall system wide IDC == 1), we need to expose the real value to
the user. So, we trap CTR_EL0 access on the CPU which reports incorrect
CTR_EL0.IDC.
Fixes: commit 6ae4b6e057 ("arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC")
Cc: Shanker Donthineni <shankerd@codeaurora.org>
Cc: Philip Elcan <pelcan@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The matches() routine for a capability must honor the "scope"
passed to it and return the proper results.
i.e, when passed with SCOPE_LOCAL_CPU, it should check the
status of the capability on the current CPU. This is used by
verify_local_cpu_capabilities() on a late secondary CPU to make
sure that it's compliant with the established system features.
However, ARM64_HAS_CACHE_{IDC/DIC} always checks the system wide
registers and this could mean that a late secondary CPU could return
"true" (since the CPU hasn't updated the system wide registers yet)
and thus lead the system in an inconsistent state, where
the system assumes it has IDC/DIC feature, while the new CPU
doesn't.
Fixes: commit 6ae4b6e057 ("arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC")
Cc: Philip Elcan <pelcan@codeaurora.org>
Cc: Shanker Donthineni <shankerd@codeaurora.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It doesn't make sense for a perf event to be configured as a CHAIN event
in isolation, so extend the arm_pmu structure with a ->filter_match()
function to allow the backend PMU implementation to reject CHAIN events
early.
Cc: <stable@vger.kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We describe ranges of 'reserved' memory to userspace via /proc/iomem.
Commit 50d7ba36b9 ("arm64: export memblock_reserve()d regions via
/proc/iomem") updated the logic to export regions that were reserved
because their contents should be preserved. This allowed kexec-tools
to tell the difference between 'reserved' memory that must be
preserved and not overwritten, (e.g. the ACPI tables), and 'nomap'
memory that must not be touched without knowing the memory-attributes
(e.g. RAS CPER regions).
The above commit wrongly assumed that memblock_reserve() would not
be used to reserve regions that aren't memory. It turns out this is
exactly what early_init_dt_reserve_memory_arch() will do if it finds
a DT reserved-memory that was also carved out of the memory node, which
results in a WARN_ON_ONCE() and the region being reserved instead of
ignored. The ramoops description on hikey and dragonboard-410c both do
this, so we can't simply write this configuration off as "buggy firmware".
Avoid this issue by rewriting reserve_memblock_reserved_regions() so
that only the portions of reserved regions which overlap with mapped
memory are actually reserved.
Fixes: 50d7ba36b9 ("arm64: export memblock_reserve()d regions via /proc/iomem")
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Paolo Pisati <p.pisati@gmail.com>
CC: Akashi Takahiro <takahiro.akashi@linaro.org>
CC: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since the struct lsm_info table is not an initcall, we can just move it
into INIT_DATA like all the other tables.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
It recently came to light that userspace can execute WFI, and that
the arm64 kernel doesn't trap this event. This sounds rather benign,
but the kernel should decide when it wants to wait for an interrupt,
and not userspace.
Let's trap WFI and immediately return after having skipped the
instruction. This effectively makes WFI a rather expensive NOP.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There is an extra semicolon in arch_prepare_kprobe, remove it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When running on Cortex-A76, a timer access from an AArch32 EL0
task may end up with a corrupted value or register. The workaround for
this is to trap these accesses at EL1/EL2 and execute them there.
This only affects versions r0p0, r1p0 and r2p0 of the CPU.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Just like CNTVCT, we need to handle userspace trapping into the
kernel if we're decided that the timer wasn't fit for purpose...
64bit userspace is already dealt with, but we're missing the
equivalent compat handling.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since people seem to make a point in breaking the userspace visible
counter, we have no choice but to trap the access. We already do this
for 64bit userspace, but this is lacking for compat. Let's provide
the required handler.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We're now ready to start handling CP15 access. Let's add (empty)
arrays for both 32 and 64bit accessors, and the code that deals
with them.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Here's a /really nice/ part of the architecture: a CP15 access is
allowed to trap even if it fails its condition check, and SW must
handle it. This includes decoding the IT state if this happens in
am IT block. As a consequence, SW must also deal with advancing
the IT state machine.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of directly generating an UNDEF when trapping a CP15 access,
let's add a new entry point to that effect (which only generates an
UNDEF for now).
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64 does not define CONFIG_HAVE_ARCH_COMPILER_H, nor does it keep
anything useful in its copy of asm/compiler.h, so let's remove it
before anybody starts using it.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
has the side effect of defaulting to iterating using "cpu" node names in
preference to the deprecated (for FDT) device_type == "cpu".
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Add arm64_force_sig_ptrace_errno_trap for consistency with
arm64_force_sig_fault and use it where appropriate.
This adds the show_signal logic to the force_sig_errno_trap case,
where it was apparently overlooked earlier.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This will let the description be reused shortly.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function has no more callers so remove it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add arm64_force_sig_mceerr for consistency with arm64_force_sig_fault,
and use it in the one location that can take advantage of it.
This removes the fiddly filling out of siginfo before sending a signal
reporting an memory error to userspace.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Wrap force_sig_fault with a helper that calls arm64_show_signal
and call arm64_force_sig_fault where appropraite.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Filling in siginfo is error prone and so it is wise to use more
specialized helpers to do that work. Factor out the arm specific
unhandled signal reporting from the work of delivering a signal so
the code can be modified to use functions that take the information
to fill out siginfo as parameters.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Every caller passes in current for tsk so there is no need to pass
tsk. Instead make tsk a local variable initialized to current.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Instead of generating a struct siginfo before calling arm64_notify_die
pass the signal number, tne sicode and the fault address into
arm64_notify_die and have it call force_sig_fault instead of
force_sig_info to let the generic code generate the struct siginfo.
This keeps code passing just the needed information into
siginfo generating code, making it easier to see what
is happening and harder to get wrong. Further by letting
the generic code handle the generation of struct siginfo
it reduces the number of sites generating struct siginfo
making it possible to review them and verify that all
of the fiddly details for a structure passed to userspace
are handled properly.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
On a randomly chosen distro kernel build for arm64, vmlinux.o shows the
following sections, containing jump label entries, and the associated
RELA relocation records, respectively:
...
[38088] __jump_table PROGBITS 0000000000000000 00e19f30
000000000002ea10 0000000000000000 WA 0 0 8
[38089] .rela__jump_table RELA 0000000000000000 01fd8bb0
000000000008be30 0000000000000018 I 38178 38088 8
...
In other words, we have 190 KB worth of 'struct jump_entry' instances,
and 573 KB worth of RELA entries to relocate each entry's code, target
and key members. This means the RELA section occupies 10% of the .init
segment, and the two sections combined represent 5% of vmlinux's entire
memory footprint.
So let's switch from 64-bit absolute references to 32-bit relative
references for the code and target field, and a 64-bit relative
reference for the 'key' field (which may reside in another module or the
core kernel, which may be more than 4 GB way on arm64 when running with
KASLR enable): this reduces the size of the __jump_table by 33%, and
gets rid of the RELA section entirely.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-4-ard.biesheuvel@linaro.org
Now that deliberate writes to swapper_pg_dir are made via the fixmap, we
can defend against errant writes by moving it into the rodata section.
Since tramp_pg_dir and reserved_ttbr0 must be at a fixed offset from
swapper_pg_dir, and are not modified at runtime, these are also moved
into the rodata section. Likewise, idmap_pg_dir is not modified at
runtime, and is moved into rodata.
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: simplify linker script, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since the address of swapper_pg_dir is fixed for a given kernel image,
it is an attractive target for manipulation via an arbitrary write. To
mitigate this we'd like to make it read-only by moving it into the
rodata section.
We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir
and reserved_ttbr0, so these will also need to move into rodata.
However, swapper_pg_dir is allocated along with some transient page
tables used for boot which we do not want to move into rodata.
As a step towards this, this patch separates the boot-time page tables
into a new init_pg_dir, and reduces swapper_pg_dir to the single page it
needs to be. This allows us to retain the relationship between
swapper_pg_dir, tramp_pg_dir, and swapper_pg_dir, while cleanly
separating these from the boot-time page tables.
The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during
boot, and all of these levels will be freed when we switch to the
swapper_pg_dir, which is initialized by the existing code in
paging_init(). Since we start off on the init_pg_dir, we no longer need
to allocate a transient page table in paging_init() in order to ensure
that swapper_pg_dir isn't live while we initialize it.
There should be no functional change as a result of this patch.
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: place init_pg_dir after BSS, fold mm changes, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In subsequent patches we'll use a transient pgd during the primary cpu's
boot process. To make this work while allowing secondary cpus to use the
swapper_pg_dir, we need to pass the relevant TTBR1 pgd as a parameter
to __enable_mmu().
This patch updates __enable__mmu() to take this as a parameter, updating
callsites to pass swapper_pg_dir for now.
There should be no functional change as a result of this patch.
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: simplify assembly, clarify commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Support for VGA_CONSOLE is not allowable due to commit ee23794b86
("video: vgacon: Don't build on arm64"), thus remove the associated
unused code.
Whilst PCI on arm64 would support VGA a valid screen_info structure
is missing.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Armv8.4-A extension enables MRS instruction encodings inside ESR_ELx.ISS
during exception class ESR_ELx_EC_SYS64 (0x18). This encoding can be used
to emulate MRS instructions which can avoid fetch/decode from user space
thus improving performance. This adds a new sys64_hook structure element
with applicable ESR mask/value pair for MRS instructions on various system
registers but constrained by sysreg encodings which is currently allowed
to be emulated.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MRS emulation gets triggered with exception class (0x00 or 0x18) eventually
calling the function emulate_mrs() which fetches the user space instruction
and analyses it's encodings (OP0, OP1, OP2, CRN, CRM, RT). The kernel tries
to emulate the given instruction looking into the encoding details. Going
forward these encodings can also be parsed from ESR_ELx.ISS fields without
requiring to fetch/decode faulting userspace instruction which can improve
performance. This factorizes emulate_mrs() function in a way that it can be
called directly with MRS encodings (OP0, OP1, OP2, CRN, CRM) for any given
target register which can then be used directly from 0x18 exception class.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Extracting target register from ESR.ISS encoding has already been required
at multiple instances. Just make it a macro definition and replace all the
existing use cases.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There's no need to treat mismatched cache-line sizes reported by CTR_EL0
differently to any other mismatched fields that we treat as "STRICT" in
the cpufeature code. In both cases we need to trap and emulate EL0
accesses to the register, so drop ARM64_MISMATCHED_CACHE_LINE_SIZE and
rely on ARM64_MISMATCHED_CACHE_TYPE instead.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[catalin.marinas@arm.com: move ARM64_HAS_CNP in the empty cpucaps.h slot]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Common Not Private (CNP) is a feature of ARMv8.2 extension which
allows translation table entries to be shared between different PEs in
the same inner shareable domain, so the hardware can use this fact to
optimise the caching of such entries in the TLB.
CNP occupies one bit in TTBRx_ELy and VTTBR_EL2, which advertises to
the hardware that the translation table entries pointed to by this
TTBR are the same as every PE in the same inner shareable domain for
which the equivalent TTBR also has CNP bit set. In case CNP bit is set
but TTBR does not point at the same translation table entries for a
given ASID and VMID, then the system is mis-configured, so the results
of translations are UNPREDICTABLE.
For kernel we postpone setting CNP till all cpus are up and rely on
cpufeature framework to 1) patch the code which is sensitive to CNP
and 2) update TTBR1_EL1 with CNP bit set. TTBR1_EL1 can be
reprogrammed as result of hibernation or cpuidle (via __enable_mmu).
For these two cases we restore CnP bit via __cpu_suspend_exit().
There are a few cases we need to care of changes in TTBR0_EL1:
- a switch to idmap
- software emulated PAN
we rule out latter via Kconfig options and for the former we make
sure that CNP is set for non-zero ASIDs only.
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
[catalin.marinas@arm.com: default y for CONFIG_ARM64_CNP]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Memory for host controller data structures is allocated local to the node
to which the controller is associated with. This has been the behaviour
since support for ACPI was added in commit 0cb0786bac ("ARM64: PCI:
Support ACPI-based PCI host controller").
Drop the node local allocation as there is no benefit from doing so - the
usage of these structures is independent from where the controller is
located.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Instructions for modifying the PSTATE fields which were not supported
in the older toolchains (e.g, PAN, UAO) are generated using macros.
We have so far used the normal sys_reg() helper for defining the PSTATE
fields. While this works fine, it is really difficult to correlate the
code with the Arm ARM definition.
As per Arm ARM, the PSTATE fields are defined only using Op1, Op2 fields,
with fixed values for Op0, CRn. Also the CRm field has been reserved
for the Immediate value for the instruction. So using the sys_reg()
looks quite confusing.
This patch cleans up the instruction helpers by bringing them
in line with the Arm ARM definitions to make it easier to correlate
code with the document. No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The bad_mode() handler is called if we encounter an uunknown exception,
with the expectation that the subsequent call to panic() will halt the
system. Unfortunately, if the exception calling bad_mode() is taken from
EL0, then the call to die() can end up killing the current user task and
calling schedule() instead of falling through to panic().
Remove the die() call altogether, since we really want to bring down the
machine in this "impossible" case.
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
force_signal_inject() is designed to send a fatal signal to userspace,
so WARN if the current pt_regs indicates a kernel context. This can
currently happen for the undefined instruction trap, so patch that up so
we always BUG() if we didn't have a handler.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The cpu errata and feature enable callbacks are only called via their
respective arm64_cpu_capabilities structure and therefore shouldn't
exist in the global namespace.
Move the PAN, RAS and cache maintenance emulation enable callbacks into
the same files as their corresponding arm64_cpu_capabilities structures,
making them static in the process.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD
state without needing to call into firmware.
This patch hooks into the existing SSBD infrastructure so that SSBS is
used on CPUs that support it, but it's all made horribly complicated by
the very real possibility of big/little systems that don't uniformly
provide the new capability.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Rather than panic() when taking an undefined instruction exception from
EL1, allow a hook to be registered in case we want to emulate the
instruction, like we will for the SSBS PSTATE manipulation instructions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that we're all merged nicely into mainline, there's no need to check
to see if PR_SPEC_STORE_BYPASS is defined.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass
Safe (SSBS) which can be used as a mitigation against Spectre variant 4.
Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS
directly, so that userspace can toggle the SSBS control without trapping
to the kernel.
This patch probes for the existence of SSBS and advertise the new instructions
to userspace if they exist.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since commit 23c85094fe ("proc/kcore: add vmcoreinfo note to /proc/kcore")
the kernel has exported the vmcoreinfo PT_NOTE on /proc/kcore as well
as /proc/vmcore.
arm64 only exposes it's additional arch information via
arch_crash_save_vmcoreinfo() if built with CONFIG_KEXEC, as kdump was
previously the only user of vmcoreinfo.
Move this weak function to a separate file that is built at the same
time as its caller in kernel/crash_core.c. This ensures values like
'kimage_voffset' are always present in the vmcoreinfo PT_NOTE.
CC: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add a CRC32 feature bit and wire it up to the CPU id register so we
will be able to use alternatives patching for CRC32 operations.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Drop stackleak_check_alloca() for arm64 since the STACKLEAK gcc plugin now
doesn't track stack depth overflow caused by alloca().
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltxmb4VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpD7E0P/0qn1IMtskaC7EglFCm72+NXe1CW
ZAtxTHzetjf7977dA3bVsg4gEKvVx5b3YuRT76u4hBoSa0rFJ8Q9iSC8wL4u9Idf
JUQjwVIUxMeGW5fR0VFDkd9SkDYtNGdjQcVl2I8UpV+lnLC/2Vfr4xR5qBad2pAQ
zjthdpQMjZWClyhPkOv6WjVsW0lNw0xDkZWgCViBY+TdT7Gmw/q8hmvj9TEwbMGT
7tmQl9MupQ2bLY8WuTiGA6eNiEZld9esJGthI43xGQDJl4Y3FeciIZWcBru20+wu
GnC3QS3FlmYlp2WuWcKU9lEGXhmoX/7/1WVhZkoMsIvi05c2JCxSxstK7QNfUaAH
8q2/Wc0fYIGm2owH+b1Mpn0w37GZtgl7Bxxzakg7B7Ko0q/EnO7z6XVup1/abKRU
NtUKlWIL7NDiHjHO6j0hBb3rGi7B3wo86P7GTPJb12Dg9EBF5DVhekXeGI/ChzE9
WIV1PxR0seSapzlJ92HHmWLAtcRLtXXesqcctmN4d2URBtsx9DEwo0Upiz//reYE
TBncQbtniVt2xXEl7sqNEYei75IxC3Dg1AgDL/zVQDl8PW0UvKo8Qb0cW7EnF9Vg
AcjD6R72dAgbqUMYOP0nriKxzXwa0Jls9aF3zBgcikKMGeyD6Z/Exlq4LexhSeuw
cWKsrQUYcLGKZPRN
=b6+A
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 4.19
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
- Fix boot on Hikey-960 by avoiding an IPI with interrupts disabled
- Fix address truncation in pfn_valid() implementation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbdp+EAAoJELescNyEwWM0Ld8H/iJqjvPwNLRC0KGL/rCQJH70
D80qlNBnwlrs2eUJTeNeRVZC+t2l9vJIoT17W938WkjxV+DSGDsfFDy3/BQ7VTji
7e33mwFBNoH+feAfMYmzht3sRlvyZ0oqXSIq/GrdZ8a4Gg/6iNVz7K1kpboBVFXp
LFnFIN4I7mNwdl1nAyNmnU081MMWfyvgRB82Xd9eS00KCAm3ueHfkwBNcwkfulDg
RT2ZXPzwd3Yxsdy3Z+r1vyXMHAw2GjcYpL5pjvHf34zMdvqkk03sMsx2yReuSR1U
M6MpNCdZfWHgMlFWbsEoEOd0g0CF5s6TQK3hBqoUEE3AUVNrQ8ixZMip326axoQ=
=C2YW
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"A couple of arm64 fixes
- Fix boot on Hikey-960 by avoiding an IPI with interrupts disabled
- Fix address truncation in pfn_valid() implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()
arm64: Avoid calling stop_machine() when patching jump labels
Patching a jump label involves patching a single instruction at a time,
swizzling between a branch and a NOP. The architecture treats these
instructions specially, so a concurrently executing CPU is guaranteed to
see either the NOP or the branch, rather than an amalgamation of the two
instruction encodings.
However, in order to guarantee that the new instruction is visible, it
is necessary to send an IPI to the concurrently executing CPU so that it
discards any previously fetched instructions from its pipeline. This
operation therefore cannot be completed from a context with IRQs
disabled, but this is exactly what happens on the jump label path where
the hotplug lock is held and irqs are subsequently disabled by
stop_machine_cpuslocked(). This results in a deadlock during boot on
Hikey-960.
Due to the architectural guarantees around patching NOPs and branches,
we don't actually need to stop_machine() at all on the jump label path,
so we can avoid the deadlock by using the "nosync" variant of our
instruction patching routine.
Fixes: 693350a799 ("arm64: insn: Don't fallback on nosync path for general insn patching")
Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pit-fall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbdFZ0AAoJED2LAQed4NsGcHYP/23txxk3GRP7O4UkfPw9Rtky
MHiXTgcoy2vbG+l12BgzWX+qFii8XTUe3dQtK4HnGQFUIBtEBV/hpZPJtxfgGSev
Zou5cv1kr5rNzTkCn//TG3O6/WIkTBCe2hahDCtmGDI3kd/cPK4dHbU/q6KpaqIJ
qzZYBXIvCeu2GM8idQoCRrwdMpgu1pBz1gz2sDje1yHH2toI7T6cXHRLQDgx+HPq
LIP7W9GUsoDdXjecvPD51LiW89E6BUxETBh5Ft9r9uzwB5ylQQMcw6Qyu2DiYDUX
PPsHCMiolYV+Ttcy+vj/67KOvKmEaFotssck+RD/xDCF17zKhRkup+YM8kPLHTVZ
TcAUZadbnT6U/s2W6GFwvVbN/P7cc3aif+aNCC/Pl23yagp3pydlSCocYxQgiVR7
/rx48haYDEgu/MJ1X0dOpSO0ErY7zu2OoAlNerW+D9QizwbP+WtZO/CJH8SxQRuN
dQ1xmyNrie+ODgi9tbc4eBrsb+1rioX927TP5MbJcfXt5CTsxDmIqop5XwyYIoQN
ZWWlzC8Ii3P2trAVpBgM2IEbngSxwr6T9Wbf1ScJnPKr/o1rq+pBk49cYstTz3kQ
OwJ8gPwUrkW4R+hlD7L6mL/WcrKzZBQS0Ij1QW2kVSEhRrsKo99psE1/rGehnHu9
KGB0LYYCqGSOHR4zOjg0
=VjfG
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pit-fall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
* tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
coccicheck: return proper error code on fail
Coccinelle: doubletest: reduce side effect false positives
kbuild: remove deprecated host-progs variable
kbuild: make samples really depend on headers_install
um: clean up archheaders recipe
kbuild: add %asm-generic to no-dot-config-targets
um: fix parallel building with O= option
scripts: Add Python 3 support to tracing/draw_functrace.py
builddeb: Add automatic support for sh{3,4}{,eb} architectures
builddeb: Add automatic support for riscv* architectures
builddeb: Add automatic support for m68k architecture
builddeb: Add automatic support for or1k architecture
builddeb: Add automatic support for sparc64 architecture
builddeb: Add automatic support for mips{,64}r6{,el} architectures
builddeb: Add automatic support for mips64el architecture
builddeb: Add automatic support for ppc64 and powerpcspe architectures
builddeb: Introduce functions to simplify kconfig tests in set_debarch
builddeb: Drop check for 32-bit s390
builddeb: Change architecture detection fallback to use dpkg-architecture
builddeb: Skip architecture detection when KBUILD_DEBARCH is set
...
A bunch of good stuff in here:
- Wire up support for qspinlock, replacing our trusty ticket lock code
- Add an IPI to flush_icache_range() to ensure that stale instructions
fetched into the pipeline are discarded along with the I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to be
constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core code
has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbbV41AAoJELescNyEwWM0WoEIALhrKtsIn6vqFlSs/w6aDuJL
cMWmFxjTaKLmIq2+cJIdFLOJ3CH80Pu9gB+nEv/k+cZdCTfUVKfRf28HTpmYWsht
bb4AhdHMC7yFW752BHk+mzJspeC8h/2Rm8wMuNVplZ3MkPrwo3vsiuJTofLhVL/y
BihlU3+5sfBvCYIsWnuEZIev+/I/s/qm1ASiqIcKSrFRZP6VTt5f9TC75vFI8seW
7yc3odKb0CArexB8yBjiPNziehctQF42doxQyL45hezLfWw4qdgHOSiwyiOMxEz9
Fwwpp8Tx33SKLNJgqoqYznGW9PhYJ7n2Kslv19uchJrEV+mds82vdDNaWRULld4=
=kQn6
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"A bunch of good stuff in here. Worth noting is that we've pulled in
the x86/mm branch from -tip so that we can make use of the core
ioremap changes which allow us to put down huge mappings in the
vmalloc area without screwing up the TLB. Much of the positive
diffstat is because of the rseq selftest for arm64.
Summary:
- Wire up support for qspinlock, replacing our trusty ticket lock
code
- Add an IPI to flush_icache_range() to ensure that stale
instructions fetched into the pipeline are discarded along with the
I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the
selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to
be constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core
code has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
arm64: alternative: Use true and false for boolean values
arm64: kexec: Add comment to explain use of __flush_icache_range()
arm64: sdei: Mark sdei stack helper functions as static
arm64, kaslr: export offset in VMCOREINFO ELF notes
arm64: perf: Add cap_user_time aarch64
efi/libstub: Only disable stackleak plugin for arm64
arm64: drop unused kernel_neon_begin_partial() macro
arm64: kexec: machine_kexec should call __flush_icache_range
arm64: svc: Ensure hardirq tracing is updated before return
arm64: mm: Export __sync_icache_dcache() for xen-privcmd
drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
arm64: Add support for STACKLEAK gcc plugin
arm64: Add stack information to on_accessible_stack
drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
arm64: fix ACPI dependencies
rseq/selftests: Add support for arm64
arm64: acpi: fix alignment fault in accessing ACPI
efi/arm: map UEFI memory map even w/o runtime services enabled
efi/arm: preserve early mapping of UEFI memory map longer for BGRT
drivers: acpi: add dependency of EFI for arm64
...
Pull perf update from Thomas Gleixner:
"The perf crowd presents:
Kernel updates:
- Removal of jprobes
- Cleanup and consolidatation the handling of kprobes
- Cleanup and consolidation of hardware breakpoints
- The usual pile of fixes and updates to PMUs and event descriptors
Tooling updates:
- Updates and improvements all over the place. Nothing outstanding,
just the (good) boring incremental grump work"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
perf trace: Do not require --no-syscalls to suppress strace like output
perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h
perf tools: Allow overriding MAX_NR_CPUS at compile time
perf bpf: Show better message when failing to load an object
perf list: Unify metric group description format with PMU event description
perf vendor events arm64: Update ThunderX2 implementation defined pmu core events
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet
perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
perf cs-etm: Fix start tracing packet handling
perf build: Fix installation directory for eBPF
perf c2c report: Fix crash for empty browser
perf tests: Fix indexing when invoking subtests
perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
perf trace beauty: Do not print NULL strarray entries
perf beauty: Add a generator for IPPROTO_ socket's protocol constants
tools include uapi: Grab a copy of linux/in.h
perf tests: Fix complex event name parsing
perf evlist: Fix error out while applying initial delay and LBR
...
Pull genirq updates from Thomas Gleixner:
"The irq departement provides:
- A synchronization fix for free_irq() to synchronize just the
removed interrupt thread on shared interrupt lines.
- Consolidate the multi low level interrupt entry handling and mvoe
it to the generic code instead of adding yet another copy for
RISC-V
- Refactoring of the ARM LPI allocator and LPI exposure to the
hypervisor
- Yet another interrupt chip driver for the JZ4725B SoC
- Speed up for /proc/interrupts as people seem to love reading this
file with high frequency
- Miscellaneous fixes and updates"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER
irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices
dt-bindings: irqchip: renesas-irqc: Document r8a77980 support
dt-bindings: irqchip: renesas-irqc: Document r8a77470 support
irqchip/ingenic: Add support for the JZ4725B SoC
irqchip/stm32: Add exti0 translation for stm32mp1
genirq: Remove redundant NULL pointer check in __free_irq()
irqchip/gic-v3-its: Honor hypervisor enforced LPI range
irqchip/gic-v3: Expose GICD_TYPER in the rdist structure
irqchip/gic-v3-its: Drop chunk allocation compatibility
irqchip/gic-v3-its: Move minimum LPI requirements to individual busses
irqchip/gic-v3-its: Use full range of LPIs
irqchip/gic-v3-its: Refactor LPI allocator
genirq: Synchronize only with single thread on free_irq()
genirq: Update code comments wrt recycled thread_mask
...
Return statements in functions returning bool should use true or false
instead of an integer value. This code was detected with the help of
Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq
BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi
9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp
ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym
JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM
zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC
0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI
wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh
BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb
0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK
1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA
uy3QSkmjx9vw/sAu
=gKw8
-----END PGP SIGNATURE-----
Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
Now that we understand the deadlock arising from flush_icache_range()
on the kexec crash kernel path, add a comment to justify the use of
__flush_icache_range() here.
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SDEI stack helper functions are only used by _on_sdei_stack() and
refer to symbols (e.g. sdei_stack_normal_ptr) that are only defined if
CONFIG_VMAP_STACK=y.
Mark these functions as static, so we don't run into errors at link-time
due to references to undefined symbols. Stick all the parameters onto
the same line whilst we're passing through.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Include KASLR offset in arm64 VMCOREINFO ELF notes to assist in
debugging. vmcore parsing in user-space already expects this value in
the notes and we are providing it for portability of those existing
tools with x86.
Ideally we would like core code to do this (so that way this
information won't be missed when an architecture adds KASLR support),
but mips has CONFIG_RANDOMIZE_BASE, and doesn't provide kaslr_offset(),
so I am not sure if this is needed for mips (and other such similar arch
cases in future). So, lets keep this architecture specific for now.
As an example of a user-space use-case, consider the
makedumpfile user-space utility which will need fixup to use this
KASLR offset to work with cases where we need to find a way to
translate symbol address from vmlinux to kernel run time address
in case of KASLR boot on arm64.
I have already submitted the makedumpfile user-space patch upstream
and the maintainer has suggested to wait for the kernel changes to be
included (see [0]).
I tested this on my qualcomm amberwing board both for KASLR and
non-KASLR boot cases:
Without this patch:
# cat > scrub.conf << EOF
[vmlinux]
erase jiffies
erase init_task.utime
for tsk in init_task.tasks.next within task_struct:tasks
erase tsk.utime
endfor
EOF
# makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
readpage_elf: Attempt to read non-existent page at 0xffffa8a5bf180000.
readmem: type_addr: 1, addr:ffffa8a5bf180000, size:8
vaddr_to_paddr_arm64: Can't read pgd
readmem: Can't convert a virtual address(ffff0000092a542c) to physical
address.
readmem: type_addr: 0, addr:ffff0000092a542c, size:390
check_release: Can't get the address of system_utsname
After this patch check_release() is ok, and also we are able to erase
symbol from vmcore (I checked this with kernel 4.18.0-rc4+):
# makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
The kernel version is not supported.
The makedumpfile operation may be incomplete.
Checking for memory holes : [100.0 %] \
Checking for memory holes : [100.0 %] |
Checking foExcluding unnecessary pages : [100.0 %]
\
Excluding unnecessary pages : [100.0 %] \
The dumpfiles are saved to dumpfile_1, dumpfile_2, and dumpfile_3.
makedumpfile Completed.
[0] https://www.spinics.net/lists/kexec/msg21195.html
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It is useful to get the running time of a thread. Doing so in an
efficient manner can be important for performance of user applications.
Avoiding system calls in `clock_gettime` when handling
CLOCK_THREAD_CPUTIME_ID is important. Other clocks are handled in the
VDSO, but CLOCK_THREAD_CPUTIME_ID falls back on the system call.
CLOCK_THREAD_CPUTIME_ID is not handled in the VDSO since it would have
costs associated with maintaining updated user space accessible time
offsets. These offsets have to be updated everytime the a thread is
scheduled/descheduled. However, for programs regularly checking the
running time of a thread, this is a performance improvement.
This patch takes a middle ground, and adds support for cap_user_time an
optional feature of the perf_event API. This way costs are only
incurred when the perf_event api is enabled. This is done the same way
as it is in x86.
Ultimately this allows calculating the thread running time in userspace
on aarch64 as follows (adapted from perf_event_open manpage):
u32 seq, time_mult, time_shift;
u64 running, count, time_offset, quot, rem, delta;
struct perf_event_mmap_page *pc;
pc = buf; // buf is the perf event mmaped page as documented in the API.
if (pc->cap_usr_time) {
do {
seq = pc->lock;
barrier();
running = pc->time_running;
count = readCNTVCT_EL0(); // Read ARM hardware clock.
time_offset = pc->time_offset;
time_mult = pc->time_mult;
time_shift = pc->time_shift;
barrier();
} while (pc->lock != seq);
quot = (count >> time_shift);
rem = count & (((u64)1 << time_shift) - 1);
delta = time_offset + quot * time_mult +
((rem * time_mult) >> time_shift);
running += delta;
// running now has the current nanosecond level thread time.
}
Summary of changes in the patch:
For aarch64 systems, make arch_perf_update_userpage update the timing
information stored in the perf_event page. Requiring the following
calculations:
- Calculate the appropriate time_mult, and time_shift factors to convert
ticks to nano seconds for the current clock frequency.
- Adjust the mult and shift factors to avoid shift factors of 32 bits.
(possibly unnecessary)
- The time_offset userspace should apply when doing calculations:
negative the current sched time (now), because time_running and
time_enabled fields of the perf_event page have just been updated.
Toggle bits to appropriate values:
- Enable cap_user_time
Signed-off-by: Michael O'Farrell <micpof@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
machine_kexec flushes the reboot_code_buffer from the icache
after stopping the other cpus.
Commit 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache
for kernel mappings") added an IPI call to flush_icache_range, which
causes a hang here, so replace the call with __flush_icache_range
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We always run userspace with interrupts enabled, but with the recent
conversion of the syscall entry/exit code to C, we don't inform the
hardirq tracing code that interrupts are about to become enabled by
virtue of restoring the EL0 SPSR.
This patch ensures that trace_hardirqs_on() is called on the syscall
return path when we return to the assembly code with interrupts still
disabled.
Fixes: f37099b699 ("arm64: convert syscall trace logic to C")
Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull in arm perf updates, including support for 64-bit (chained) event
counters and some non-critical fixes for some of the system PMU drivers.
Signed-off-by: Will Deacon <will.deacon@arm.com>
This adds support for the STACKLEAK gcc plugin to arm64 by implementing
stackleak_check_alloca(), based heavily on the x86 version, and adding the
two helpers used by the stackleak common code: current_top_of_stack() and
on_thread_stack(). The stack erasure calls are made at syscall returns.
Additionally, this disables the plugin in hypervisor and EFI stub code,
which are out of scope for the protection.
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for enabling the stackleak plugin on arm64,
we need a way to get the bounds of the current stack. Extend
on_accessible_stack to get this information.
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
[will: folded in fix for allmodconfig build breakage w/ sdei]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit d3aec8a28b ("arm64: capabilities: Restrict KPTI
detection to boot-time CPUs") we rely on errata flags being already
populated during feature enumeration. The order of errata and
features was flipped as part of commit ed478b3f9e ("arm64:
capabilities: Group handling of features and errata workarounds").
Return to the orginal order of errata and feature evaluation to
ensure errata flags are present during feature evaluation.
Fixes: ed478b3f9e ("arm64: capabilities: Group handling of
features and errata workarounds")
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
CC: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is a fix against the issue that crash dump kernel may hang up
during booting, which can happen on any ACPI-based system with "ACPI
Reclaim Memory."
(kernel messages after panic kicked off kdump)
(snip...)
Bye!
(snip...)
ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
Internal error: Oops: 96000021 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
task: ffff000008d05180 task.stack: ffff000008cc0000
PC is at acpi_ns_lookup+0x25c/0x3c0
LR is at acpi_ds_load1_begin_op+0xa4/0x294
(snip...)
Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
Call trace:
(snip...)
[<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
---[ end trace c46ed37f9651c58e ]---
Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..
(diagnosis)
* This fault is a data abort, alignment fault (ESR=0x96000021)
during reading out ACPI table.
* Initial ACPI tables are normally stored in system ram and marked as
"ACPI Reclaim memory" by the firmware.
* After the commit f56ab9a5b7 ("efi/arm: Don't mark ACPI reclaim
memory as MEMBLOCK_NOMAP"), those regions are differently handled
as they are "memblock-reserved", without NOMAP bit.
* So they are now excluded from device tree's "usable-memory-range"
which kexec-tools determines based on a current view of /proc/iomem.
* When crash dump kernel boots up, it tries to accesses ACPI tables by
mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
since they are no longer part of mapped system ram.
* Given that ACPI accessor/helper functions are compiled in without
unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
any unaligned access to ACPI tables can cause a fatal panic.
With this patch, acpi_os_ioremap() always honors memory attribute
information provided by the firmware (EFI) and retaining cacheability
allows the kernel safe access to ACPI tables.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by and Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem, kexec's
user-space doesn't know about memblock_reserve()d regions.
Until commit f56ab9a5b7 ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
and thus possible for kexec to overwrite with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.
Exporting both nomap and reserved memblock types is a nuisance as
they live in different memblock structures which we can't walk at
the same time.
Take a second walk over memblock.reserved and add new 'reserved'
subnodes for the memblock_reserved() regions that aren't already
described by the existing code. (e.g. Kernel Code)
We use reserve_region_with_split() to find the gaps in existing named
regions. This handles the gap between 'kernel code' and 'kernel data'
which is memblock_reserve()d, but already partially described by
request_standard_resources(). e.g.:
| 80000000-dfffffff : System RAM
| 80080000-80ffffff : Kernel code
| 81000000-8158ffff : reserved
| 81590000-8237efff : Kernel data
| a0000000-dfffffff : Crash kernel
| e00f0000-f949ffff : System RAM
reserve_region_with_split needs kzalloc() which isn't available when
request_standard_resources() is called, use an initcall.
Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
Reported-by: Tyler Baicar <tbaicar@codeaurora.org>
Suggested-by: Akashi Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: d28f6df130 ("arm64/kexec: Add core kexec support")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
CC: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for userspace to control idx. Sanitize idx when using it
as an array index, to inhibit the potential spectre-v1 write gadget.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The vDSO needs to have a unique build id in a similar manner
to the kernel and modules. Use the build salt macro.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
syscall_trace_{enter,exit} are only called from C code, so drop the
asmlinkage qualifier from their definitions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
To minimize the risk of userspace-controlled values being used under
speculation, this patch adds pt_regs based syscall wrappers for arm64,
which pass the minimum set of required userspace values to syscall
implementations. For each syscall, a wrapper which takes a pt_regs
argument is automatically generated, and this extracts the arguments
before calling the "real" syscall implementation.
Each syscall has three functions generated:
* __do_<compat_>sys_<name> is the "real" syscall implementation, with
the expected prototype.
* __se_<compat_>sys_<name> is the sign-extension/narrowing wrapper,
inherited from common code. This takes a series of long parameters,
casting each to the requisite types required by the "real" syscall
implementation in __do_<compat_>sys_<name>.
This wrapper *may* not be necessary on arm64 given the AAPCS rules on
unused register bits, but it seemed safer to keep the wrapper for now.
* __arm64_<compat_>_sys_<name> takes a struct pt_regs pointer, and
extracts *only* the relevant register values, passing these on to the
__se_<compat_>sys_<name> wrapper.
The syscall invocation code is updated to handle the calling convention
required by __arm64_<compat_>_sys_<name>, and passes a single struct
pt_regs pointer.
The compiler can fold the syscall implementation and its wrappers, such
that the overhead of this approach is minimized.
Note that we play games with sys_ni_syscall(). It can't be defined with
SYSCALL_DEFINE0() because we must avoid the possibility of error
injection. Additionally, there are a couple of locations where we need
to call it from C code, and we don't (currently) have a
ksys_ni_syscall(). While it has no wrapper, passing in a redundant
pt_regs pointer is benign per the AAPCS.
When ARCH_HAS_SYSCALL_WRAPPER is selected, no prototype is defines for
sys_ni_syscall(). Since we need to treat it differently for in-kernel
calls and the syscall tables, the prototype is defined as-required.
The wrappers are largely the same as their x86 counterparts, but
simplified as we don't have a variety of compat calling conventions that
require separate stubs. Unlike x86, we have some zero-argument compat
syscalls, and must define COMPAT_SYSCALL_DEFINE0() to ensure that these
are also given an __arm64_compat_sys_ prefix.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for converting to pt_regs syscall wrappers, convert our
existing compat wrappers to C. This will allow the pt_regs wrappers to
be automatically generated, and will allow for the compat register
manipulation to be folded in with the pt_regs accesses.
To avoid confusion with the upcoming pt_regs wrappers and existing
compat wrappers provided by core code, the C wrappers are renamed to
compat_sys_aarch32_<syscall>.
With the assembly wrappers gone, we can get rid of entry32.S and the
associated boilerplate.
Note that these must call the ksys_* syscall entry points, as the usual
sys_* entry points will be modified to take a single pt_regs pointer
argument.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently annotate our mmap implementation as a syscall, as we
need to do to use pt_regs syscall wrappers.
Let's mark it as a real syscall.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently annotate our various sigreturn functions as syscalls,
as we need to do to use pt_regs syscall wrappers.
Let's mark them as real syscalls.
For compat_sys_sigreturn and compat_sys_rt_sigreturn, this changes the
return type from int to long, matching the prototypes in sys32.c.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With pt_regs syscall wrappers, the calling convention for
sys_personality() will change. Use ksys_personality(), which is
functionally equivalent.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Our syscall tables are aligned to 4096 bytes, which allowed their
addresses to be generated with a single adrp in entry.S. This has the
unfortunate property of wasting space in .rodata for the necessary
padding.
Now that the address is generated by C code, we can rely on the compiler
to do the right thing, and drop the alignemnt.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can zero GPRs x0 - x29 upon entry from EL0 to make it harder for
userspace to control values consumed by speculative gadgets.
We don't blat x30, since this is stashed much later, and we'll blat it
before invoking C code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that all of the syscall logic works on the saved pt_regs, apply_ssbd
can safely corrupt x0-x3 in the entry paths, and we no longer need to
restore them. So let's remove the logic doing so.
With that logic gone, we can fold the branch target into the macro, so
that callers need not deal with this. GAS provides \@, which provides a
unique value per macro invocation, which we can use to create a unique
label.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that syscalls are invoked with pt_regs, we no longer need to ensure
that the argument regsiters are live in the entry assembly, and it's
fine to not restore them after context_tracking_user_exit() has
corrupted them.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the syscall invocation logic is in C, we can migrate the rest
of the syscall entry logic over, so that the entry assembly needn't look
at the register values at all.
The SVE reset across syscall logic now unconditionally clears TIF_SVE,
but sve_user_disable() will only write back to CPACR_EL1 when SVE is
actually enabled.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently syscall tracing is a tricky assembly state machine, which can
be rather difficult to follow, and even harder to modify. Before we
start fiddling with it for pt_regs syscalls, let's convert it to C.
This is not intended to have any functional change.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As a first step towards invoking syscalls with a pt_regs argument,
convert the raw syscall invocation logic to C. We end up with a bit more
register shuffling, but the unified invocation logic means we can unify
the tracing paths, too.
Previously, assembly had to open-code calls to ni_sys() when the system
call number was out-of-bounds for the relevant syscall table. This case
is now handled by invoke_syscall(), and the assembly no longer need to
handle this case explicitly. This allows the tracing paths to be
simplified and unified, as we no longer need the __ni_sys_trace path and
the __sys_trace_return label.
This only converts the invocation of the syscall. The rest of the
syscall triage and tracing is left in assembly for now, and will be
converted in subsequent patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for invoking arbitrary syscalls from C code, let's define
a type for an arbitrary syscall, matching the parameter passing rules of
the AAPCS.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 sigreturn* syscall handlers are non-standard. Rather than
taking a number of user parameters in registers as per the AAPCS,
they expect the pt_regs as their sole argument.
To make this work, we override the syscall definitions to invoke
wrappers written in assembly, which mov the SP into x0, and branch to
their respective C functions.
On other architectures (such as x86), the sigreturn* functions take no
argument and instead use current_pt_regs() to acquire the user
registers. This requires less boilerplate code, and allows for other
features such as interposing C code in this path.
This patch takes the same approach for arm64.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tentatively-reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In subsequent patches, we'll want to make use of sve_user_enable() and
sve_user_disable() outside of kernel/fpsimd.c. Let's move these to
<asm/fpsimd.h> where we can make use of them.
To avoid ifdeffery in sequences like:
if (system_supports_sve() && some_condition)
sve_user_disable();
... empty stubs are provided when support for SVE is not enabled. Note
that system_supports_sve() contains as IS_ENABLED(CONFIG_ARM64_SVE), so
the sve_user_disable() call should be optimized away entirely when
CONFIG_ARM64_SVE is not selected.
To ensure that this is the case, the stub definitions contain a
BUILD_BUG(), as we do for other stubs for which calls should always be
optimized away when the relevant config option is not selected.
At the same time, the include list of <asm/fpsimd.h> is sorted while
adding <asm/sysreg.h>.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have sysreg_clear_set(), we can use this instead of
change_cpacr().
Note that the order of the set and clear arguments differs between
change_cpacr() and sysreg_clear_set(), so these are flipped as part of
the conversion. Also, sve_user_enable() redundantly clears
CPACR_EL1_ZEN_EL0EN before setting it; this is removed for clarity.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have sysreg_clear_set(), we can consistently use this
instead of config_sctlr_el1().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In do_notify_resume, we manipulate thread_flags as a 32-bit unsigned
int, whereas thread_info::flags is a 64-bit unsigned long, and elsewhere
(e.g. in the entry assembly) we manipulate the flags as a 64-bit
quantity.
For consistency, and to avoid problems if we end up with more than 32
flags, let's make do_notify_resume take the flags as a 64-bit unsigned
long.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This reverts commit 7e7df71fd5.
When unwinding out of the IRQ stack and onto the interrupted EL1 stack,
we cannot rely on the frame pointer being strictly increasing, as this
could terminate the backtrace early depending on how the stacks have
been allocated.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Implement calls to rseq_signal_deliver, rseq_handle_notify_resume
and rseq_syscall so that we can select HAVE_RSEQ on arm64.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add support for 64bit event by using chained event counters
and 64bit cycle counters.
PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.
For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available,
falls back to chaining.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 PMU updates the event counters and reprograms the
counters in the overflow IRQ handler without disabling the
PMU. This could potentially cause skews in for group counters,
where the overflowed counters may potentially loose some event
counts, while they are reprogrammed. To prevent this, disable
the PMU while we process the counter overflows and enable it
right back when we are done.
This patch also moves the PMU stop/start routines to avoid a
forward declaration.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
armv8pmu_select_counter always returns the passed idx. So
let us make that void and get rid of the pointless checks.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The armpmu uses get_event_idx callback to allocate an event
counter for a given event, which marks the selected counter
as "used". Now, when we delete the counter, the arm_pmu goes
ahead and clears the "used" bit and then invokes the "clear_event_idx"
call back, which kind of splits the job between the core code
and the backend. To keep things tidy, mandate the implementation
of clear_event_idx() and add it for exisiting backends.
This will be useful for adding the chained event support, where
we leave the event idx maintenance to the backend.
Also, when an event is removed from the PMU, reset the hw.idx
to indicate that a counter is not allocated for this event,
to help the backends do better checks. This will be also used
for the chain counter support.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Convert the {read/write}_counter APIs to handle 64bit values
to enable supporting chained event counters. The backends still
use 32bit values and we pass them 32bit values only. So in effect
there are no functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Each PMU defines their max_period of the counter as the maximum
value that can be counted. Since all the PMU backends support
32bit counters by default, let us remove the redundant field.
No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Current ACPI ARM64 NUMA initialization code in
acpi_numa_gicc_affinity_init()
carries out NUMA nodes creation and cpu<->node mappings at the same time
in the arch backend so that a single SRAT walk is needed to parse both
pieces of information. This implies that the cpu<->node mappings must
be stashed in an array (sized NR_CPUS) so that SMP code can later use
the stashed values to avoid another SRAT table walk to set-up the early
cpu<->node mappings.
If the kernel is configured with a NR_CPUS value less than the actual
processor entries in the SRAT (and MADT), the logic in
acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping
is only carried out (and stashed for future use) only for a number of
SRAT entries up to NR_CPUS, which do not necessarily correspond to the
possible cpus detected at SMP initialization in
acpi_map_gic_cpu_interface() (ie MADT and SRAT processor entries order
is not enforced), which leaves the kernel with broken cpu<->node
mappings.
Furthermore, given the current ACPI NUMA code parsing logic in
acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed
because they exceed NR_CPUS entries are not mapped to NUMA nodes (ie the
PXM corresponding node is not created in the kernel) leaving the system
with a broken NUMA topology.
Rework the ACPI ARM64 NUMA initialization process so that the NUMA
nodes creation and cpu<->node mappings are decoupled. cpu<->node
mappings are moved to SMP initialization code (where they are needed),
at the cost of an extra SRAT walk so that ACPI NUMA mappings can be
batched before being applied, fixing current parsing pitfalls.
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: d8b47fca8c ("arm64, ACPI, NUMA: NUMA support based on SRAT and
SLIT")
Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.com
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Up to ARMv8.3, the combinaison of Stage-1 and Stage-2 attributes
results in the strongest attribute of the two stages. This means
that the hypervisor has to perform quite a lot of cache maintenance
just in case the guest has some non-cacheable mappings around.
ARMv8.4 solves this problem by offering a different mode (FWB) where
Stage-2 has total control over the memory attribute (this is limited
to systems where both I/O and instruction fetches are coherent with
the dcache). This is achieved by having a different set of memory
attributes in the page tables, and a new bit set in HCR_EL2.
On such a system, we can then safely sidestep any form of dcache
management.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Commit 37c3ec2d81 ("arm64: topology: divorce MC scheduling domain from
core_siblings") selected the smallest of LLC, socket siblings, and NUMA
node siblings to ensure that the sched domain we build for the MC layer
isn't larger than the DIE above it or it's shrunk to the socket or NUMA
node if LLC exist acrosis NUMA node/chiplets.
Commit acd32e52e4e0 ("arm64: topology: Avoid checking numa mask for
scheduler MC selection") reverted the NUMA siblings checks since the
CPU topology masks weren't updated on hotplug at that time.
This patch re-introduces numa mask check as the CPU and NUMA topology
is now updated in hotplug paths. Effectively, this patch does the
partial revert of commit acd32e52e4e0.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Similar to core_sibling and thread_sibling, it's better to align and
rename llc_siblings to llc_sibling.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We already repopulate the information on CPU hotplug-in, so we can safely
remove the CPU topology and NUMA cpumap information during CPU hotplug
out operation. This will help to provide the correct cpumask for
scheduler domains.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's incorrect to iterate over all the possible CPUs to update the
sibling masks when any CPU is hotplugged in. In case the topology
siblings masks of the CPU is removed when is it hotplugged out, we
end up updating those masks when one of it's sibling is powered up
again. This will provide inconsistent view.
Further, since the CPU calling update_sibling_masks is yet to be set
online, there's no need to compare itself with each online CPU when
updating the siblings masks.
This patch restricts updation of sibling masks only for CPUs that are
already online. It also the drops the unnecessary cpuid check.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds support to remove all the CPU topology information using
clear_cpu_topology and also resetting the sibling information on other
sibling CPUs. This will be used in cpu_disable so that all the topology
sibling information is removed on CPU hotplug out.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently numa_clear_node removes both cpu information from the NUMA
node cpumap as well as the NUMA node id from the cpu. Similarly
numa_store_cpu_info updates both percpu nodeid and NUMA cpumap.
However we need to retain the numa node id for the cpu and only remove
the cpu information from the numa node cpumap during CPU hotplug out.
The same can be extended for hotplugging in the CPU.
This patch separates out numa_{add,remove}_cpu from numa_clear_node and
numa_store_cpu_info.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently reset_cpu_topology clears all the CPU topology information
and resets to default values. However we may need to just clear the
information when we hotplug out the CPU. In preparation to add the
support the same, let's refactor reset_cpu_topology to just reset
the information and move clearing out the topology information to
clear_cpu_topology.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ERRATA_MIDR_REV_RANGE macro assigns ARM64_CPUCAP_LOCAL_CPU_ERRATUM
to the '.type' field of the 'struct arm64_cpu_capabilities', so there's
no need to assign it explicitly as well.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Patching kernel instructions at runtime requires other CPUs to undergo
a context synchronisation event via an explicit ISB or an IPI in order
to ensure that the new instructions are visible. This is required even
for "hotpatch" instructions such as NOP and BL, so avoid optimising in
this case and always go via stop_machine() when performing general
patching.
ftrace isn't quite as strict, so it can continue to call the nosync
code directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some code cares about the SPSR_ELx format for exceptions taken from
AArch32 to inspect or manipulate the SPSR_ELx value, which is already in
the SPSR_ELx format, and not in the AArch32 PSR format.
To separate these from cases where we care about the AArch32 PSR format,
migrate these cases to use the PSR_AA32_* definitions rather than
COMPAT_PSR_*.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPSR_ELx format for exceptions taken from AArch32 is slightly
different to the AArch32 PSR format.
Map between the two in the compat ptrace code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPSR_ELx format for exceptions taken from AArch32 differs from the
AArch32 PSR format. Thus, we must translate between the two when setting
up a compat sigframe, or restoring context from a compat sigframe.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently valid_user_regs() treats SPSR_ELx.DIT as a RES0 bit, causing
it to be zeroed upon exception return, rather than preserved. Thus, code
relying on DIT will not function as expected, and may expose an
unexpected timing sidechannel.
Let's remove DIT from the set of RES0 bits, such that it is preserved.
At the same time, the related comment is updated to better describe the
situation, and to take into account the most recent documentation of
SPSR_ELx, in ARM DDI 0487C.a.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Track mismatches in the cache type register (CTR_EL0), other
than the D/I min line sizes and trap user accesses if there are any.
Fixes: be68a8aaf9 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If there is a mismatch in the I/D min line size, we must
always use the system wide safe value both in applications
and in the kernel, while performing cache operations. However,
we have been checking more bits than just the min line sizes,
which triggers false negatives. We may need to trap the user
accesses in such cases, but not necessarily patch the kernel.
This patch fixes the check to do the right thing as advertised.
A new capability will be added to check mismatches in other
fields and ensure we trap the CTR accesses.
Fixes: be68a8aaf9 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently machine_kexec() doesn't reset to EL2 in the case of a
crashdump kernel. This leaves potentially dodgy state active at EL2, and
means that if the crashdump kernel attempts to online secondary CPUs,
these will be booted as mismatched ELs.
Let's reset to EL2, as we do in all other cases, and simplify things. If
EL2 state is corrupt, things are already sufficiently bad that kdump is
unlikely to work, and it's best-effort regardless.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The implementation of flush_icache_range() includes instruction sequences
which are themselves patched at runtime, so it is not safe to call from
the patching framework.
This patch reworks the alternatives cache-flushing code so that it rolls
its own internal D-cache maintenance using DC CIVAC before invalidating
the entire I-cache after all alternatives have been applied at boot.
Modules don't cause any issues, since flush_icache_range() is safe to
call by the time they are loaded.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Rohit Khanna <rokhanna@nvidia.com>
Cc: Alexander Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-7-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We can't pass the breakpoint directly on arch_check_bp_in_kernelspace()
anymore because its architecture internal datas (struct arch_hw_breakpoint)
are not yet filled by the time we call the function, and most
implementation need this backend to be up to date. So arrange the
function to take the probing struct instead.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We inspect __kpti_forced early on as part of the cpufeature enable
callback which remaps the swapper page table using non-global entries.
Ensure that __kpti_forced has been updated to reflect the kpti=
command-line option before we start using it.
Fixes: ea1e3de85e ("arm64: entry: Add fake CPU feature for unmapping the kernel at EL0")
Cc: <stable@vger.kernel.org> # 4.16.x-
Reported-by: Wei Xu <xuwei5@hisilicon.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Clear current_kprobe and enable preemption in kprobe
even if pre_handler returns !0.
This simplifies function override using kprobes.
Jprobe used to require to keep the preemption disabled and
keep current_kprobe until it returned to original function
entry. For this reason kprobe_int3_handler() and similar
arch dependent kprobe handers checks pre_handler result
and exit without enabling preemption if the result is !0.
After removing the jprobe, Kprobes does not need to
keep preempt disabled even if user handler returns !0
anymore.
But since the function override handler in error-inject
and bpf is also returns !0 if it overrides a function,
to balancing the preempt count, it enables preemption
and reset current kprobe by itself.
That is a bad design that is very buggy. This fixes
such unbalanced preempt-count and current_kprobes setting
in kprobes, bpf and error-inject.
Note: for powerpc and x86, this removes all preempt_disable
from kprobe_ftrace_handler because ftrace callbacks are
called under preempt disabled.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: https://lore.kernel.org/lkml/152942494574.15209.12323837825873032258.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't call the ->break_handler() from the arm64 kprobes code,
because it was only used by jprobes which got removed.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/lkml/152942474231.15209.17684808374429473004.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We can't call function trace hook before setup percpu offset.
When entering secondary_start_kernel(), percpu offset has not
been initialized. So this lead hotplug malfunction.
Here is the flow to reproduce this bug:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo function > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 1 > /sys/devices/system/cpu/cpu1/online
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.
That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.
HOWEVER.
It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like
CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_STACKPROTECTOR_AUTO=y
and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.
The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one. This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).
This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed (Kees)
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
AtfvEeaPHaOyD8/h2Q==
=zUUp
-----END PGP SIGNATURE-----
Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
* ARM: lazy context-switching of FPSIMD registers on arm64, "split"
regions for vGIC redistributor
* s390: cleanups for nested, clock handling, crypto, storage keys and
control register bits
* x86: many bugfixes, implement more Hyper-V super powers,
implement lapic_timer_advance_ns even when the LAPIC timer
is emulated using the processor's VMX preemption timer. Two
security-related bugfixes at the top of the branch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN
+5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw
VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi
3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3
ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF
JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk=
=H5zI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small update for KVM:
ARM:
- lazy context-switching of FPSIMD registers on arm64
- "split" regions for vGIC redistributor
s390:
- cleanups for nested
- clock handling
- crypto
- storage keys
- control register bits
x86:
- many bugfixes
- implement more Hyper-V super powers
- implement lapic_timer_advance_ns even when the LAPIC timer is
emulated using the processor's VMX preemption timer.
- two security-related bugfixes at the top of the branch"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
kvm: fix typo in flag name
kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
KVM: x86: introduce linear_{read,write}_system
kvm: nVMX: Enforce cpl=0 for VMX instructions
kvm: nVMX: Add support for "VMWRITE to any supported field"
kvm: nVMX: Restrict VMX capability MSR changes
KVM: VMX: Optimize tscdeadline timer latency
KVM: docs: nVMX: Remove known limitations as they do not exist now
KVM: docs: mmu: KVM support exposing SLAT to guests
kvm: no need to check return value of debugfs_create functions
kvm: Make VM ioctl do valloc for some archs
kvm: Change return type to vm_fault_t
KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
KVM: introduce kvm_make_vcpus_request_mask() API
KVM: x86: hyperv: do rep check for each hypercall separately
...
- Spectre v4 mitigation (Speculative Store Bypass Disable) support for
arm64 using SMC firmware call to set a hardware chicken bit
- ACPI PPTT (Processor Properties Topology Table) parsing support and
enable the feature for arm64
- Report signal frame size to user via auxv (AT_MINSIGSTKSZ). The
primary motivation is Scalable Vector Extensions which requires more
space on the signal frame than the currently defined MINSIGSTKSZ
- ARM perf patches: allow building arm-cci as module, demote dev_warn()
to dev_dbg() in arm-ccn event_init(), miscellaneous cleanups
- cmpwait() WFE optimisation to avoid some spurious wakeups
- L1_CACHE_BYTES reverted back to 64 (for performance reasons that have
to do with some network allocations) while keeping ARCH_DMA_MINALIGN
to 128. cache_line_size() returns the actual hardware Cache Writeback
Granule
- Turn LSE atomics on by default in Kconfig
- Kernel fault reporting tidying
- Some #include and miscellaneous cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlsaoqsACgkQa9axLQDI
XvH+8RAAqRCrEtkNPS7zxHyMK/D2cxSy9EVtlJ1sxhmsONEe5t5MDTWX9byobQ5A
PAKMSQBQgUvecqHLOtD7SJWef1il30zgWmc/yPcgNv3OsA1Au7j2g3ht/Drw+N5I
Vy0aOUEtw+Jzs7y/CJyl6lufSkkOzszOujt2Nybiz6omztOrwkW9isKnURzQBNj5
gquZI35h604YJ9F0TqS6ZqU7tNcuB9q02FxvVBpLmb83jP4jSEjYACUJwVVxvEAB
UXjdD4N130rRXDS5OMRWo5+4SAj+kPYhdVYEvaDx7xTOIRHhXK05GlJbsUAc5E6l
xy810fH5Dm0diYpVvYWTA5J+BU1jNOvCys5zKWl7gs2P8YB59PdqY4M2YBPNGb5H
PaVgq73TZAsww6ZInbZlK+wZOIxZZIOf//Z+QKn6EPtu3RmzIFWwyttTj01w1E3i
LhjcUoGnvxJFcMoCr59ihDwfP9nkCVrNc4REOGaWDk6L/t/bOfaZfDz+OCGbwQdL
akCFKZI6q5O/no+YfhtdtNFpCQb/Bo1J88KuotICRXq8z4vO41zIG53bi97W8QeG
rCBiX0NxUxYJ3ybus7kZHTmMGieMyEHP28n12QffwvJj4vJBsUXQBrV8hclx0djZ
HMt7iPi/0BW6nVV7ngIgN3cdCpaDCEGRsfO4Ch0rFZrC9UbYQnE=
=uums
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the core arm64 and perf changes, the Spectre v4 mitigation
touches the arm KVM code and the ACPI PPTT support touches drivers/
(acpi and cacheinfo). I should have the maintainers' acks in place.
Summary:
- Spectre v4 mitigation (Speculative Store Bypass Disable) support
for arm64 using SMC firmware call to set a hardware chicken bit
- ACPI PPTT (Processor Properties Topology Table) parsing support and
enable the feature for arm64
- Report signal frame size to user via auxv (AT_MINSIGSTKSZ). The
primary motivation is Scalable Vector Extensions which requires
more space on the signal frame than the currently defined
MINSIGSTKSZ
- ARM perf patches: allow building arm-cci as module, demote
dev_warn() to dev_dbg() in arm-ccn event_init(), miscellaneous
cleanups
- cmpwait() WFE optimisation to avoid some spurious wakeups
- L1_CACHE_BYTES reverted back to 64 (for performance reasons that
have to do with some network allocations) while keeping
ARCH_DMA_MINALIGN to 128. cache_line_size() returns the actual
hardware Cache Writeback Granule
- Turn LSE atomics on by default in Kconfig
- Kernel fault reporting tidying
- Some #include and miscellaneous cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (53 commits)
arm64: Fix syscall restarting around signal suppressed by tracer
arm64: topology: Avoid checking numa mask for scheduler MC selection
ACPI / PPTT: fix build when CONFIG_ACPI_PPTT is not enabled
arm64: cpu_errata: include required headers
arm64: KVM: Move VCPU_WORKAROUND_2_FLAG macros to the top of the file
arm64: signal: Report signal frame size to userspace via auxv
arm64/sve: Thin out initialisation sanity-checks for sve_max_vl
arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
arm64: KVM: Add HYP per-cpu accessors
arm64: ssbd: Add prctl interface for per-thread mitigation
arm64: ssbd: Introduce thread flag to control userspace mitigation
arm64: ssbd: Restore mitigation status on CPU resume
arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
arm64: ssbd: Add global mitigation state accessor
arm64: Add 'ssbd' command-line option
arm64: Add ARCH_WORKAROUND_2 probing
arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
...
Commit 17c2895 ("arm64: Abstract syscallno manipulation") abstracts
out the pt_regs.syscallno value for a syscall cancelled by a tracer
as NO_SYSCALL, and provides helpers to set and check for this
condition. However, the way this was implemented has the
unintended side-effect of disabling part of the syscall restart
logic.
This comes about because the second in_syscall() check in
do_signal() re-evaluates the "in a syscall" condition based on the
updated pt_regs instead of the original pt_regs. forget_syscall()
is explicitly called prior to the second check in order to prevent
restart logic in the ret_to_user path being spuriously triggered,
which means that the second in_syscall() check always yields false.
This triggers a failure in
tools/testing/selftests/seccomp/seccomp_bpf.c, when using ptrace to
suppress a signal that interrups a nanosleep() syscall.
Misbehaviour of this type is only expected in the case where a
tracer suppresses a signal and the target process is either being
single-stepped or the interrupted syscall attempts to restart via
-ERESTARTBLOCK.
This patch restores the old behaviour by performing the
in_syscall() check only once at the start of the function.
Fixes: 17c2895860 ("arm64: Abstract syscallno manipulation")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reported-by: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The numa mask subset check can often lead to system hang or crash during
CPU hotplug and system suspend operation if NUMA is disabled. This is
mostly observed on HMP systems where the CPU compute capacities are
different and ends up in different scheduler domains. Since
cpumask_of_node is returned instead core_sibling, the scheduler is
confused with incorrect cpumasks(e.g. one CPU in two different sched
domains at the same time) on CPU hotplug.
Lets disable the NUMA siblings checks for the time being, as NUMA in
socket machines have LLC's that will assure that the scheduler topology
isn't "borken".
The NUMA check exists to assure that if a LLC within a socket crosses
NUMA nodes/chiplets the scheduler domains remain consistent. This code will
likely have to be re-enabled in the near future once the NUMA mask story
is sorted. At the moment its not necessary because the NUMA in socket
machines LLC's are contained within the NUMA domains.
Further, as a defensive mechanism during hot-plug, lets assure that the
LLC siblings are also masked.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Without including psci.h and arm-smccc.h, we now get a build failure in
some configurations:
arch/arm64/kernel/cpu_errata.c: In function 'arm64_update_smccc_conduit':
arch/arm64/kernel/cpu_errata.c:278:10: error: 'psci_ops' undeclared (first use in this function); did you mean 'sysfs_ops'?
arch/arm64/kernel/cpu_errata.c: In function 'arm64_set_ssbd_mitigation':
arch/arm64/kernel/cpu_errata.c:311:3: error: implicit declaration of function 'arm_smccc_1_1_hvc' [-Werror=implicit-function-declaration]
arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL);
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull timers and timekeeping updates from Thomas Gleixner:
- Core infrastucture work for Y2038 to address the COMPAT interfaces:
+ Add a new Y2038 safe __kernel_timespec and use it in the core
code
+ Introduce config switches which allow to control the various
compat mechanisms
+ Use the new config switch in the posix timer code to control the
32bit compat syscall implementation.
- Prevent bogus selection of CPU local clocksources which causes an
endless reselection loop
- Remove the extra kthread in the clocksource code which has no value
and just adds another level of indirection
- The usual bunch of trivial updates, cleanups and fixlets all over the
place
- More SPDX conversions
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource/drivers/mxs_timer: Switch to SPDX identifier
clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Remove outdated file path
clocksource/drivers/arc_timer: Add comments about locking while read GFRC
clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
clocksource/drivers/sprd: Fix Kconfig dependency
clocksource: Move inline keyword to the beginning of function declarations
timer_list: Remove unused function pointer typedef
timers: Adjust a kernel-doc comment
tick: Prefer a lower rating device only if it's CPU local device
clocksource: Remove kthread
time: Change nanosleep to safe __kernel_* types
time: Change types to new y2038 safe __kernel_* types
time: Fix get_timespec64() for y2038 safe compat interfaces
time: Add new y2038 safe __kernel_timespec
posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_64BIT_TIME in architectures
compat: Enable compat_get/put_timespec64 always
...
Pull siginfo updates from Eric Biederman:
"This set of changes close the known issues with setting si_code to an
invalid value, and with not fully initializing struct siginfo. There
remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
and x86 to get the code that generates siginfo into a simpler and more
maintainable state. Most of that work involves refactoring the signal
handling code and thus careful code review.
Also not included is the work to shrink the in kernel version of
struct siginfo. That depends on getting the number of places that
directly manipulate struct siginfo under control, as it requires the
introduction of struct kernel_siginfo for the in kernel things.
Overall this set of changes looks like it is making good progress, and
with a little luck I will be wrapping up the siginfo work next
development cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
signal/sh: Stop gcc warning about an impossible case in do_divide_error
signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
signal/um: More carefully relay signals in relay_signal.
signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
signal/signalfd: Add support for SIGSYS
signal/signalfd: Remove __put_user from signalfd_copyinfo
signal/xtensa: Use force_sig_fault where appropriate
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
signal/um: Use force_sig_fault where appropriate
signal/sparc: Use force_sig_fault where appropriate
signal/sparc: Use send_sig_fault where appropriate
signal/sh: Use force_sig_fault where appropriate
signal/s390: Use force_sig_fault where appropriate
signal/riscv: Replace do_trap_siginfo with force_sig_fault
signal/riscv: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_mceerr where appropriate
signal/openrisc: Use force_sig_fault where appropriate
signal/nios2: Use force_sig_fault where appropriate
...
- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlsRY14VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDMGwP/A3FDrzGSjgC65m037/dsQj/Eniv
NkpueEVO3Z8UN44j0TNdeUzj6vQD376GVDwnW3mFlQ416A4ZwwHkk8cQhbpP2UvQ
EqKKUgujvLueZeuAwYG/DtrR9VZ6fh7QLD7Mv8DW/0AaNdBN2LyHEkW0qx7cSXqu
PijTsImj9B8TSykYc0SlJz7Q7Y5QUOYbWrJqqa1cskOdmpN2ATInnA2haXeO7j8v
lkb+WZ9R6xiJSzMCeLEzFV6tUvTiaSw5lVL64jpJhbkBNWPIVAza0erm9TSlQaTw
d3uJlAy0W9UkXSSqvbmtXvBFqCyEOzZ0hwi2MF6RoVuFt1yXwLgHGps6OUkho4Kq
pXWImaRHwxyQGrOY0qm0cxr+6TjYnjn8rIOzmzBOrKKq+aCIQ+Sl+CtNYzczQYeE
rOFBQFsMlzSRJWyabUjhBGFNfDmZZaVFKnUekEqXXETtLxzLZtx+W9i4tzoA1stv
y0+4yAjEyOQoRsAAE3GmzpDsu7Eu2sae6+lTo7DX1y+A7Wi94HKmy47sVjrS+evV
2SLyVZ4mhwMzaQ7ngrjHLD1GXDlBxxk2X+NSmBVe5z4AsuWeoqy81f0rgjyCQNxo
swEqs0k7mMDo8GQNjawwzhdDuHYm4gTX5iGs/Nxx6K4OoJ0bgv83yb/goArp+LEU
/QWT4T37A/pEEECe
=DUmC
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for 4.18
- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges
Stateful CPU architecture extensions may require the signal frame
to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
However, changing this #define is an ABI break.
To allow userspace the option of determining the signal frame size
in a more forwards-compatible way, this patch adds a new auxv entry
tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame
size that the process can observe during its lifetime.
If AT_MINSIGSTKSZ is absent from the aux vector, the caller can
assume that the MINSIGSTKSZ #define is sufficient. This allows for
a consistent interface with older kernels that do not provide
AT_MINSIGSTKSZ.
The idea is that libc could expose this via sysconf() or some
similar mechanism.
There is deliberately no AT_SIGSTKSZ. The kernel knows nothing
about userspace's own stack overheads and should not pretend to
know.
For arm64:
The primary motivation for this interface is the Scalable Vector
Extension, which can require at least 4KB or so of extra space
in the signal frame for the largest hardware implementations.
To determine the correct value, a "Christmas tree" mode (via the
add_all argument) is added to setup_sigframe_layout(), to simulate
addition of all possible records to the signal frame at maximum
possible size.
If this procedure goes wrong somehow, resulting in a stupidly large
frame layout and hence failure of sigframe_alloc() to allocate a
record to the frame, then this is indicative of a kernel bug. In
this case, we WARN() and no attempt is made to populate
AT_MINSIGSTKSZ for userspace.
For arm64 SVE:
The SVE context block in the signal frame needs to be considered
too when computing the maximum possible signal frame size.
Because the size of this block depends on the vector length, this
patch computes the size based not on the thread's current vector
length but instead on the maximum possible vector length: this
determines the maximum size of SVE context block that can be
observed in any signal frame for the lifetime of the process.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the kernel SVE support is reasonably mature, it is
excessive to default sve_max_vl to the invalid value -1 and then
sprinkle WARN_ON()s around the place to make sure it has been
initialised before use. The cpufeatures code already runs pretty
early, and will ensure sve_max_vl gets initialised.
This patch initialises sve_max_vl to something sane that will be
supported by every SVE implementation, and removes most of the
sanity checks.
The checks in find_supported_vector_length() are retained for now.
If anything goes horribly wrong, we are likely to trip a check here
sooner or later.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3,
add a small(-ish) sequence to handle it at EL2. Special care must
be taken to track the state of the guest itself by updating the
workaround flags. We also rely on patching to enable calls into
the firmware.
Note that since we need to execute branches, this always executes
after the Spectre-v2 mitigation has been applied.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If running on a system that performs dynamic SSBD mitigation, allow
userspace to request the mitigation for itself. This is implemented
as a prctl call, allowing the mitigation to be enabled or disabled at
will for this particular thread.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to allow userspace to be mitigated on demand, let's
introduce a new thread flag that prevents the mitigation from
being turned off when exiting to userspace, and doesn't turn
it on on entry into the kernel (with the assumption that the
mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later
patch.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On a system where firmware can dynamically change the state of the
mitigation, the CPU will always come up with the mitigation enabled,
including when coming back from suspend.
If the user has requested "no mitigation" via a command line option,
let's enforce it by calling into the firmware again to disable it.
Similarily, for a resume from hibernate, the mitigation could have
been disabled by the boot kernel. Let's ensure that it is set
back on in that case.
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to avoid checking arm64_ssbd_callback_required on each
kernel entry/exit even if no mitigation is required, let's
add yet another alternative that by default jumps over the mitigation,
and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.
In any case, default to the mitigation being enabled.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As for Spectre variant-2, we rely on SMCCC 1.1 to provide the
discovery mechanism for detecting the SSBD mitigation.
A new capability is also allocated for that purpose, and a
config option.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a heterogeneous system, we can end up with both affected and
unaffected CPUs. Let's check their status before calling into the
firmware.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order for the kernel to protect itself, let's call the SSBD mitigation
implemented by the higher exception level (either hypervisor or firmware)
on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the
right exception level, hence the introduction of a runtime patching
callback.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the host SVE context can be saved on demand from Hyp,
there is no longer any need to save this state in advance before
entering the guest.
This patch removes the relevant call to
kvm_fpsimd_flush_cpu_state().
Since the problem that function was intended to solve now no longer
exists, the function and its dependencies are also deleted.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to make sve_save_state()/sve_load_state() more easily
reusable and to get rid of a potential branch on context switch
critical paths, this patch makes sve_pffr() inline and moves it to
fpsimd.h.
<asm/processor.h> must be included in fpsimd.h in order to make
this work, and this creates an #include cycle that is tricky to
avoid without modifying core code, due to the way the PR_SVE_*()
prctl helpers are included in the core prctl implementation.
Instead of breaking the cycle, this patch defers inclusion of
<asm/fpsimd.h> in <asm/processor.h> until the point where it is
actually needed: i.e., immediately before the prctl definitions.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
sve_pffr(), which is used to derive the base address used for
low-level SVE save/restore routines, currently takes the relevant
task_struct as an argument.
The only accessed fields are actually part of thread_struct, so
this patch changes the argument type accordingly. This is done in
preparation for moving this function to a header, where we do not
want to have to include <linux/sched.h> due to the consequent
circular #include problems.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Having read_zcr_features() inline in cpufeature.h results in that
header requiring #includes which make it hard to include
<asm/fpsimd.h> elsewhere without triggering header inclusion
cycles.
This is not a hot-path function and arguably should not be in
cpufeature.h in the first place, so this patch moves it to
fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.
This allows some SVE-related #includes to be dropped from
cpufeature.h, which will ease future maintenance.
A couple of missing #includes of <asm/fpsimd.h> are exposed by this
change under arch/arm64/. This patch adds the missing #includes as
necessary.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64. This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.
Four hooks are defined in order to enable this:
* kvm_arch_vcpu_run_map_fp():
Called on PID change to map necessary bits of current to Hyp.
* kvm_arch_vcpu_load_fp():
Set up FP/SIMD for entering the KVM run loop (parse as
"vcpu_load fp").
* kvm_arch_vcpu_ctxsync_fp():
Get FP/SIMD into a safe state for re-enabling interrupts after a
guest exit back to the run loop.
For arm64 specifically, this involves updating the host kernel's
FPSIMD context tracking metadata so that kernel-mode NEON use
will cause the vcpu's FPSIMD state to be saved back correctly
into the vcpu struct. This must be done before re-enabling
interrupts because kernel-mode NEON may be used by softirqs.
* kvm_arch_vcpu_put_fp():
Save guest FP/SIMD state back to memory and dissociate from the
CPU ("vcpu_put fp").
Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current. A few
helpers drive this:
* fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
mark this CPU as having context fp (which may belong to a vcpu)
currently loaded in its registers. This is the non-task
equivalent of the static function fpsimd_bind_to_cpu() in
fpsimd.c.
* task_fpsimd_save():
exported to allow KVM to save the guest's FPSIMD state back to
memory on exit from the run loop.
* fpsimd_flush_state():
invalidate any context's FPSIMD state that is currently loaded.
Used to disassociate the vcpu from the CPU regs on run loop exit.
These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.
Some new vcpu_arch fields are added to make all this work. Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state. However, it is still needed for
preserving other things such as the host's system registers. To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.
arch/arm is not addressed by this patch and continues to use its
current save/restore logic. It could provide implementations of
the helpers later if desired.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for optimising the way KVM manages switching the
guest and host FPSIMD state, it is necessary to provide a means for
code outside arch/arm64/kernel/fpsimd.c to restore the user trap
configuration for SVE correctly for the current task.
Rather than requiring external code to duplicate the maintenance
explicitly, this patch moves the trap maintenenace to
fpsimd_bind_to_cpu(), since it is logically part of the work of
associating the current task with the cpu.
Because fpsimd_bind_to_cpu() is rather a cryptic name to publish
alongside fpsimd_bind_state_to_cpu(), the former function is
renamed to fpsimd_bind_task_to_cpu() to make its purpose more
explicit.
This patch makes appropriate changes to ensure that
fpsimd_bind_task_to_cpu() is always called alongside
task_fpsimd_load(), so that the trap maintenance continues to be
done in every situation where it was done prior to this patch.
As a side-effect, the metadata updates done by
fpsimd_bind_task_to_cpu() now change from conditional to
unconditional in the "already bound" case of sigreturn. This is
harmless, and a couple of extra stores on this slow path will not
impact performance. I consider this a reasonable price to pay for
a slightly cleaner interface.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that task has no FPSIMD register context.
The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory. For these reasons, the ->mm
checks are not useful, providing that TIF_FOREIGN_FPSTATE is
maintained in a consistent way for all threads.
The context switch logic is already deliberately optimised to defer
reloads of the regs until ret_to_user (or sigreturn as a special
case), and save them only if they have been previously loaded.
These paths are the only places where the wrong_task and wrong_cpu
conditions can be made false, by calling fpsimd_bind_task_to_cpu().
Kernel threads by definition never reach these paths. As a result,
the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
always yield true for kernel threads.
This patch removes the redundant checks and special-case code,
ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread
is scheduled in, and ensures that this flag is set for the init
task. The fpsimd_flush_task_state() call already present in
copy_thread() ensures the same for any new task.
With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
ensures that no extra context save work is added for kernel
threads, and eliminates the redundant context saving that may
currently occur for kernel threads that have acquired an mm via
use_mm().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
contexts to be handled by the fpsimd common code, this patch adapts
task_fpsimd_save() to save back the currently loaded context,
removing the explicit dependency on current.
The relevant storage to write back to in memory is now found by
examining the fpsimd_last_state percpu struct.
fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
fpsimd_last_state is updated under local_bh_disable() or
local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
thus, fpsimd_save() will write back to the correct storage for the
loaded context.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch uses the new update_thread_flag() helpers to simplify a
couple of if () set; else clear; constructs.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
fpsimd_last_state.st is set to NULL as a way of indicating that
current's FPSIMD registers are no longer loaded in the cpu. In
particular, this is done when the kernel temporarily uses or
clobbers the FPSIMD registers for its own purposes, as in CPU PM or
kernel-mode NEON, resulting in them being populated with garbage
data not belonging to a task.
Commit 17eed27b02 ("arm64/sve: KVM: Prevent guests from using
SVE") factors this operation out as a new helper
fpsimd_flush_cpu_state() to make it clearer what is being done
here, and on SVE systems this helper is now used, via
kvm_fpsimd_flush_cpu_state(), to invalidate the registers after KVM
has run a vcpu. The reason for this is that KVM does not yet
understand how to restore the full host SVE registers itself after
loading the guest FPSIMD context into them.
This exposes a particular problem: if fpsimd_last_state.st is set
to NULL without also setting TIF_FOREIGN_FPSTATE, the kernel may
continue to think that current's FPSIMD registers are live even
though they have actually been clobbered.
Prior to the aforementioned commit, the only path where
fpsimd_last_state.st is set to NULL without setting
TIF_FOREIGN_FPSTATE is when kernel_neon_begin() is called by a
kernel thread (where current->mm can be NULL). This does not
matter, because the only harm is that at context-switch time
fpsimd_thread_switch() may unnecessarily save the FPSIMD registers
back to current's thread_struct (even though kernel threads are not
considered to have any FPSIMD context of their own and the
registers will never be reloaded).
Note that although CPU_PM_ENTER lacks the TIF_FOREIGN_FPSTATE
setting, every CPU passing through that path must subsequently pass
through CPU_PM_EXIT before it can re-enter the kernel proper.
CPU_PM_EXIT sets the flag.
The sve_flush_cpu_state() function added by commit 17eed27b02
also lacks the proper maintenance of TIF_FOREIGN_FPSTATE. This may
cause the bits of a host task's SVE registers that do not alias the
FPSIMD register file to spontaneously appear zeroed if a KVM vcpu
runs in the same task in the meantime. Although this effect is
hidden by the fact that the non-FPSIMD bits of the SVE registers
are zeroed by a syscall anyway, it is doubtless a bad idea to rely
on these different code paths interacting correctly under future
maintenance.
This patch makes TIF_FOREIGN_FPSTATE an unconditional side-effect
of fpsimd_flush_cpu_state(), and removes the set_thread_flag()
calls that become redundant as a result. This ensures that
TIF_FOREIGN_FPSTATE cannot remain clear if the FPSIMD state in the
FPSIMD registers is invalid.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Otherwise modules that use these arithmetic operations will fail to
link. We accomplish this with the usual EXPORT_SYMBOL, which on most
architectures goes in the .S file but the ARM64 maintainers prefer that
insead it goes into arm64ksyms.
While we're at it, we also fix this up to use SPDX, and I personally
choose to relicense this as GPL2||BSD so that these symbols don't need
to be export_symbol_gpl, so all modules can use the routines, since
these are important general purpose compiler-generated function calls.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: PaX Team <pageexec@freemail.hu>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm_pmu::handle_irq() callback has the same prototype as a generic
IRQ handler, taking the IRQ number and a void pointer argument which it
must convert to an arm_pmu pointer.
This means that all arm_pmu::handle_irq() take an IRQ number they never
use, and all must explicitly cast the void pointer to an arm_pmu
pointer.
Instead, let's change arm_pmu::handle_irq to take an arm_pmu pointer,
allowing these casts to be removed. The redundant IRQ number parameter
is also removed.
Suggested-by: Hoeun Ryu <hoeun.ryu@lge.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Writes to ZCR_EL1 are self-synchronising, and so may be expensive
in typical implementations.
This patch adopts the approach used for costly system register
writes elsewhere in the kernel: the system register write is
suppressed if it would not change the stored value.
Since the common case will be that of switching between tasks that
use the same vector length as one another, prediction hit rates on
the conditional branch should be reasonably good, with lower
expected amortised cost than the unconditional execution of a
heavyweight self-synchronising instruction.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that we have an accurate view of the physical topology
we need to represent it correctly to the scheduler. Generally MC
should equal the LLC in the system, but there are a number of
special cases that need to be dealt with.
In the case of NUMA in socket, we need to assure that the sched
domain we build for the MC layer isn't larger than the DIE above it.
Similarly for LLC's that might exist in cross socket interconnect or
directory hardware we need to assure that MC is shrunk to the socket
or NUMA node.
This patch builds a sibling mask for the LLC, and then picks the
smallest of LLC, socket siblings, or NUMA node siblings, which
gives us the behavior described above. This is ever so slightly
different than the similar alternative where we look for a cache
layer less than or equal to the socket/NUMA siblings.
The logic to pick the MC layer affects all arm64 machines, but
only changes the behavior for DT/MPIDR systems if the NUMA domain
is smaller than the core siblings (generally set to the cluster).
Potentially this fixes a possible bug in DT systems, but really
it only affects ACPI systems where the core siblings is correctly
set to the socket siblings. Thus all currently available ACPI
systems should have MC equal to LLC, including the NUMA in socket
machines where the LLC is partitioned between the NUMA nodes.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread id and core_id by assuming
certain levels of the PPTT tree correspond to those concepts.
The package_id is flagged in the tree and can be found by calling
find_acpi_cpu_topology_package() which terminates
its search when it finds an ACPI node flagged as the physical
package. If the tree doesn't contain enough levels to represent
all of the requested levels then the root node will be returned
for all subsequent levels.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The cluster concept isn't architecturally defined for arm64.
Lets match the name of the arm64 topology field to the kernel macro
that uses it.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The /sys cache entries should support ACPI/PPTT generated cache
topology information. For arm64, if ACPI is enabled, determine
the max number of cache levels and populate them using the PPTT
table if one is available.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
"make includecheck" detected few duplicated includes in arch/arm64.
This patch removes the double inclusions.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
VMLINUX_SYMBOL() is no-op unless CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
is defined. It has ever been selected only by BLACKFIN and METAG.
VMLINUX_SYMBOL() is unneeded for ARM64-specific code.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch increases the ARCH_DMA_MINALIGN to 128 so that it covers the
currently known Cache Writeback Granule (CTR_EL0.CWG) on arm64 and moves
the fallback in cache_line_size() from L1_CACHE_BYTES to this constant.
In addition, it warns (and taints) if the CWG is larger than
ARCH_DMA_MINALIGN as this is not safe with non-coherent DMA.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The NVIDIA Denver CPU also needs a PSCI call to harden the branch
predictor.
Signed-off-by: David Gilhooley <dgilhooley@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for userspace to control idx. Sanitize idx when using it
as an array index.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.
Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.
The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.
In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.
Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Our arm64_skip_faulting_instruction() helper advances the userspace
singlestep state machine, but this is also called by the kernel BRK
handler, as used for WARN*().
Thus, if we happen to hit a WARN*() while the user singlestep state
machine is in the active-no-pending state, we'll advance to the
active-pending state without having executed a user instruction, and
will take a step exception earlier than expected when we return to
userspace.
Let's fix this by only advancing the state machine when skipping a user
instruction.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit a257e02579 ("arm64/kernel: don't ban ADRP to work around
Cortex-A53 erratum #843419") introduced a function whose name ends with
"_veneer".
This clashes with commit bd8b22d288 ("Kbuild: kallsyms: ignore veneers
emitted by the ARM linker"), which removes symbols ending in "_veneer"
from kallsyms.
The problem was manifested as 'perf test -vvvvv vmlinux' failed,
correctly claiming the symbol 'module_emit_adrp_veneer' was present in
vmlinux, but not in kallsyms.
...
ERR : 0xffff00000809aa58: module_emit_adrp_veneer not on kallsyms
...
test child finished with -1
---- end ----
vmlinux symtab matches kallsyms: FAILED!
Fix the problem by renaming module_emit_adrp_veneer to
module_emit_veneer_for_adrp. Now the test passes.
Fixes: a257e02579 ("arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We transiently switch to KERNEL_DS in compat_ptrace_gethbpregs() and
compat_ptrace_sethbpregs(), but in either case this is pointless as we
don't perform any uaccess during this window.
let's rip out the redundant addr_limit manipulation.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We're missing a sentinel entry in kpti_safe_list. Thus is_midr_in_range_list()
can walk past the end of kpti_safe_list. Depending on the contents of memory,
this could erroneously match a CPU's MIDR, cause a data abort, or other bad
outcomes.
Add the sentinel entry to avoid this.
Fixes: be5b299830 ("arm64: capabilities: Add support for checks based on a list of MIDRs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit:
a7e6f1ca90 ("arm64: signal: Force SIGKILL for unknown signals in force_signal_inject")
... any signal which is not SIGKILL will be upgraded to a SIGKILL be
force_signal_inject(). This includes signals we do expect, such as
SIGILL triggered by do_undefinstr().
Fix the check to use a logical AND rather than a logical OR, permitting
signals whose layout is SIL_FAULT.
Fixes: a7e6f1ca90 ("arm64: signal: Force SIGKILL for unknown signals in force_signal_inject")
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add support macros to conditionally yield the NEON (and thus the CPU)
that may be called from the assembler code.
In some cases, yielding the NEON involves saving and restoring a non
trivial amount of context (especially in the CRC folding algorithms),
and so the macro is split into three, and the code in between is only
executed when the yield path is taken, allowing the context to be preserved.
The third macro takes an optional label argument that marks the resume
path after a yield has been performed.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
bpi.S was introduced as we were starting to build the Spectre v2
mitigation framework, and it was rather unclear that it would
become strictly KVM specific.
Now that the picture is a lot clearer, let's move the content
of that file to hyp-entry.S, where it actually belong.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The very existence of __smccc_workaround_1_hvc_* is a thinko, as
KVM will never use a HVC call to perform the branch prediction
invalidation. Even as a nested hypervisor, it would use an SMC
instruction.
Let's get rid of it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since 5e7951ce19 ("arm64: capabilities: Clean up midr range helpers"),
capabilities must be represented with a single entry. If multiple
CPU types can use the same capability, then they need to be enumerated
in a list.
The EL2 hardening stuff (which affects both A57 and A72) managed to
escape the conversion in the above patch thanks to the 4.17 merge
window. Let's fix it now.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of Silicon provider service ID 0xC2001700.
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
[maz: reworked errata framework integration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past invalid
privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as of now)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
=bPlD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
- Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch
more warnings (hidden behind W=1).
- Build dtc lexer and parser files instead of using shipped versions.
- Rework overlay apply API to take an FDT as input and apply overlays in
a single step.
- Add a phandle lookup cache. This improves boot time by hundreds of
msec on systems with large DT.
- Add trivial mcp4017/18/19 potentiometers bindings.
- Remove VLA stack usage in DT code.
-----BEGIN PGP SIGNATURE-----
iQItBAABCAAXBQJaxiUdEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcM0+w/+
L7nkug1Hz2476eRrsn5bm6oOO0vCrhQcDTJ/AlvU1YO8XBVgGEetLDs8drmvD0/O
FQDcpumX6G0eFoHTnTNWD7keM+0nY5jZBIAqKQNa9a0HKkjYc4HO5Ot9E02XG8W8
759vvCcGeJpysoCls9u8OplzqiDyNVQJd1a0fLivtafdKypuE/Ywh15wrzckPO+F
bxqWQd+uwm98ZVz8/o3vfYtAOJmA06A+hsyVLXYu7iKQcXYVxi+ZNbRV44MQ50NI
1w5m8GgtWe4A2lpXjmeXk1VmLPO3eEgQKnBoH7gcJmCHaVg/SVfMgBscuGSQZRQa
rQvaYRUNGJ0Mtji8EZpZb5Vip4ZCDtZCQBB3snN24CvGXI6WuIIg/8ncXt0AfLqn
pxFmC32ZcwvJR2NCpPVfTgILm6foT9IzJWKl6SQLVtqqVp9nPFua7T3l8AQak7FB
2MMaaqh7L0l0za0ZgArZZo/IWUHRb0MwZdXAkqBZlQ6f3IBqGQeKCnkclAeH8qYr
OorCOmC2OlKXLPHoz8XHeBzPRdnv1dQ//gEkKXBJ2igLU03hRWv9dxnGju/45sun
Ifo79uBAUc9s3F4Kjd/zs2iLztuPrYCSICHtJh9LPeOxoV1ZUNt+6Cm23yQ014Uo
/GsFW+lzh7c9wB1eETjPHd1WuYXiSrmE4zvbdykyLCk=
=ZWpa
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
- Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a
bunch more warnings (hidden behind W=1).
- Build dtc lexer and parser files instead of using shipped versions.
- Rework overlay apply API to take an FDT as input and apply overlays
in a single step.
- Add a phandle lookup cache. This improves boot time by hundreds of
msec on systems with large DT.
- Add trivial mcp4017/18/19 potentiometers bindings.
- Remove VLA stack usage in DT code.
* tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits)
of: unittest: fix an error code in of_unittest_apply_overlay()
of: unittest: move misplaced function declaration
of: unittest: Remove VLA stack usage
of: overlay: Fix forgotten reference to of_overlay_apply()
of: Documentation: Fix forgotten reference to of_overlay_apply()
of: unittest: local return value variable related cleanups
of: unittest: remove unneeded local return value variables
dt-bindings: trivial: add various mcp4017/18/19 potentiometers
of: unittest: fix an error test in of_unittest_overlay_8()
of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
dt-bindings: rockchip-dw-mshc: use consistent clock names
MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems
scripts: turn off some new dtc warnings by default
scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987
scripts/dtc: generate lexer and parser during build instead of shipping
powerpc: boot: add strrchr function
of: overlay: do not include path in full_name of added nodes
of: unittest: clean up changeset test
arm64/efi: Make strrchr() available to the EFI namespace
ARM: boot: add strrchr function
...
Nothing particularly stands out here, probably because people were tied
up with spectre/meltdown stuff last time around. Still, the main pieces
are:
- Rework of our CPU features framework so that we can whitelist CPUs that
don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to elide
instruction and data cache maintenance when writing out instructions
- Removal of the large memory model which resulted in suboptimal codegen
by the compiler and increased the use of literal pools, which could
potentially be used as ROP gadgets since they are mapped as executable
- Rework of forced signal delivery so that the siginfo_t is well-formed
and handling of show_unhandled_signals is consolidated and made
consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaw1TCAAoJELescNyEwWM0gyQIAJVMK4QveBW+LwF96NYdZo16
p90Aa+nqKelh/s93govQArDMv1gxyuXdFlQZVOGPQHfqpz6RhJWmBA2tFsUbQrUc
OBcioPrRihqTmKBe+1r1XORwZxkVX6GGmCn0LYpPR7I3TjxXZpvxqaxGxiUvHkci
yVxWlDTyN/7eL3akhCpCDagN3Fxwk3QnJLqE3fxOFMlY7NvQcmUxcITiUl/s469q
xK6SWH9SRH1JK8jTHPitwUBiU//3FfCqSI9HLEdDIDoTuPcVM8UetWvi4QzrzJL1
UYg8lmU0CXNmflDzZJDaMf+qFApOrGxR0YVPpBzlQvxe0JIY69g48f+JzDPz8nc=
=+gNa
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
"System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or
compat_sys_xyzzy() should only be called from userspace via the
syscall table, but not from elsewhere in the kernel.
At least on 64-bit x86, it will likely be a hard requirement from
v4.17 onwards to not call system call functions in the kernel: It is
better to use use a different calling convention for system calls
there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
which then hands processing over to the actual syscall function. This
means that only those parameters which are actually needed for a
specific syscall are passed on during syscall entry, instead of
filling in six CPU registers with random user space content all the
time (which may cause serious trouble down the call chain). Those
x86-specific patches will be pushed through the x86 tree in the near
future.
Moreover, rules on how data may be accessed may differ between kernel
data and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific
code.
This patchset removes all in-kernel calls to syscall functions in the
kernel with the exception of arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"
* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
bpf: whitelist all syscalls for error injection
kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
kernel/sys_ni: sort cond_syscall() entries
syscalls/x86: auto-create compat_sys_*() prototypes
syscalls: sort syscall prototypes in include/linux/compat.h
net: remove compat_sys_*() prototypes from net/compat.h
syscalls: sort syscall prototypes in include/linux/syscalls.h
kexec: move sys_kexec_load() prototype to syscalls.h
x86/sigreturn: use SYSCALL_DEFINE0
x86: fix sys_sigreturn() return type to be long, not unsigned long
x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
...
Pull EFI updates from Ingo Molnar:
"The main EFI changes in this cycle were:
- Fix the apple-properties code (Andy Shevchenko)
- Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved
x18 register (Ard Biesheuvel)
- Use efi_switch_mm() on x86 instead of manipulating %cr3 directly
(Sai Praneeth)
- Fix early memremap leak in ESRT code (Ard Biesheuvel)
- Switch to L"xxx" notation for wide string literals (Ard Biesheuvel)
- ... plus misc other cleanups and bugfixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
x86/efi: Replace efi_pgd with efi_mm.pgd
efi: Use string literals for efi_char16_t variable initializers
efi/esrt: Fix handling of early ESRT table mapping
efi: Use efi_mm in x86 as well as ARM
efi: Make const array 'apple' static
efi/apple-properties: Use memremap() instead of ioremap()
efi: Reorder pr_notice() with add_device_randomness() call
x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store()
efi/arm64: Check whether x18 is preserved by runtime services calls
efi/arm*: Stop printing addresses of virtual mappings
efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs()
efi/arm*: Only register page tables when they exist
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
When the hardend usercopy support was added for arm64, it was
concluded that all cases of usercopy into and out of thread_struct
were statically sized and so didn't require explicit whitelisting
of the appropriate fields in thread_struct.
Testing with usercopy hardening enabled has revealed that this is
not the case for certain ptrace regset manipulation calls on arm64.
This occurs because the sizes of usercopies associated with the
regset API are dynamic by construction, and because arm64 does not
always stage such copies via the stack: indeed the regset API is
designed to avoid the need for that by adding some bounds checking.
This is currently believed to affect only the fpsimd and TLS
registers.
Because the whitelisted fields in thread_struct must be contiguous,
this patch groups them together in a nested struct. It is also
necessary to be able to determine the location and size of that
struct, so rather than making the struct anonymous (which would
save on edits elsewhere) or adding an anonymous union containing
named and unnamed instances of the same struct (gross), this patch
gives the struct a name and makes the necessary edits to code that
references it (noisy but simple).
Care is needed to ensure that the new struct does not contain
padding (which the usercopy hardening would fail to protect).
For this reason, the presence of tp2_value is made unconditional,
since a padding field would be needed there in any case. This pads
up to the 16-byte alignment required by struct user_fpsimd_state.
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9e8084d3f7 ("arm64: Implement thread_struct whitelist for hardened usercopy")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for using a common representation of the FPSIMD
state for tasks and KVM vcpus, this patch separates out the "cpu"
field that is used to track the cpu on which the state was most
recently loaded.
This will allow common code to operate on task and vcpu contexts
without requiring the cpu field to be stored at the same offset
from the FPSIMD register data in both cases. This should avoid the
need for messing with the definition of those parts of struct
vcpu_arch that are exposed in the KVM user ABI.
The resulting change is also convenient for grouping and defining
the set of thread_struct fields that are supposed to be accessible
to copy_{to,from}_user(), which includes user_fpsimd_state but
should exclude the cpu field. This patch does not amend the
usercopy whitelist to match: that will be addressed in a subsequent
patch.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[will: inline fpsimd_flush_state for now]
Signed-off-by: Will Deacon <will.deacon@arm.com>
MIDR_ALL_VERSIONS is changing, and won't have the same meaning
in 4.17, and the right thing to use will be ERRATA_MIDR_ALL_VERSIONS.
In order to cope with the merge window, let's add a compatibility
macro that will allow a relatively smooth transition, and that
can be removed post 4.17-rc1.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Creates far too many conflicts with arm64/for-next/core, to be
resent post -rc1.
This reverts commit f9f5dc1950.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This reverts commit 1f85b42a69.
The internal dma-direct.h API has changed in -next, which collides with
us trying to use it to manage non-coherent DMA devices on systems with
unreasonably large cache writeback granules.
This isn't at all trivial to resolve, so revert our changes for now and
we can revisit this after the merge window. Effectively, this just
restores our behaviour back to that of 4.16.
Signed-off-by: Will Deacon <will.deacon@arm.com>
An allnoconfig build complains about unused symbols due to functions
that are called via conditional cpufeature and cpu_errata table entries.
Annotate these as __maybe_unused if they are likely to be generic, or
predicate their compilation on the same option as the table entry if
they are specific to a given alternative.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some variants of the Arm Cortex-55 cores (r0p0, r0p1, r1p0) suffer
from an erratum 1024718, which causes incorrect updates when DBM/AP
bits in a page table entry is modified without a break-before-make
sequence. The work around is to skip enabling the hardware DBM feature
on the affected cores. The hardware Access Flag management features
is not affected. There are some other cores suffering from this
errata, which could be added to the midr_list to trigger the work
around.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: ckadabi@codeaurora.org
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We enable hardware DBM bit in a capable CPU, very early in the
boot via __cpu_setup. This doesn't give us a flexibility of
optionally disable the feature, as the clearing the bit
is a bit costly as the TLB can cache the settings. Instead,
we delay enabling the feature until the CPU is brought up
into the kernel. We use the feature capability mechanism
to handle it.
The hardware DBM is a non-conflicting feature. i.e, the kernel
can safely run with a mix of CPUs with some using the feature
and the others don't. So, it is safe for a late CPU to have
this capability and enable it, even if the active CPUs don't.
To get this handled properly by the infrastructure, we
unconditionally set the capability and only enable it
on CPUs which really have the feature. Also, we print the
feature detection from the "matches" call back to make sure
we don't mislead the user when none of the CPUs could use the
feature.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some capabilities have different criteria for detection and associated
actions based on the matching criteria, even though they all share the
same capability bit. So far we have used multiple entries with the same
capability bit to handle this. This is prone to errors, as the
cpu_enable is invoked for each entry, irrespective of whether the
detection rule applies to the CPU or not. And also this complicates
other helpers, e.g, __this_cpu_has_cap.
This patch adds a wrapper entry to cover all the possible variations
of a capability by maintaining list of matches + cpu_enable callbacks.
To avoid complicating the prototypes for the "matches()", we use
arm64_cpu_capabilities maintain the list and we ignore all the other
fields except the matches & cpu_enable.
This ensures :
1) The capabilitiy is set when at least one of the entry detects
2) Action is only taken for the entries that "matches".
This avoids explicit checks in the cpu_enable() take some action.
The only constraint here is that, all the entries should have the
same "type" (i.e, scope and conflict rules).
If a cpu_enable() method is associated with multiple matches for a
single capability, care should be taken that either the match criteria
are mutually exclusive, or that the method is robust against being
called multiple times.
This also reverts the changes introduced by commit 67948af41f
("arm64: capabilities: Handle duplicate entries for a capability").
Cc: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add helpers for detecting an errata on list of midr ranges
of affected CPUs, with the same work around.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add helpers for checking if the given CPU midr falls in a range
of variants/revisions for a given model.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We are about to introduce generic MIDR range helpers. Clean
up the existing helpers in erratum handling, preparing them
to use generic version.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We expect all CPUs to be running at the same EL inside the kernel
with or without VHE enabled and we have strict checks to ensure
that any mismatch triggers a kernel panic. If VHE is enabled,
we use the feature based on the boot CPU and all other CPUs
should follow. This makes it a perfect candidate for a capability
based on the boot CPU, which should be matched by all the CPUs
(both when is ON and OFF). This saves us some not-so-pretty
hooks and special code, just for verifying the conflict.
The patch also makes the VHE capability entry depend on
CONFIG_ARM64_VHE.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The kernel detects and uses some of the features based on the boot
CPU and expects that all the following CPUs conform to it. e.g,
with VHE and the boot CPU running at EL2, the kernel decides to
keep the kernel running at EL2. If another CPU is brought up without
this capability, we use custom hooks (via check_early_cpu_features())
to handle it. To handle such capabilities add support for detecting
and enabling capabilities based on the boot CPU.
A bit is added to indicate if the capability should be detected
early on the boot CPU. The infrastructure then ensures that such
capabilities are probed and "enabled" early on in the boot CPU
and, enabled on the subsequent CPUs.
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
KPTI is treated as a system wide feature and is only detected if all
the CPUs in the sysetm needs the defense, unless it is forced via kernel
command line. This leaves a system with a mix of CPUs with and without
the defense vulnerable. Also, if a late CPU needs KPTI but KPTI was not
activated at boot time, the CPU is currently allowed to boot, which is a
potential security vulnerability.
This patch ensures that the KPTI is turned on if at least one CPU detects
the capability (i.e, change scope to SCOPE_LOCAL_CPU). Also rejetcs a late
CPU, if it requires the defense, when the system hasn't enabled it,
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have the flexibility of defining system features based
on individual CPUs, introduce CPU feature type that can be detected
on a local SCOPE and ignores the conflict on late CPUs. This is
applicable for ARM64_HAS_NO_HW_PREFETCH, where it is fine for
the system to have CPUs without hardware prefetch turning up
later. We only suffer a performance penalty, nothing fatal.
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the features and errata workarounds have the same
rules and flow, group the handling of the tables.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
So far we have treated the feature capabilities as system wide
and this wouldn't help with features that could be detected locally
on one or more CPUs (e.g, KPTI, Software prefetch). This patch
splits the feature detection to two phases :
1) Local CPU features are checked on all boot time active CPUs.
2) System wide features are checked only once after all CPUs are
active.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Right now we run through the errata workarounds check on all boot
active CPUs, with SCOPE_ALL. This wouldn't help for detecting erratum
workarounds with a SYSTEM_SCOPE. There are none yet, but we plan to
introduce some: let us clean this up so that such workarounds can be
detected and enabled correctly.
So, we run the checks with SCOPE_LOCAL_CPU on all CPUs and SCOPE_SYSTEM
checks are run only once after all the boot time CPUs are active.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We are about to group the handling of all capabilities (features
and errata workarounds). This patch open codes the wrapper routines
to make it easier to merge the handling.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
While processing the list of capabilities, it is useful to
filter out some of the entries based on the given mask for the
scope of the capabilities to allow better control. This can be
used later for handling LOCAL vs SYSTEM wide capabilities and more.
All capabilities should have their scope set to either LOCAL_CPU or
SYSTEM. No functional/flow change.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that each capability describes how to treat the conflicts
of CPU cap state vs System wide cap state, we can unify the
verification logic to a single place.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When a CPU is brought up, it is checked against the caps that are
known to be enabled on the system (via verify_local_cpu_capabilities()).
Based on the state of the capability on the CPU vs. that of System we
could have the following combinations of conflict.
x-----------------------------x
| Type | System | Late CPU |
|-----------------------------|
| a | y | n |
|-----------------------------|
| b | n | y |
x-----------------------------x
Case (a) is not permitted for caps which are system features, which the
system expects all the CPUs to have (e.g VHE). While (a) is ignored for
all errata work arounds. However, there could be exceptions to the plain
filtering approach. e.g, KPTI is an optional feature for a late CPU as
long as the system already enables it.
Case (b) is not permitted for errata work arounds that cannot be activated
after the kernel has finished booting.And we ignore (b) for features. Here,
yet again, KPTI is an exception, where if a late CPU needs KPTI we are too
late to enable it (because we change the allocation of ASIDs etc).
Add two different flags to indicate how the conflict should be handled.
ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU - CPUs may have the capability
ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU - CPUs may not have the cappability.
Now that we have the flags to describe the behavior of the errata and
the features, as we treat them, define types for ERRATUM and FEATURE.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We use arm64_cpu_capabilities to represent CPU ELF HWCAPs exposed
to the userspace and the CPU hwcaps used by the kernel, which
include cpu features and CPU errata work arounds. Capabilities
have some properties that decide how they should be treated :
1) Detection, i.e scope : A cap could be "detected" either :
- if it is present on at least one CPU (SCOPE_LOCAL_CPU)
Or
- if it is present on all the CPUs (SCOPE_SYSTEM)
2) When is it enabled ? - A cap is treated as "enabled" when the
system takes some action based on whether the capability is detected or
not. e.g, setting some control register, patching the kernel code.
Right now, we treat all caps are enabled at boot-time, after all
the CPUs are brought up by the kernel. But there are certain caps,
which are enabled early during the boot (e.g, VHE, GIC_CPUIF for NMI)
and kernel starts using them, even before the secondary CPUs are brought
up. We would need a way to describe this for each capability.
3) Conflict on a late CPU - When a CPU is brought up, it is checked
against the caps that are known to be enabled on the system (via
verify_local_cpu_capabilities()). Based on the state of the capability
on the CPU vs. that of System we could have the following combinations
of conflict.
x-----------------------------x
| Type | System | Late CPU |
------------------------------|
| a | y | n |
------------------------------|
| b | n | y |
x-----------------------------x
Case (a) is not permitted for caps which are system features, which the
system expects all the CPUs to have (e.g VHE). While (a) is ignored for
all errata work arounds. However, there could be exceptions to the plain
filtering approach. e.g, KPTI is an optional feature for a late CPU as
long as the system already enables it.
Case (b) is not permitted for errata work arounds which requires some
work around, which cannot be delayed. And we ignore (b) for features.
Here, yet again, KPTI is an exception, where if a late CPU needs KPTI we
are too late to enable it (because we change the allocation of ASIDs
etc).
So this calls for a lot more fine grained behavior for each capability.
And if we define all the attributes to control their behavior properly,
we may be able to use a single table for the CPU hwcaps (which cover
errata and features, not the ELF HWCAPs). This is a prepartory step
to get there. More bits would be added for the properties listed above.
We are going to use a bit-mask to encode all the properties of a
capabilities. This patch encodes the "SCOPE" of the capability.
As such there is no change in how the capabilities are treated.
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We have errata work around processing code in cpu_errata.c,
which calls back into helpers defined in cpufeature.c. Now
that we are going to make the handling of capabilities
generic, by adding the information to each capability,
move the errata work around specific processing code.
No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We trigger CPU errata work around check on the boot CPU from
smp_prepare_boot_cpu() to make sure that we run the checks only
after the CPU feature infrastructure is initialised. While this
is correct, we can also do this from init_cpu_features() which
initilises the infrastructure, and is called only on the
Boot CPU. This helps to consolidate the CPU capability handling
to cpufeature.c. No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We issue the enable() call back for all CPU hwcaps capabilities
available on the system, on all the CPUs. So far we have ignored
the argument passed to the call back, which had a prototype to
accept a "void *" for use with on_each_cpu() and later with
stop_machine(). However, with commit 0a0d111d40
("arm64: cpufeature: Pass capability structure to ->enable callback"),
there are some users of the argument who wants the matching capability
struct pointer where there are multiple matching criteria for a single
capability. Clean up the declaration of the call back to make it clear.
1) Renamed to cpu_enable(), to imply taking necessary actions on the
called CPU for the entry.
2) Pass const pointer to the capability, to allow the call back to
check the entry. (e.,g to check if any action is needed on the CPU)
3) We don't care about the result of the call back, turning this to
a void.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[suzuki: convert more users, rename call back and drop results]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently a SIGFPE delivered in response to a floating-point
exception trap may have si_code set to 0 on arm64. As reported by
Eric, this is a bad idea since this is the value of SI_USER -- yet
this signal is definitely not the result of kill(2), tgkill(2) etc.
and si_uid and si_pid make limited sense whereas we do want to
yield a value for si_addr (which doesn't exist for SI_USER).
It's not entirely clear whether the architecure permits a
"spurious" fp exception trap where none of the exception flag bits
in ESR_ELx is set. (IMHO the architectural intent is to forbid
this.) However, it does permit those bits to contain garbage if
the TFV bit in ESR_ELx is 0. That case isn't currently handled at
all and may result in si_code == 0 or si_code containing a FPE_FLT*
constant corresponding to an exception that did not in fact happen.
There is nothing sensible we can return for si_code in such cases,
but SI_USER is certainly not appropriate and will lead to violation
of legitimate userspace assumptions.
This patch allocates a new si_code value FPE_UNKNOWN that at least
does not conflict with any existing SI_* or FPE_* code, and yields
this in si_code for undiagnosable cases. This is probably the best
simplicity/incorrectness tradeoff achieveable without relying on
implementation-dependent features or adding a lot of code. In any
case, there appears to be no perfect solution possible that would
justify a lot of effort here.
Yielding FPE_UNKNOWN when some well-defined fp exception caused the
trap is a violation of POSIX, but this is forced by the
architecture. We have no realistic prospect of yielding the
correct code in such cases. At present I am not aware of any ARMv8
implementation that supports trapped floating-point exceptions in
any case.
The new code may be applicable to other architectures for similar
reasons.
No attempt is made to provide ESR_ELx to userspace in the signal
frame, since architectural limitations mean that it is unlikely to
provide much diagnostic value, doesn't benefit existing software
and would create ABI with no proven purpose. The existing
mechanism for passing it also has problems of its own which may
result in the wrong value being passed to userspace due to
interaction with mm faults. The implied rework does not appear
justified.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of Silicon provider service ID 0xC2001700.
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Expose the new features introduced by Arm v8.4 extensions to
Arm v8-A profile.
These include :
1) Data indpendent timing of instructions. (DIT, exposed as HWCAP_DIT)
2) Unaligned atomic instructions and Single-copy atomicity of loads
and stores. (AT, expose as HWCAP_USCAT)
3) LDAPR and STLR instructions with immediate offsets (extension to
LRCPC, exposed as HWCAP_ILRCPC)
4) Flag manipulation instructions (TS, exposed as HWCAP_FLAGM).
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The printk symbol was intended as a generic address that is always
exported, however that turned out to be false with CONFIG_PRINTK=n:
ERROR: "printk" [arch/arm64/kernel/arm64-reloc-test.ko] undefined!
This changes the references to memstart_addr, which should be there
regardless of configuration.
Fixes: a257e02579 ("arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cortex-A57 and A72 are vulnerable to the so-called "variant 3a" of
Meltdown, where an attacker can speculatively obtain the value
of a privileged system register.
By enabling ARM64_HARDEN_EL2_VECTORS on these CPUs, obtaining
VBAR_EL2 is not disclosing the hypervisor mappings anymore.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're about to need to allocate hardening slots from other parts
of the kernel (in order to support ARM64_HARDEN_EL2_VECTORS).
Turn the counter into an atomic_t and make it available to the
rest of the kernel. Also add BP_HARDEN_EL2_SLOTS as the number of
slots instead of the hardcoded 4...
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, the branch from the vector slots to the main vectors can at
most be 4GB from the main vectors (the reach of ADRP), and this
distance is known at compile time. If we were to remap the slots
to an unrelated VA, things would break badly.
A way to achieve VA independence would be to load the absolute
address of the vectors (__kvm_hyp_vector), either using a constant
pool or a series of movs, followed by an indirect branch.
This patches implements the latter solution, using another instance
of a patching callback. Note that since we have to save a register
pair on the stack, we branch to the *second* instruction in the
vectors in order to compensate for it. This also results in having
to adjust this balance in the invalid vector entry point.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, we only reserve a single instruction in the BPI template in
order to branch to the vectors. As we're going to stuff a few more
instructions there, let's reserve a total of 5 instructions, which
we're going to patch later on as required.
We also introduce a small refactor of the vectors themselves, so that
we stop carrying the target branch around.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
There is no reason why the BP hardening vectors shouldn't be part
of the HYP text at compile time, rather than being mapped at runtime.
Also introduce a new config symbol that controls the compilation
of bpi.S.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The encoder for ADD/SUB (immediate) can only cope with 12bit
immediates, while there is an encoding for a 12bit immediate shifted
by 12 bits to the left.
Let's fix this small oversight by allowing the LSL_12 bit to be set.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add an encoder for the EXTR instruction, which also implements the ROR
variant (where Rn == Rm).
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we can dynamically compute the kernek/hyp VA mask, there
is no need for a feature flag to trigger the alternative patching.
Let's drop the flag and everything that depends on it.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We lack a way to encode operations such as AND, ORR, EOR that take
an immediate value. Doing so is quite involved, and is all about
reverse engineering the decoding algorithm described in the
pseudocode function DecodeBitMasks().
This has been tested by feeding it all the possible literal values
and comparing the output with that of GAS.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're missing the a way to generate the encoding of the N immediate,
which is only a single bit used in a number of instruction that take
an immediate.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We've so far relied on a patching infrastructure that only gave us
a single alternative, without any way to provide a range of potential
replacement instructions. For a single feature, this is an all or
nothing thing.
It would be interesting to have a more flexible grained way of patching
the kernel though, where we could dynamically tune the code that gets
injected.
In order to achive this, let's introduce a new form of dynamic patching,
assiciating a callback to a patching site. This callback gets source and
target locations of the patching request, as well as the number of
instructions to be patched.
Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb
directives:
asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback)
: "r" (v));
or
alternative_cb callback
mov x0, #0
alternative_cb_end
where callback is the C function computing the alternative.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch. We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.
This has the benefit of being able to retrieve the host context even
when our stack is corrupted, and it has a potential performance benefit
because we trade a store plus a load for an mrs and a load on a round
trip to the guest.
This does require us to calculate the percpu offset without including
the offset from the kernel mapping of the percpu array to the linear
mapping of the array (which is what we store in tpidr_el1), because a
PC-relative generated address in EL2 is already giving us the hyp alias
of the linear mapping of a kernel address. We do this in
__cpu_init_hyp_mode() by using kvm_ksym_ref().
The code that accesses ESR_EL2 was previously using an alternative to
use the _EL1 accessor on VHE systems, but this was actually unnecessary
as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
accessor does the same thing on both systems.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A recent update to the ARM SMCCC ARCH_WORKAROUND_1 specification
allows firmware to return a non zero, positive value to describe
that although the mitigation is implemented at the higher exception
level, the CPU on which the call is made is not affected.
Let's relax the check on the return value from ARCH_WORKAROUND_1
so that we only error out if the returned value is negative.
Fixes: b092201e00 ("arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, as reported by Eric, an invalid si_code value 0 is
passed in many signals delivered to userspace in response to faults
and other kernel errors. Typically 0 is passed when the fault is
insufficiently diagnosable or when there does not appear to be any
sensible alternative value to choose.
This appears to violate POSIX, and is intuitively wrong for at
least two reasons arising from the fact that 0 == SI_USER:
1) si_code is a union selector, and SI_USER (and si_code <= 0 in
general) implies the existence of a different set of fields
(siginfo._kill) from that which exists for a fault signal
(siginfo._sigfault). However, the code raising the signal
typically writes only the _sigfault fields, and the _kill
fields make no sense in this case.
Thus when userspace sees si_code == 0 (SI_USER) it may
legitimately inspect fields in the inactive union member _kill
and obtain garbage as a result.
There appears to be software in the wild relying on this,
albeit generally only for printing diagnostic messages.
2) Software that wants to be robust against spurious signals may
discard signals where si_code == SI_USER (or <= 0), or may
filter such signals based on the si_uid and si_pid fields of
siginfo._sigkill. In the case of fault signals, this means
that important (and usually fatal) error conditions may be
silently ignored.
In practice, many of the faults for which arm64 passes si_code == 0
are undiagnosable conditions such as exceptions with syndrome
values in ESR_ELx to which the architecture does not yet assign any
meaning, or conditions indicative of a bug or error in the kernel
or system and thus that are unrecoverable and should never occur in
normal operation.
The approach taken in this patch is to translate all such
undiagnosable or "impossible" synchronous fault conditions to
SIGKILL, since these are at least probably localisable to a single
process. Some of these conditions should really result in a kernel
panic, but due to the lack of diagnostic information it is
difficult to be certain: this patch does not add any calls to
panic(), but this could change later if justified.
Although si_code will not reach userspace in the case of SIGKILL,
it is still desirable to pass a nonzero value so that the common
siginfo handling code can detect incorrect use of si_code == 0
without false positives. In this case the si_code dependent
siginfo fields will not be correctly initialised, but since they
are not passed to userspace I deem this not to matter.
A few faults can reasonably occur in realistic userspace scenarios,
and _should_ raise a regular, handleable (but perhaps not
ignorable/blockable) signal: for these, this patch attempts to
choose a suitable standard si_code value for the raised signal in
each case instead of 0.
arm64 was the only arch to define a BUS_FIXME code, so after this
patch nobody defines it. This patch therefore also removes the
relevant code from siginfo_layout().
Cc: James Morse <james.morse@arm.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The DCache clean & ICache invalidation requirements for instructions
to be data coherence are discoverable through new fields in CTR_EL0.
The following two control bits DIC and IDC were defined for this
purpose. No need to perform point of unification cache maintenance
operations from software on systems where CPU caches are transparent.
This patch optimize the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() if the hardware
reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two
instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
in order to avoid the unnecessary overhead.
CTR_EL0.DIC: Instruction cache invalidation requirements for
instruction to data coherence. The meaning of this bit[29].
0: Instruction cache invalidation to the point of unification
is required for instruction to data coherence.
1: Instruction cache cleaning to the point of unification is
not required for instruction to data coherence.
CTR_EL0.IDC: Data cache clean requirements for instruction to data
coherence. The meaning of this bit[28].
0: Data cache clean to the point of unification is required for
instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
1: Data cache clean to the point of unification is not required
for instruction to data coherence.
Co-authored-by: Philip Elcan <pelcan@codeaurora.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Omit patching of ADRP instruction at module load time if the current
CPUs are not susceptible to the erratum.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: Drop duplicate initialisation of .def_scope field]
Signed-off-by: Will Deacon <will.deacon@arm.com>
In some cases, core variants that are affected by a certain erratum
also exist in versions that have the erratum fixed, and this fact is
recorded in a dedicated bit in system register REVIDR_EL1.
Since the architecture does not require that a certain bit retains
its meaning across different variants of the same model, each such
REVIDR bit is tightly coupled to a certain revision/variant value,
and so we need a list of revidr_mask/midr pairs to carry this
information.
So add the struct member and the associated macros and handling to
allow REVIDR fixes to be taken into account.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Whether or not we will ever decide to start using x18 as a platform
register in Linux is uncertain, but by that time, we will need to
ensure that UEFI runtime services calls don't corrupt it.
So let's start issuing warnings now for this, and increase the
likelihood that these firmware images have all been replaced by that time.
This has been fixed on the EDK2 side in commit:
6d73863b5464 ("BaseTools/tools_def AARCH64: mark register x18 as reserved")
dated July 13, 2017.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180308080020.22828-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently have to rely on the GCC large code model for KASLR for
two distinct but related reasons:
- if we enable full randomization, modules will be loaded very far away
from the core kernel, where they are out of range for ADRP instructions,
- even without full randomization, the fact that the 128 MB module region
is now no longer fully reserved for kernel modules means that there is
a very low likelihood that the normal bottom-up allocation of other
vmalloc regions may collide, and use up the range for other things.
Large model code is suboptimal, given that each symbol reference involves
a literal load that goes through the D-cache, reducing cache utilization.
But more importantly, literals are not instructions but part of .text
nonetheless, and hence mapped with executable permissions.
So let's get rid of our dependency on the large model for KASLR, by:
- reducing the full randomization range to 4 GB, thereby ensuring that
ADRP references between modules and the kernel are always in range,
- reduce the spillover range to 4 GB as well, so that we fallback to a
region that is still guaranteed to be in range
- move the randomization window of the core kernel to the middle of the
VMALLOC space
Note that KASAN always uses the module region outside of the vmalloc space,
so keep the kernel close to that if KASAN is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When PLTs are emitted at relocation time, we really should not exceed
the number that we counted when parsing the relocation tables, and so
currently, we BUG() on this condition. However, even though this is a
clear bug in this particular piece of code, we can easily recover by
failing to load the module.
So instead, return 0 from module_emit_plt_entry() if this condition
occurs, which is not a valid kernel address, and can hence serve as
a flag value that makes the relocation routine bail out.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is the equivalent of commit 001bf455d2 ("ARM: 8428/1: kgdb: Fix
registers on sleeping tasks") but for arm64. Nuff said.
...well, perhaps I could also add that task_pt_regs are userspace
registers and that's not what kgdb is supposed to be reporting. We're
supposed to be reporting kernel registers.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 9730348075 ("arm64: Increase the max granular size") increased
the cache line size to 128 to match Cavium ThunderX, apparently for some
performance benefit which could not be confirmed. This change, however,
has an impact on the network packets allocation in certain
circumstances, requiring slightly over a 4K page with a significant
performance degradation.
This patch reverts L1_CACHE_SHIFT back to 6 (64-byte cache line) while
keeping ARCH_DMA_MINALIGN at 128. The cache_line_size() function was
changed to default to ARCH_DMA_MINALIGN in the absence of a meaningful
CTR_EL0.CWG bit field.
In addition, if a system with ARCH_DMA_MINALIGN < CTR_EL0.CWG is
detected, the kernel will force swiotlb bounce buffering for all
non-coherent devices since DMA cache maintenance on sub-CWG ranges is
not safe, leading to data corruption.
Cc: Tirumalesh Chalamarla <tchalamarla@cavium.com>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Using arm64_force_sig_info means that printing messages about unhandled
signals is dealt with for us, so use that in preference to force_sig_info
and remove any homebrew printing code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
show_unhandled_signals_ratelimited is only called in traps.c, so move it
out of its macro in the dreaded system_misc.h and into a static function
in traps.c
Signed-off-by: Will Deacon <will.deacon@arm.com>
If we fail to deliver a signal due to taking an unhandled fault on the
stackframe, we can call arm64_notify_segfault to deliver a SEGV can deal
with printing any unhandled signal messages for us, rather than roll our
own printing code.
A side-effect of this change is that we now deliver the frame address
in si_addr along with an si_code of SEGV_{ACC,MAP}ERR, rather than an
si_addr of 0 and an si_code of SI_KERNEL as before.
Signed-off-by: Will Deacon <will.deacon@arm.com>
arm64_notify_die deals with printing out information regarding unhandled
signals, so there's no need to roll our own code here.
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for consolidating our handling of printing unhandled
signals, introduce a wrapper around force_sig_info which can act as
the canonical place for dealing with show_unhandled_signals.
Initially, we just hook this up to arm64_notify_die.
Signed-off-by: Will Deacon <will.deacon@arm.com>
For signals other than SIGKILL or those with siginfo_layout(signal, code)
== SIL_FAULT then force_signal_inject does not initialise the siginfo_t
properly. Since the signal number is determined solely by the caller,
simply WARN on unknown signals and force to SIGKILL.
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
force_signal_inject is a little flakey:
* It only knows about SIGILL and SIGSEGV, so can potentially deliver
other signals based on a partially initialised siginfo_t
* It sets si_addr to point at the PC for SIGSEGV
* It always operates on current, so doesn't need the regs argument
This patch fixes these issues by always assigning the si_addr field to
the address parameter of the function and updates the callers (including
those that indirectly call via arm64_notify_segfault) accordingly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
libfdt gained a new dependency on strrchr, so make it available to the
EFI namespace before we update libfdt.
Thanks to Ard for providing this fix.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
The word "feature" is repeated in the CPU features reporting. This drops it
for improved readability.
Before (redundant "feature" word):
SMP: Total of 4 processors activated.
CPU features: detected feature: 32-bit EL0 Support
CPU features: detected feature: Kernel page table isolation (KPTI)
CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
CPU: All CPU(s) started at EL2
After:
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: Kernel page table isolation (KPTI)
CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
CPU: All CPU(s) started at EL2
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The PAN emulation notification was only happening for non-boot CPUs
if CPU capabilities had already been configured. This seems to be the
wrong place, as it's system-wide and isn't attached to capabilities,
so its reporting didn't normally happen. Instead, report it once from
the boot CPU.
Before (missing PAN emulation report):
SMP: Total of 4 processors activated.
CPU features: detected feature: 32-bit EL0 Support
CPU features: detected feature: Kernel page table isolation (KPTI)
CPU: All CPU(s) started at EL2
After:
SMP: Total of 4 processors activated.
CPU features: detected feature: 32-bit EL0 Support
CPU features: detected feature: Kernel page table isolation (KPTI)
CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
CPU: All CPU(s) started at EL2
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the early kernel mapping logic can tolerate placements of
Image that cross swapper table boundaries, we can remove the logic
that adjusts the offset if the dice roll produced an offset that
puts the kernel right on top of one.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mirror arm behaviour for unimplemented syscalls: Below 2048 return
-ENOSYS, above 2048 raise SIGILL.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
[will: Tweak die string to identify as compat syscall]
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently limit guest accesses to the LOR registers, which we
neither virtualize nor context-switch. As such, guests are provided with
unusable information/controls, and are not isolated from each other (or
the host).
To prevent these issues, we can trap register accesses and present the
illusion LORegions are unssupported by the CPU. To do this, we mask
ID_AA64MMFR1.LO, and set HCR_EL2.TLOR to trap accesses to the following
registers:
* LORC_EL1
* LOREA_EL1
* LORID_EL1
* LORN_EL1
* LORSA_EL1
... when trapped, we inject an UNDEFINED exception to EL1, simulating
their non-existence.
As noted in D7.2.67, when no LORegions are implemented, LoadLOAcquire
and StoreLORelease must behave as LoadAcquire and StoreRelease
respectively. We can ensure this by clearing LORC_EL1.EN when a CPU's
EL2 is first initialized, as the host kernel will not modify this.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Pull cleanup patchlet from Thomas Gleixner:
"A single commit removing a bunch of bogus double semicolons all over
the tree"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide/trivial: Remove ';;$' typo noise
do_task_stat() calls get_wchan(), which further does unwind_frame().
unwind_frame() restores frame->pc to original value in case function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if function graph tracer has hit a filtered
function, then we can't unwind it as ftrace_push_return_trace() has
biased the index(frame->graph) with a 'huge negative'
offset(-FTRACE_NOTRACE_DEPTH).
Moreover, arm64 stack walker defines index(frame->graph) as unsigned
int, which can not compare a -ve number.
Similar problem we can have with calling of walk_stackframe() from
save_stack_trace_tsk() or dump_backtrace().
This patch fixes unwind_frame() to test the index for -ve value and
restore index accordingly before we can restore frame->pc.
Reproducer:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent
Above commands result in:
Unable to handle kernel paging request at virtual address ffff801bd3d1e000
pgd = ffff8003cbe97c00
[ffff801bd3d1e000] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
[...]
CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
[...]
task: ffff8003c21ba000 task.stack: ffff8003cc6c0000
PC is at unwind_frame+0x12c/0x180
LR is at get_wchan+0xd4/0x134
pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145
sp : ffff8003cc6c3ab0
x29: ffff8003cc6c3ab0 x28: 0000000000000001
x27: 0000000000000026 x26: 0000000000000026
x25: 00000000000012d8 x24: 0000000000000000
x23: ffff8003c1c04000 x22: ffff000008c83000
x21: ffff8003c1c00000 x20: 000000000000000f
x19: ffff8003c1bc0000 x18: 0000fffffc593690
x17: 0000000000000000 x16: 0000000000000001
x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f
x13: 0000000000000001 x12: 0000000000000000
x11: 00000000e8f4883e x10: 0000000154f47ec8
x9 : 0000000070f367c0 x8 : 0000000000000000
x7 : 00008003f7290000 x6 : 0000000000000018
x5 : 0000000000000000 x4 : ffff8003c1c03cb0
x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000
x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000
Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000)
Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000)
[...]
[<ffff00000808892c>] unwind_frame+0x12c/0x180
[<ffff000008305008>] do_task_stat+0x864/0x870
[<ffff000008305c44>] proc_tgid_stat+0x3c/0x48
[<ffff0000082fde0c>] proc_single_show+0x5c/0xb8
[<ffff0000082b27e0>] seq_read+0x160/0x414
[<ffff000008289e6c>] __vfs_read+0x58/0x164
[<ffff00000828b164>] vfs_read+0x88/0x144
[<ffff00000828c2e8>] SyS_read+0x60/0xc0
[<ffff0000080834a0>] __sys_trace_return+0x0/0x4
Fixes: 20380bb390 (arm64: ftrace: fix a stack tracer's output under function graph tracer)
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
[catalin.marinas@arm.com: replace WARN_ON with WARN_ON_ONCE]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
struct efi_uga_draw_protocol *uga = NULL, *first_uga;
efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
unsigned long nr_ugas;
- u32 *handles = (u32 *)uga_handle;;
+ u32 *handles = (u32 *)uga_handle;
efi_status_t status = EFI_INVALID_PARAMETER;
int i;
This patch is the result of the following script:
$ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
... followed by manual review to make sure it's all good.
Splitting this up is just crazy talk, let's get over with this and just do it.
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ID_AA64DFR0_EL1.PMUVer field doesn't follow the usual ID registers
scheme. While value 0xf indicates a non-architected PMU is implemented,
values 0x1 to 0xe indicate an increasingly featureful architected PMU,
as if the field were unsigned.
For more details, see ARM DDI 0487C.a, D10.1.4, "Alternative ID scheme
used for the Performance Monitors Extension version".
Currently, we treat the field as signed, and erroneously bail out for
values 0x8 to 0xe. Let's correct that.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
__show_regs pretty prints PC and LR by attempting to map them to kernel
function names to improve the utility of crash reports. Unfortunately,
this mapping is applied even when the pt_regs corresponds to user mode,
resulting in a KASLR oracle.
Avoid this issue by only looking up the function symbols when the register
state indicates that we're actually running at EL1.
Cc: <stable@vger.kernel.org>
Reported-by: NCSC Security <security@ncsc.gov.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Stop printing a (ratelimited) kernel message for each instance of an
unimplemented syscall being called. Userland making an unimplemented
syscall is not necessarily misbehaviour and to be expected with a
current userland running on an older kernel. Also, the current message
looks scary to users but does not actually indicate a real problem nor
help them narrow down the cause. Just rely on sys_ni_syscall() to return
-ENOSYS.
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Our field definitions for CTR_EL0 suffer from a number of problems:
- The IDC and DIC fields are missing, which causes us to enable CTR
trapping on CPUs with either of these returning non-zero values.
- The ERG is FTR_LOWER_SAFE, whereas it should be treated like CWG as
FTR_HIGHER_SAFE so that applications can use it to avoid false sharing.
- [nit] A RES1 field is described as "RAO"
This patch updates the CTR_EL0 field definitions to fix these issues.
Cc: <stable@vger.kernel.org>
Cc: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In converting __range_ok() into a static inline, I inadvertently made
it more type-safe, but without considering the ordering of the relevant
conversions. This leads to quite a lot of Sparse noise about the fact
that we use __chk_user_ptr() after addr has already been converted from
a user pointer to an unsigned long.
Rather than just adding another cast for the sake of shutting Sparse up,
it seems reasonable to rework the types to make logical sense (although
the resulting codegen for __range_ok() remains identical). The only
callers this affects directly are our compat traps where the inferred
"user-pointer-ness" of a register value now warrants explicit casting.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.
Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
References to CPU part number MIDR_QCOM_FALKOR were dropped from the
mailing list patch due to mainline/arm64 branch dependency. So this
patch adds the missing part number.
Fixes: ec82b567a7 ("arm64: Implement branch predictor hardening for Falkor")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Update the ACPICA kernel code to upstream revision 20180105 including:
* Assorted fixes (Jung-uk Kim).
* Support for X32 ABI compilation (Anuj Mittal).
* Update of ACPICA copyrights to 2018 (Bob Moore).
- Prepare for future modifications to avoid executing the _STA control
method too early (Hans de Goede).
- Make the processor performance control library code ignore _PPC
notifications if they cannot be handled and fix up the C1 idle
state definition when it is used as a fallback state (Chen Yu,
Yazen Ghannam).
- Make it possible to use the SPCR table on x86 and to replace the
original IORT table with a new one from initrd (Prarit Bhargava,
Shunyong Yang).
- Add battery-related quirks for Asus UX360UA and UX410UAK and add
quirks for table parsing on Dell XPS 9570 and Precision M5530
(Kai Heng Feng).
- Address static checker warnings in the CPPC code (Gustavo Silva).
- Avoid printing a raw pointer to the kernel log in the smart
battery driver (Greg Kroah-Hartman).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJafGvJAAoJEILEb/54YlRxiusQAKUa+OM/oxTJkOEfGGRM8NlS
Hq/PaL/TnAj3nCoZN9fM38mI4gkxqu3eVMv6kfiqRe8VYmUX9r9tRbQ9kxvEYa7n
s6Dl+wdC9UND20QJkYVzPlaXbPuZyLFHt4Fkb1hp+HAGgNNYqc4e0lJvI82F2pdo
im1UFI84jg9UQV4WpUJL6ny2c/RMNtpUV5fOKFD8lkvBvVe7mtZTZ+1nZDeqXGkV
jzdrVTHLUEDhjS1o0TBmEsJGNeGOqnK/f+m8Rq4397guPAQQq18MYNC68SzhuGjP
iqhvIvI9sF197i66l/qgsubBifOV4At8Wb0LA5cU8CQLLpEW8GDktz/kucVHyzJ4
cVKuPXptBwwtPbNFHWO8reTUFMAnP7IpjtC31ntr6xWRQCiXv0/i2hRRN54g9T7e
FAOBmmys5DKFOq50OB5WdD3/Qz5OUuVgdbrSxNFARIZpQFtUn7Np2/nmNpPgrrcl
77hO8dpeXUTVvM4HpRQN1+r0KOTLfTAvWV7LYLAjCF9ivc0Vop/tYZQ2VEMSUEFD
SGKC30mGC4pphAjxcSYV282JR7Jx7arQ71ZA5uYTRRuxnEQd/2MC71fNjrFmCgUW
1Pumw0Pw6eZRjj1FZ/pj0X5lm7AlZj0dVzsJFgNb0FcJW0nOhN3czQrA4igoSVng
B2sRv9U8YDnDtzHyTPrY
=rVdp
-----END PGP SIGNATURE-----
Merge tag 'acpi-part2-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These are mostly fixes and cleanups, a few new quirks, a couple of
updates related to the handling of ACPI tables and ACPICA copyrights
refreshment.
Specifics:
- Update the ACPICA kernel code to upstream revision 20180105
including:
* Assorted fixes (Jung-uk Kim)
* Support for X32 ABI compilation (Anuj Mittal)
* Update of ACPICA copyrights to 2018 (Bob Moore)
- Prepare for future modifications to avoid executing the _STA
control method too early (Hans de Goede)
- Make the processor performance control library code ignore _PPC
notifications if they cannot be handled and fix up the C1 idle
state definition when it is used as a fallback state (Chen Yu,
Yazen Ghannam)
- Make it possible to use the SPCR table on x86 and to replace the
original IORT table with a new one from initrd (Prarit Bhargava,
Shunyong Yang)
- Add battery-related quirks for Asus UX360UA and UX410UAK and add
quirks for table parsing on Dell XPS 9570 and Precision M5530 (Kai
Heng Feng)
- Address static checker warnings in the CPPC code (Gustavo Silva)
- Avoid printing a raw pointer to the kernel log in the smart battery
driver (Greg Kroah-Hartman)"
* tag 'acpi-part2-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: sbshc: remove raw pointer from printk() message
ACPI: SPCR: Make SPCR available to x86
ACPI / CPPC: Use 64-bit arithmetic instead of 32-bit
ACPI / tables: Add IORT to injectable table list
ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530
ACPICA: Update version to 20180105
ACPICA: All acpica: Update copyrights to 2018
ACPI / processor: Set default C1 idle state description
ACPI / battery: Add quirk for Asus UX360UA and UX410UAK
ACPI: processor_perflib: Do not send _PPC change notification if not ready
ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
ACPI / bus: Do not call _STA on battery devices with unmet dependencies
PCI: acpiphp_ibm: prepare for acpi_get_object_info() no longer returning status
ACPI: export acpi_bus_get_status_handle()
ACPICA: Add a missing pair of parentheses
ACPICA: Prefer ACPI_TO_POINTER() over ACPI_ADD_PTR()
ACPICA: Avoid NULL pointer arithmetic
ACPICA: Linux: add support for X32 ABI compilation
ACPI / video: Use true for boolean value
Spectre v1 mitigation:
- back-end version of array_index_mask_nospec()
- masking of the syscall number to restrict speculation through the
syscall table
- masking of __user pointers prior to deference in uaccess routines
Spectre v2 mitigation update:
- using the new firmware SMC calling convention specification update
- removing the current PSCI GET_VERSION firmware call mitigation as
vendors are deploying new SMCCC-capable firmware
- additional branch predictor hardening for synchronous exceptions and
interrupts while in user mode
Meltdown v3 mitigation update for Cavium Thunder X: unaffected but
hardware erratum gets in the way. The kernel now starts with the page
tables mapped as global and switches to non-global if kpti needs to be
enabled.
Other:
- Theoretical trylock bug fixed
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlp8lqcACgkQa9axLQDI
XvH2lxAAnsYqthpGQ11MtDJB+/UiBAFkg9QWPDkwrBDvNhgpll+J0VQuCN1QJ2GX
qQ8rkv8uV+y4Fqr8hORGJy5At+0aI63ZCJ72RGkZTzJAtbFbFGIDHP7RhAEIGJBS
Lk9kDZ7k39wLEx30UXIFYTTVzyHar397TdI7vkTcngiTzZ8MdFATfN/hiKO906q3
14pYnU9Um4aHUdcJ+FocL3dxvdgniuuMBWoNiYXyOCZXjmbQOnDNU2UrICroV8lS
mB+IHNEhX1Gl35QzNBtC0ET+aySfHBMJmM5oln+uVUljIGx6En1WLj6mrHYcx8U2
rIBm5qO/X/4iuzYPGkxwQtpjq3wPYxsSUnMdKJrsUZqAfy2QeIhFx6XUtJsZPB2J
/lgls5xSXMOS7oiOQtmVjcDLBURDmYXGwljXR4n4jLm4CT1V9qSLcKHu1gdFU9Mq
VuMUdPOnQub1vqKndi154IoYDTo21jAib2ktbcxpJfSJnDYoit4Gtnv7eWY+M3Pd
Toaxi8htM2HSRwbvslHYGW8ZcVpI79Jit+ti7CsFg7m9Lvgs0zxcnNui4uPYDymT
jh2JYxuirIJbX9aGGhnmkNhq9REaeZJg9LA2JM8S77FCHN3bnlSdaG6wy899J6EI
lK4anCuPQKKKhUia/dc1MeKwrmmC18EfPyGUkOzywg/jGwGCmZM=
=Y0TT
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"As I mentioned in the last pull request, there's a second batch of
security updates for arm64 with mitigations for Spectre/v1 and an
improved one for Spectre/v2 (via a newly defined firmware interface
API).
Spectre v1 mitigation:
- back-end version of array_index_mask_nospec()
- masking of the syscall number to restrict speculation through the
syscall table
- masking of __user pointers prior to deference in uaccess routines
Spectre v2 mitigation update:
- using the new firmware SMC calling convention specification update
- removing the current PSCI GET_VERSION firmware call mitigation as
vendors are deploying new SMCCC-capable firmware
- additional branch predictor hardening for synchronous exceptions
and interrupts while in user mode
Meltdown v3 mitigation update:
- Cavium Thunder X is unaffected but a hardware erratum gets in the
way. The kernel now starts with the page tables mapped as global
and switches to non-global if kpti needs to be enabled.
Other:
- Theoretical trylock bug fixed"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits)
arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
arm/arm64: smccc: Make function identifiers an unsigned quantity
firmware/psci: Expose SMCCC version through psci_ops
firmware/psci: Expose PSCI conduit
arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
arm/arm64: KVM: Turn kvm_psci_version into a static inline
arm/arm64: KVM: Advertise SMCCC v1.1
arm/arm64: KVM: Implement PSCI 1.0 support
arm/arm64: KVM: Add smccc accessors to PSCI code
arm/arm64: KVM: Add PSCI_VERSION helper
arm/arm64: KVM: Consolidate the PSCI include files
arm64: KVM: Increment PC after handling an SMC trap
arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
arm64: entry: Apply BP hardening for suspicious interrupts from EL0
arm64: entry: Apply BP hardening for high-priority synchronous exceptions
arm64: futex: Mask __user pointers prior to dereference
...
SPCR is currently only enabled or ARM64 and x86 can use SPCR to setup
an early console.
General fixes include updating Documentation & Kconfig (for x86),
updating comments, and changing parse_spcr() to acpi_parse_spcr(),
and earlycon_init_is_deferred to earlycon_acpi_spcr_enable to be
more descriptive.
On x86, many systems have a valid SPCR table but the table version is
not 2 so the table version check must be a warning.
On ARM64 when the kernel parameter earlycon is used both the early console
and console are enabled. On x86, only the earlycon should be enabled by
by default. Modify acpi_parse_spcr() to allow options for initializing
the early console and console separately.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
with bitmap_{from,to}_arr32 over the kernel. Additionally to it:
* __check_eq_bitmap() now takes single nbits argument.
* __check_eq_u32_array is not used in new test but may be used in
future. So I don't remove it here, but annotate as __used.
Tested on arm64 and 32-bit BE mips.
[arnd@arndb.de: perf: arm_dsu_pmu: convert to bitmap_from_arr32]
Link: http://lkml.kernel.org/r/20180201172508.5739-2-ynorov@caviumnetworks.com
[ynorov@caviumnetworks.com: fix net/core/ethtool.c]
Link: http://lkml.kernel.org/r/20180205071747.4ekxtsbgxkj5b2fz@yury-thinkpad
Link: http://lkml.kernel.org/r/20171228150019.27953-2-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: David Decotigny <decot@googlers.com>,
Cc: David S. Miller <davem@davemloft.net>,
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we've standardised on SMCCC v1.1 to perform the branch
prediction invalidation, let's drop the previous band-aid.
If vendors haven't updated their firmware to do SMCCC 1.1, they
haven't updated PSCI either, so we don't loose anything.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.
It is lovely. Really.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It is possible to take an IRQ from EL0 following a branch to a kernel
address in such a way that the IRQ is prioritised over the instruction
abort. Whilst an attacker would need to get the stars to align here,
it might be sufficient with enough calibration so perform BP hardening
in the rare case that we see a kernel address in the ELR when handling
an IRQ from EL0.
Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Software-step and PC alignment fault exceptions have higher priority than
instruction abort exceptions, so apply the BP hardening hooks there too
if the user PC appears to reside in kernel space.
Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Like we've done for get_user and put_user, ensure that user pointers
are masked before invoking the underlying __arch_{clear,copy_*}_user
operations.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a similar manner to array_index_mask_nospec, this patch introduces an
assembly macro (mask_nospec64) which can be used to bound a value under
speculation. This macro is then used to ensure that the indirect branch
through the syscall table is bounded under speculation, with out-of-range
addresses speculating as calls to sys_io_setup (0).
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, USER_DS represents an exclusive limit while KERNEL_DS is
inclusive. In order to do some clever trickery for speculation-safe
masking, we need them both to behave equivalently - there aren't enough
bits to make KERNEL_DS exclusive, so we have precisely one option. This
also happens to correct a longstanding false negative for a range
ending on the very top byte of kernel memory.
Mark Rutland points out that we've actually got the semantics of
addresses vs. segments muddled up in most of the places we need to
amend, so shuffle the {USER,KERNEL}_DS definitions around such that we
can correct those properly instead of just pasting "-1"s everywhere.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The identity map is mapped as both writeable and executable by the
SWAPPER_MM_MMUFLAGS and this is relied upon by the kpti code to manage
a synchronisation flag. Update the .pushsection flags to reflect the
actual mapping attributes.
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
pte_to_phys lives in assembler.h and takes its destination register as
the first argument. Move phys_to_pte out of head.S to sit with its
counterpart and rejig it to follow the same calling convention.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We don't fully understand the Cavium ThunderX erratum, but it appears
that mapping the kernel as nG can lead to horrible consequences such as
attempting to execute userspace from kernel context. Since kpti isn't
enabled for these CPUs anyway, simplify the comment justifying the lack
of post_ttbr_update_workaround in the exception trampoline.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since AArch64 assembly instructions take the destination register as
their first operand, do the same thing for the phys_to_ttbr macro.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cavium ThunderX's erratum 27456 results in a corruption of icache
entries that are loaded from memory that is mapped as non-global
(i.e. ASID-tagged).
As KPTI is based on memory being mapped non-global, let's prevent
it from kicking in if this erratum is detected.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: Update comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Defaulting to global mappings for kernel space is generally good for
performance and appears to be necessary for Cavium ThunderX. If we
subsequently decide that we need to enable kpti, then we need to rewrite
our existing page table entries to be non-global. This is fiddly, and
made worse by the possible use of contiguous mappings, which require
a strict break-before-make sequence.
Since the enable callback runs on each online CPU from stop_machine
context, we can have all CPUs enter the idmap, where secondaries can
wait for the primary CPU to rewrite swapper with its MMU off. It's all
fairly horrible, but at least it only runs once.
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.
When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.
1) A System Error Interrupt (SEI) being raised by the Falkor core due
to the errant memory access attempting to access a region of memory
that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
memory. This behavior may only occur if the instruction cache is
disabled prior to or coincident with translation being changed from
enabled to disabled.
The conditions leading to this erratum will not occur when either of the
following occur:
1) A higher exception level disables translation of a lower exception level
(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).
To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-11-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" allows to see the syslog-like "<log
level>[timestamp] text" format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hands.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt on Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and could not sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is increasing number of people having problems with
printk-related softlockups. We might eventually need to get better
solution. Anyway, this looks like a good start and promising
direction.
- Do not allow to schedule in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifier:
It was needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. It is done
transparently by "%ps" and "pS" format specifier now.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove printk_symbol() API:
It has been obsoleted by "%pS" format specifier, and this change
helped to remove few continuous lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
Pull siginfo cleanups from Eric Biederman:
"Long ago when 2.4 was just a testing release copy_siginfo_to_user was
made to copy individual fields to userspace, possibly for efficiency
and to ensure initialized values were not copied to userspace.
Unfortunately the design was complex, it's assumptions unstated, and
humans are fallible and so while it worked much of the time that
design failed to ensure unitialized memory is not copied to userspace.
This set of changes is part of a new design to clean up siginfo and
simplify things, and hopefully make the siginfo handling robust enough
that a simple inspection of the code can be made to ensure we don't
copy any unitializied fields to userspace.
The design is to unify struct siginfo and struct compat_siginfo into a
single definition that is shared between all architectures so that
anyone adding to the set of information shared with struct siginfo can
see the whole picture. Hopefully ensuring all future si_code
assignments are arch independent.
The design is to unify copy_siginfo_to_user32 and
copy_siginfo_from_user32 so that those function are complete and cope
with all of the different cases documented in signinfo_layout. I don't
think there was a single implementation of either of those functions
that was complete and correct before my changes unified them.
The design is to introduce a series of helpers including
force_siginfo_fault that take the values that are needed in struct
siginfo and build the siginfo structure for their callers. Ensuring
struct siginfo is built correctly.
The remaining work for 4.17 (unless someone thinks it is post -rc1
material) is to push usage of those helpers down into the
architectures so that architecture specific code will not need to deal
with the fiddly work of intializing struct siginfo, and then when
struct siginfo is guaranteed to be fully initialized change copy
siginfo_to_user into a simple wrapper around copy_to_user.
Further there is work in progress on the issues that have been
documented requires arch specific knowledge to sort out.
The changes below fix or at least document all of the issues that have
been found with siginfo generation. Then proceed to unify struct
siginfo the 32 bit helpers that copy siginfo to and from userspace,
and generally clean up anything that is not arch specific with regards
to siginfo generation.
It is a lot but with the unification you can of siginfo you can
already see the code reduction in the kernel"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
mm/memory_failure: Remove unused trapno from memory_failure
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
signal: Helpers for faults with specialized siginfo layouts
signal: Add send_sig_fault and force_sig_fault
signal: Replace memset(info,...) with clear_siginfo for clarity
signal: Don't use structure initializers for struct siginfo
signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
ptrace: Use copy_siginfo in setsiginfo and getsiginfo
signal: Unify and correct copy_siginfo_to_user32
signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
signal: Unify and correct copy_siginfo_from_user32
signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
signal: Move addr_lsb into the _sigfault union for clarity
...
- Security mitigations:
- variant 2: invalidating the branch predictor with a call to secure firmware
- variant 3: implementing KPTI for arm64
- 52-bit physical address support for arm64 (ARMv8.2)
- arm64 support for RAS (firmware first only) and SDEI (software
delegated exception interface; allows firmware to inject a RAS error
into the OS)
- Perf support for the ARM DynamIQ Shared Unit PMU
- CPUID and HWCAP bits updated for new floating point multiplication
instructions in ARMv8.4
- Removing some virtual memory layout printks during boot
- Fix initial page table creation to cope with larger than 32M kernel
images when 16K pages are enabled
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlpwxDMACgkQa9axLQDI
XvF55BAAniMpxPXnYNfv6l7/4O8eKo1lJIaG1wbej4JRZ/rT3K4Z3OBXW1dKHO8d
/PTbVmZ90IqIGROkoDrE+6xyjjn9yK3uuW4ytN2zQkBa8VFaHAnHlX+zKQcuwy9f
yxwiHk+C7vK5JR7mpXTazjRknsUv1MPtlTt7DQrSdq0KRDJVDNFC+grmbew2rz0X
cjQDqZqgzuFyrKxdiQVjDmc3zH9NsNBhDo0hlGHf2jK6bGJsAPtI8M2JcLrK8ITG
Ye/dD7BJp1mWD8ff0BPaMxu24qfAMNLH8f2dpTa986/H78irVz7i/t5HG0/1+5Jh
EE4OFRTKZ59Qgyo1zWcaJvdp8YjiaX/L4PWJg8CxM5OhP9dIac9ydcFQfWzpKpUs
xyZfmK6XliGFReAkVOOf5tEqFUDhMtsqhzPYmbmU1lp61wmSYIZ8CTenpWWCJSRO
NOGyG1X2uFBvP69+iPNlfTGz1r7tg1URY5iO8fUEIhY8LrgyORkiqw4OvPEgnMXP
Ngy+dXhyvnps2AAWbSX0O4puRlTgEYLT5KaMLzH/+gWsXATT0rzUCD/aOwUQq/Y7
SWXZHkb3jpmOZZnzZsLL2MNzEIPCFBwSUE9fSv4dA9d/N6tUmlmZALJjHkfzCDpj
+mPsSmAMTj72kUYzm0b5GCtOu/iQ2kDWOZjOM1m4+v/B+f7JoEE=
=iEjP
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"The main theme of this pull request is security covering variants 2
and 3 for arm64. I expect to send additional patches next week
covering an improved firmware interface (requires firmware changes)
for variant 2 and way for KPTI to be disabled on unaffected CPUs
(Cavium's ThunderX doesn't work properly with KPTI enabled because of
a hardware erratum).
Summary:
- Security mitigations:
- variant 2: invalidate the branch predictor with a call to
secure firmware
- variant 3: implement KPTI for arm64
- 52-bit physical address support for arm64 (ARMv8.2)
- arm64 support for RAS (firmware first only) and SDEI (software
delegated exception interface; allows firmware to inject a RAS
error into the OS)
- perf support for the ARM DynamIQ Shared Unit PMU
- CPUID and HWCAP bits updated for new floating point multiplication
instructions in ARMv8.4
- remove some virtual memory layout printks during boot
- fix initial page table creation to cope with larger than 32M kernel
images when 16K pages are enabled"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (104 commits)
arm64: Fix TTBR + PAN + 52-bit PA logic in cpu_do_switch_mm
arm64: Turn on KPTI only on CPUs that need it
arm64: Branch predictor hardening for Cavium ThunderX2
arm64: Run enable method for errata work arounds on late CPUs
arm64: Move BP hardening to check_and_switch_context
arm64: mm: ignore memory above supported physical address size
arm64: kpti: Fix the interaction between ASID switching and software PAN
KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA
KVM: arm64: Handle RAS SErrors from EL2 on guest exit
KVM: arm64: Handle RAS SErrors from EL1 on guest exit
KVM: arm64: Save ESR_EL2 on guest SError
KVM: arm64: Save/Restore guest DISR_EL1
KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
KVM: arm/arm64: mask/unmask daif around VHE guests
arm64: kernel: Prepare for a DISR user
arm64: Unconditionally enable IESB on exception entry/return for firmware-first
arm64: kernel: Survive corrected RAS errors notified by SError
arm64: cpufeature: Detect CPU RAS Extentions
arm64: sysreg: Move to use definitions for all the SCTLR bits
arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early
...
Whitelist Broadcom Vulcan/Cavium ThunderX2 processors in
unmap_kernel_at_el0(). These CPUs are not vulnerable to
CVE-2017-5754 and do not need KPTI when KASLR is off.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use PSCI based mitigation for speculative execution attacks targeting
the branch predictor. We use the same mechanism as the one used for
Cortex-A CPUs, we expect the PSCI version call to have a side effect
of clearing the BTBs.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a CPU is brought up after we have finalised the system
wide capabilities (i.e, features and errata), we make sure the
new CPU doesn't need a new errata work around which has not been
detected already. However we don't run enable() method on the new
CPU for the errata work arounds already detected. This could
cause the new CPU running without potential work arounds.
It is upto the "enable()" method to decide if this CPU should
do something about the errata.
Fixes: commit 6a6efbb45b ("arm64: Verify CPU errata work arounds on hotplugged CPU")
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are so many places that build struct siginfo by hand that at
least one of them is bound to get it wrong. A handful of cases in the
kernel arguably did just that when using the errno field of siginfo to
pass no errno values to userspace. The usage is limited to a single
si_code so at least does not mess up anything else.
Encapsulate this questionable pattern in a helper function so
that the userspace ABI is preserved.
Update all of the places that use this pattern to use the new helper
function.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The siginfo structure has all manners of holes with the result that a
structure initializer is not guaranteed to initialize all of the bits.
As we have to copy the structure to userspace don't even try to use
a structure initializer. Instead use clear_siginfo followed by initializing
selected fields. This gives a guarantee that uninitialized kernel memory
is not copied to userspace.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Instead of jumpping while !is_compat_task placee all of the code
inside of an if (is_compat_task) block. This allows the int i
variable to be properly limited to the compat block no matter how the
rest of ptrace_hbptriggered changes.
In a following change a non-variable declaration will preceed
was made independent to ensure the code is easy to review.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With ARM64_SW_TTBR0_PAN enabled, the exception entry code checks the
active ASID to decide whether user access was enabled (non-zero ASID)
when the exception was taken. On return from exception, if user access
was previously disabled, it re-instates TTBR0_EL1 from the per-thread
saved value (updated in switch_mm() or efi_set_pgd()).
Commit 7655abb953 ("arm64: mm: Move ASID from TTBR0 to TTBR1") makes a
TTBR0_EL1 + ASID switching non-atomic. Subsequently, commit 27a921e757
("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN") changes the
__uaccess_ttbr0_disable() function and asm macro to first write the
reserved TTBR0_EL1 followed by the ASID=0 update in TTBR1_EL1. If an
exception occurs between these two, the exception return code will
re-instate a valid TTBR0_EL1. Similar scenario can happen in
cpu_switch_mm() between setting the reserved TTBR0_EL1 and the ASID
update in cpu_do_switch_mm().
This patch reverts the entry.S check for ASID == 0 to TTBR0_EL1 and
disables the interrupts around the TTBR0_EL1 and ASID switching code in
__uaccess_ttbr0_disable(). It also ensures that, when returning from the
EFI runtime services, efi_set_pgd() doesn't leave a non-zero ASID in
TTBR1_EL1 by using uaccess_ttbr0_{enable,disable}.
The accesses to current_thread_info()->ttbr0 are updated to use
READ_ONCE/WRITE_ONCE.
As a safety measure, __uaccess_ttbr0_enable() always masks out any
existing non-zero ASID TTBR1_EL1 before writing in the new ASID.
Fixes: 27a921e757 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN")
Acked-by: Will Deacon <will.deacon@arm.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.
With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...
The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVMs existing
'dsb, mrs-daifclr, isb' sequence.
Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM would like to consume any pending SError (or RAS error) after guest
exit. Today it has to unmask SError and use dsb+isb to synchronise the
CPU. With the RAS extensions we can use ESB to synchronise any pending
SError.
Add the necessary macros to allow DISR to be read and converted to an
ESR.
We clear the DISR register when we enable the RAS cpufeature, and the
kernel has not executed any ESB instructions. Any value we find in DISR
must have belonged to firmware. Executing an ESB instruction is the
only way to update DISR, so we can expect firmware to have handled
any deferred SError. By the same logic we clear DISR in the idle path.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
extensions use SError to notify software about RAS errors, these can be
contained by the Error Syncronization Barrier.
An ACPI system with firmware-first may use SError as its 'SEI'
notification. Future patches may add code to 'claim' this SError as a
notification.
Other systems can distinguish these RAS errors from the SError ESR and
use the AET bits and additional data from RAS-Error registers to handle
the error. Future patches may add this kernel-first handling.
Without support for either of these we will panic(), even if we received
a corrected error. Add code to decode the severity of RAS errors. We can
safely ignore contained errors where the CPU can continue to make
progress. For all other errors we continue to panic().
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARM's v8.2 Extentions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending. Add cpufeature detection.
Platform level RAS support may require additional firmware support.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[Rebased added config option, reworded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
__cpu_setup() configures SCTLR_EL1 using some hard coded hex masks,
and el2_setup() duplicates some this when setting RES1 bits.
Lets make this the same as KVM's hyp_init, which uses named bits.
First, we add definitions for all the SCTLR_EL{1,2} bits, the RES{1,0}
bits, and those we want to set or clear.
Add a build_bug checks to ensures all bits are either set or clear.
This means we don't need to preserve endian-ness configuration
generated elsewhere.
Finally, move the head.S and proc.S users of these hard-coded masks
over to the macro versions.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
this_cpu_has_cap() tests caps->desc not caps->matches, so it stops
walking the list when it finds a 'silent' feature, instead of
walking to the end of the list.
Prior to v4.6's 644c2ae198 ("arm64: cpufeature: Test 'matches' pointer
to find the end of the list") we always tested desc to find the end of
a capability list. This was changed for dubious things like PAN_NOT_UAO.
v4.7's e3661b128e ("arm64: Allow a capability to be checked on
single CPU") added this_cpu_has_cap() using the old desc style test.
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When refactoring the sigreturn code to handle SVE, I changed the
sigreturn implementation to store the new FPSIMD state from the
user sigframe into task_struct before reloading the state into the
CPU regs. This makes it easier to convert the data for SVE when
needed.
However, it turns out that the fpsimd_state structure passed into
fpsimd_update_current_state is not fully initialised, so assigning
the structure as a whole corrupts current->thread.fpsimd_state.cpu
with uninitialised data.
This means that if the garbage data written to .cpu happens to be a
valid cpu number, and the task is subsequently migrated to the cpu
identified by the that number, and then tries to enter userspace,
the CPU FPSIMD regs will be assumed to be correct for the task and
not reloaded as they should be. This can result in returning to
userspace with the FPSIMD registers containing data that is stale or
that belongs to another task or to the kernel.
Knowingly handing around a kernel structure that is incompletely
initialised with user data is a potential source of mistakes,
especially across source file boundaries. To help avoid a repeat
of this issue, this patch adapts the relevant internal API to hand
around the user-accessible subset only: struct user_fpsimd_state.
To avoid future surprises, this patch also converts all uses of
struct fpsimd_state that really only access the user subset, to use
struct user_fpsimd_state. A few missing consts are added to
function prototypes for good measure.
Thanks to Will for spotting the cause of the bug here.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It isn't entirely obvious if we're using software PAN because we
don't say anything about it in the boot log. But if we're using
hardware PAN we'll print a nice CPU feature message indicating
it. Add a print for software PAN too so we know if it's being
used or not.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems. Some architectures fail to handle all of the cases in in
the siginfo union. Some architectures perform a blind copy of the
siginfo union when the si_code is negative. A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.
Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.
A special case is made for x86 x32 format. This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats. By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures. Vastly increasing the testing base and the chances of
finding bugs.
As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Sometimes a single capability could be listed multiple times with
differing matches(), e.g, CPU errata for different MIDR versions.
This breaks verify_local_cpu_feature() and this_cpu_has_cap() as
we stop checking for a capability on a CPU with the first
entry in the given table, which is not sufficient. Make sure we
run the checks for all entries of the same capability. We do
this by fixing __this_cpu_has_cap() to run through all the
entries in the given table for a match and reuse it for
verify_local_cpu_feature().
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Kryo CPUs are also affected by the Falkor 1003 errata, so
we need to do the same workaround on Kryo CPUs. The MIDR is
slightly more complicated here, where the PART number is not
always the same when looking at all the bits from 15 to 4. Drop
the lower 8 bits and just look at the top 4 to see if it's '2'
and then consider those as Kryo CPUs. This covers all the
combinations without having to list them all out.
Fixes: 38fd94b027 ("arm64: Work around Falkor erratum 1003")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the early assembler page table code assumes that precisely
1xpgd, 1xpud, 1xpmd are sufficient to represent the early kernel text
mappings.
Unfortunately this is rarely the case when running with a 16KB granule,
and we also run into limits with 4KB granule when building much larger
kernels.
This patch re-writes the early page table logic to compute indices of
mappings for each level of page table, and if multiple indices are
required, the next-level page table is scaled up accordingly.
Also the required size of the swapper_pg_dir is computed at link time
to cover the mapping [KIMAGE_ADDR + VOFFSET, _end]. When KASLR is
enabled, an extra page is set aside for each level that may require extra
entries at runtime.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The trampoline page tables are positioned after the early page tables in
the kernel linker script.
As we are about to change the early page table logic to resolve the
swapper size at link time as opposed to compile time, the
SWAPPER_DIR_SIZE variable (currently used to locate the trampline)
will be rendered unsuitable for low level assembler.
This patch solves this issue by moving the trampoline before the PAN
page tables. The offset to the trampoline from ttbr1 can then be
expressed by: PAGE_SIZE + RESERVED_TTBR0_SIZE, which is available to the
entry assembler.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently one resolves the location of the reserved_ttbr0 for PAN by
taking a positive offset from swapper_pg_dir. In a future patch we wish
to extend the swapper s.t. its size is determined at link time rather
than comile time, rendering SWAPPER_DIR_SIZE unsuitable for such a low
level calculation.
In this patch we re-arrange the order of the linker script s.t. instead
one computes reserved_ttbr0 by subtracting RESERVED_TTBR0_SIZE from
swapper_pg_dir.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When CONFIG_UNMAP_KERNEL_AT_EL0 is set the SDEI entry point and the rest
of the kernel may be unmapped when we take an event. If this may be the
case, use an entry trampoline that can switch to the kernel page tables.
We can't use the provided PSTATE to determine whether to switch page
tables as we may have interrupted the kernel's entry trampoline, (or a
normal-priority event that interrupted the kernel's entry trampoline).
Instead test for a user ASID in ttbr1_el1.
Save a value in regs->addr_limit to indicate whether we need to restore
the original ASID when returning from this event. This value is only used
by do_page_fault(), which we don't call with the SDEI regs.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
SDEI inherits the 'use hvc' bit that is also used by PSCI. PSCI does all
its initialisation early, SDEI does its late.
Remove the __init annotation from acpi_psci_use_hvc().
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code.
(crucially we don't implicitly know which exception level we interrupted),
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Today the arm64 arch code allocates an extra IRQ stack per-cpu. If we
also have SDEI and VMAP stacks we need two extra per-cpu VMAP stacks.
Move the VMAP stack allocation out to a helper in a new header file.
This avoids missing THREADINFO_GFP, or getting the all-important alignment
wrong.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in
tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1
on VHE hosts, and allows future code to blindly access per-cpu variables
without triggering world-switch.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Setting si_code to 0 results in a userspace seeing an si_code of 0.
This is the same si_code as SI_USER. Posix and common sense requires
that SI_USER not be a signal specific si_code. As such this use of 0
for the si_code is a pretty horribly broken ABI.
Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
value of __SI_KILL and now sees a value of SIL_KILL with the result
that uid and pid fields are copied and which might copying the si_addr
field by accident but certainly not by design. Making this a very
flakey implementation.
Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
SIL_FAULT and the appropriate fields will be reliably copied.
But folks this is a new and unique kind of bad. This is massively
untested code bad. This is inventing new and unique was to get
siginfo wrong bad. This is don't even think about Posix or what
siginfo means bad. This is lots of eyeballs all missing the fact
that the code does the wrong thing bad. This is getting stuck
and keep making the same mistake bad.
I really hope we can find a non userspace breaking fix for this on a
port as new as arm64.
Possible ABI fixes include:
- Send the signal without siginfo
- Don't generate a signal
- Possibly assign and use an appropriate si_code
- Don't handle cases which can't happen
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Ref: 53631b54c8 ("arm64: Floating point and SIMD")
Ref: 32015c2356 ("arm64: exception: handle Synchronous External Abort")
Ref: 1d18c47c73 ("arm64: MMU fault handling and page table management")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
* 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
perf: ARM DynamIQ Shared Unit PMU support
dt-bindings: Document devicetree binding for ARM DSU PMU
arm_pmu: Use of_cpu_node_to_id helper
arm64: Use of_cpu_node_to_id helper for CPU topology parsing
irqchip: gic-v3: Use of_cpu_node_to_id helper
coresight: of: Use of_cpu_node_to_id helper
of: Add helper for mapping device node to logical CPU number
perf: Export perf_event_update_userpage
Falkor is susceptible to branch predictor aliasing and can
theoretically be attacked by malicious code. This patch
implements a mitigation for these attacks, preventing any
malicious entries from affecting other victim contexts.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
[will: fix label name when !CONFIG_KVM and remove references to MIDR_FALKOR]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cortex-A57, A72, A73 and A75 are susceptible to branch predictor aliasing
and can theoretically be attacked by malicious code.
This patch implements a PSCI-based mitigation for these CPUs when available.
The call into firmware will invalidate the branch predictor state, preventing
any malicious entries from affecting other victim contexts.
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Aliasing attacks against CPU branch predictors can allow an attacker to
redirect speculative control flow on some CPUs and potentially divulge
information from one context to another.
This patch adds initial skeleton code behind a new Kconfig option to
enable implementation-specific mitigations against these attacks for
CPUs that are affected.
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We will soon need to invoke a CPU-specific function pointer after changing
page tables, so move post_ttbr_update_workaround out into C code to make
this possible.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to invoke the CPU capability ->matches callback from the ->enable
callback for applying local-CPU workarounds, we need a handle on the
capability structure.
This patch passes a pointer to the capability structure to the ->enable
callback.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For non-KASLR kernels where the KPTI behaviour has not been overridden
on the command line we can use ID_AA64PFR0_EL1.CSV3 to determine whether
or not we should unmap the kernel whilst running at EL0.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Speculation attacks against the entry trampoline can potentially resteer
the speculative instruction stream through the indirect branch and into
arbitrary gadgets within the kernel.
This patch defends against these attacks by forcing a misprediction
through the return stack: a dummy BL instruction loads an entry into
the stack, so that the predicted program flow of the subsequent RET
instruction is to a branch-to-self instruction which is finally resolved
as a branch to the kernel vectors with speculation suppressed.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
print_symbol() is a very old API that has been obsoleted by %pS format
specifier in a normal printk() call.
Replace print_symbol() with a direct printk("%pS") call.
Link: http://lkml.kernel.org/r/20171211125025.2270-3-sergey.senozhatsky@gmail.com
To: Andrew Morton <akpm@linux-foundation.org>
To: Russell King <linux@armlinux.org.uk>
To: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Salter <msalter@redhat.com>
To: Tony Luck <tony.luck@intel.com>
To: David Howells <dhowells@redhat.com>
To: Yoshinori Sato <ysato@users.sourceforge.jp>
To: Guan Xuetao <gxt@mprc.pku.edu.cn>
To: Borislav Petkov <bp@alien8.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Thomas Gleixner <tglx@linutronix.de>
To: Peter Zijlstra <peterz@infradead.org>
To: Vineet Gupta <vgupta@synopsys.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-sh@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek@suse.com: updated commit message]
Signed-off-by: Petr Mladek <pmladek@suse.com>
ARM v8.4 extensions add new neon instructions for performing a
multiplication of each FP16 element of one vector with the corresponding
FP16 element of a second vector, and to add or subtract this without an
intermediate rounding to the corresponding FP32 element in a third vector.
This patch detects this feature and let the userspace know about it via a
HWCAP bit and MRS emulation.
Cc: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The UEFI memory map is a bit vague about how to interpret the
EFI_MEMORY_XP attribute when it is combined with EFI_MEMORY_RP and/or
EFI_MEMORY_WP, which have retroactively been redefined as cacheability
attributes rather than permission attributes.
So let's ignore EFI_MEMORY_XP if _RP and/or _WP are also set. In this
case, it is likely that they are being used to describe the capability
of the region (i.e., whether it has the controls to reconfigure it as
non-executable) rather than the nature of the contents of the region
(i.e., whether it contains data that we will never attempt to execute)
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180102181042.19074-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make use of the new generic helper to convert an of_node of a CPU
to the logical CPU id in parsing the topology.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CPU_PM_CPU_IDLE_ENTER_RETENTION skips calling cpu_pm_enter() and
cpu_pm_exit(). By not calling cpu_pm functions in idle entry/exit
paths we can reduce the latency involved in entering and exiting
the low power idle state.
On ARM64 based Qualcomm server platform we measured below overhead
for calling cpu_pm_enter and cpu_pm_exit for retention states.
workload: stress --hdd #CPUs --hdd-bytes 32M -t 30
Average overhead of cpu_pm_enter - 1.2us
Average overhead of cpu_pm_exit - 3.1us
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* for-next/52-bit-pa:
arm64: enable 52-bit physical address support
arm64: allow ID map to be extended to 52 bits
arm64: handle 52-bit physical addresses in page table entries
arm64: don't open code page table entry creation
arm64: head.S: handle 52-bit PAs in PTEs in early page table setup
arm64: handle 52-bit addresses in TTBR
arm64: limit PA size to supported range
arm64: add kconfig symbol to configure physical address size
Currently, when using VA_BITS < 48, if the ID map text happens to be
placed in physical memory above VA_BITS, we increase the VA size (up to
48) and create a new table level, in order to map in the ID map text.
This is okay because the system always supports 48 bits of VA.
This patch extends the code such that if the system supports 52 bits of
VA, and the ID map text is placed that high up, then we increase the VA
size accordingly, up to 52.
One difference from the current implementation is that so far the
condition of VA_BITS < 48 has meant that the top level table is always
"full", with the maximum number of entries, and an extra table level is
always needed. Now, when VA_BITS = 48 (and using 64k pages), the top
level table is not full, and we simply need to increase the number of
entries in it, instead of creating a new table level.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()]
[catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits
12..15 of a page table entry. Introduce macros to convert between a
physical address and its placement in a table entry, and change all
macros/functions that access PTEs to use them.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: some long lines wrapped]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of open coding the generation of page table entries, use the
macros/functions that exist for this - pfn_p*d and p*d_populate. Most
code in the kernel already uses these macros, this patch tries to fix
up the few places that don't. This is useful for the next patch in this
series, which needs to change the page table entry logic, and it's
better to have that logic in one place.
The KVM extended ID map is special, since we're creating a level above
CONFIG_PGTABLE_LEVELS and the required function isn't available. Leave
it as is and add a comment to explain it. (The normal kernel ID map code
doesn't need this change because its page tables are created in assembly
(__create_page_tables)).
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits
12..15 in page table entries. Introduce a macro to move the bits there,
and change the early ID map and swapper table setup code to use it.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: additional comments for clarification]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits 2..5
in the TTBR registers. Introduce a couple of macros to move the bits
there, and change all TTBR writers to use them.
Leave TTBR0 PAN code unchanged, to avoid complicating it. A system with
52-bit PA will have PAN anyway (because it's ARMv8.1 or later), and a
system without 52-bit PA can only use up to 48-bit PAs. A later patch in
this series will add a kconfig dependency to ensure PAN is configured.
In addition, when using 52-bit PA there is a special alignment
requirement on the top-level table. We don't currently have any VA_BITS
configuration that would violate the requirement, but one could be added
in the future, so add a compile-time BUG_ON to check for it.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: added TTBR_BADD_MASK_52 comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 9de52a755c ("arm64: fpsimd: Fix failure to restore FPSIMD
state after signals") fixed an issue reported in our FPSIMD signal
restore code but inadvertently introduced another issue which tends to
manifest as random SEGVs in userspace.
The problem is that when we copy the struct fpsimd_state from the kernel
stack (populated from the signal frame) into the struct held in the
current thread_struct, we blindly copy uninitialised stack into the
"cpu" field, which means that context-switching of the FP registers is
no longer reliable.
This patch fixes the problem by copying only the user_fpsimd member of
struct fpsimd_state. We should really rework the function prototypes
to take struct user_fpsimd_state * instead, but let's just get this
fixed for now.
Cc: Dave Martin <Dave.Martin@arm.com>
Fixes: 9de52a755c ("arm64: fpsimd: Fix failure to restore FPSIMD state after signals")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, the SVE field in ID_AA64PFR0_EL1 is visible
unconditionally to userspace via the CPU ID register emulation,
irrespective of the kernel config. This means that if a kernel
configured with CONFIG_ARM64_SVE=n is run on SVE-capable hardware,
userspace will see SVE reported as present in the ID regs even
though the kernel forbids execution of SVE instructions.
This patch makes the exposure of the SVE field in ID_AA64PFR0_EL1
conditional on CONFIG_ARM64_SVE=y.
Since future architecture features are likely to encounter a
similar requirement, this patch adds a suitable helper macros for
use when declaring config-conditional ID register fields.
Fixes: 43994d824e ("arm64/sve: Detect SVE and activate runtime support")
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The only inclusion of asm/uaccess.h should be by linux/uaccess.h. All
other headers should use the latter.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.
When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.
1) A System Error Interrupt (SEI) being raised by the Falkor core due
to the errant memory access attempting to access a region of memory
that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
memory. This behavior may only occur if the instruction cache is
disabled prior to or coincident with translation being changed from
enabled to disabled.
The conditions leading to this erratum will not occur when either of the
following occur:
1) A higher exception level disables translation of a lower exception level
(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).
To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The literal pool entry for identifying the vectors base is the only piece
of information in the trampoline page that identifies the true location
of the kernel.
This patch moves it into a page-aligned region of the .rodata section
and maps this adjacent to the trampoline text via an additional fixmap
entry, which protects against any accidental leakage of the trampoline
contents.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There are now a handful of open-coded masks to extract the ASID from a
TTBR value, so introduce a TTBR_ASID_MASK and use that instead.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Allow explicit disabling of the entry trampoline on the kernel command
line (kpti=off) by adding a fake CPU feature (ARM64_UNMAP_KERNEL_AT_EL0)
that can be used to toggle the alternative sequences in our entry code and
avoid use of the trampoline altogether if desired. This also allows us to
make use of a static key in arm64_kernel_unmapped_at_el0().
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When unmapping the kernel at EL0, we use tpidrro_el0 as a scratch register
during exception entry from native tasks and subsequently zero it in
the kernel_ventry macro. We can therefore avoid zeroing tpidrro_el0
in the context-switch path for native tasks using the entry trampoline.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We rely on an atomic swizzling of TTBR1 when transitioning from the entry
trampoline to the kernel proper on an exception. We can't rely on this
atomicity in the face of Falkor erratum #E1003, so on affected cores we
can issue a TLB invalidation to invalidate the walk cache prior to
jumping into the kernel. There is still the possibility of a TLB conflict
here due to conflicting walk cache entries prior to the invalidation, but
this doesn't appear to be the case on these CPUs in practice.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Hook up the entry trampoline to our exception vectors so that all
exceptions from and returns to EL0 go via the trampoline, which swizzles
the vector base register accordingly. Transitioning to and from the
kernel clobbers x30, so we use tpidrro_el0 and far_el1 as scratch
registers for native tasks.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We will need to treat exceptions from EL0 differently in kernel_ventry,
so rework the macro to take the exception level as an argument and
construct the branch target using that.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The exception entry trampoline needs to be mapped at the same virtual
address in both the trampoline page table (which maps nothing else)
and also the kernel page table, so that we can swizzle TTBR1_EL1 on
exceptions from and return to EL0.
This patch maps the trampoline at a fixed virtual address in the fixmap
area of the kernel virtual address space, which allows the kernel proper
to be randomized with respect to the trampoline when KASLR is enabled.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To allow unmapping of the kernel whilst running at EL0, we need to
point the exception vectors at an entry trampoline that can map/unmap
the kernel on entry/exit respectively.
This patch adds the trampoline page, although it is not yet plugged
into the vector table and is therefore unused.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With the ASID now installed in TTBR1, we can re-enable ARM64_SW_TTBR0_PAN
by ensuring that we switch to a reserved ASID of zero when disabling
user access and restore the active user ASID on the uaccess enable path.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The post_ttbr0_update_workaround hook applies to any change to TTBRx_EL1.
Since we're using TTBR1 for the ASID, rename the hook to make it clearer
as to what it's doing.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When deciding whether to invalidate FPSIMD state cached in the cpu,
the backend function sve_flush_cpu_state() attempts to dereference
__this_cpu_read(fpsimd_last_state). However, this is not safe:
there is no guarantee that this task_struct pointer is still valid,
because the task could have exited in the meantime.
This means that we need another means to get the appropriate value
of TIF_SVE for the associated task.
This patch solves this issue by adding a cached copy of the TIF_SVE
flag in fpsimd_last_state, which we can check without dereferencing
the task pointer.
In particular, although this patch is not a KVM fix per se, this
means that this check is now done safely in the KVM world switch
path (which is currently the only user of this code).
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is currently some duplicate logic to associate current's
FPSIMD context with the cpu when loading FPSIMD state into the cpu
regs.
Subsequent patches will update that logic, so in order to ensure it
only needs to be done in one place, this patch factors the relevant
code out into a new function fpsimd_bind_to_cpu().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, loading of a task's fpsimd state into the CPU registers
is skipped if that task's state is already present in the registers
of that CPU.
However, the code relies on the struct fpsimd_state * (and by
extension struct task_struct *) to unambiguously identify a task.
There is a particular case in which this doesn't work reliably:
when a task exits, its task_struct may be recycled to describe a
new task.
Consider the following scenario:
1) Task P loads its fpsimd state onto cpu C.
per_cpu(fpsimd_last_state, C) := P;
P->thread.fpsimd_state.cpu := C;
2) Task X is scheduled onto C and loads its fpsimd state on C.
per_cpu(fpsimd_last_state, C) := X;
X->thread.fpsimd_state.cpu := C;
3) X exits, causing X's task_struct to be freed.
4) P forks a new child T, which obtains X's recycled task_struct.
T == X.
T->thread.fpsimd_state.cpu == C (inherited from P).
5) T is scheduled on C.
T's fpsimd state is not loaded, because
per_cpu(fpsimd_last_state, C) == T (== X) &&
T->thread.fpsimd_state.cpu == C.
(This is the check performed by fpsimd_thread_switch().)
So, T gets X's registers because the last registers loaded onto C
were those of X, in (2).
This patch fixes the problem by ensuring that the sched-in check
fails in (5): fpsimd_flush_task_state(T) is called when T is
forked, so that T->thread.fpsimd_state.cpu == C cannot be true.
This relies on the fact that T is not schedulable until after
copy_thread() completes.
Once T's fpsimd state has been loaded on some CPU C there may still
be other cpus D for which per_cpu(fpsimd_last_state, D) ==
&X->thread.fpsimd_state. But D is necessarily != C in this case,
and the check in (5) must fail.
An alternative fix would be to do refcounting on task_struct. This
would result in each CPU holding a reference to the last task whose
fpsimd state was loaded there. It's not clear whether this is
preferable, and it involves higher overhead than the fix proposed
in this patch. It would also move all the task_struct freeing
work into the context switch critical section, or otherwise some
deferred cleanup mechanism would need to be introduced, neither of
which seems obviously justified.
Cc: <stable@vger.kernel.org>
Fixes: 005f78cd88 ("arm64: defer reloading a task's FPSIMD state to userland resume")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: word-smithed the comment so it makes more sense]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Building the kernel with an LTO-enabled GCC spits out the following "const"
warning for the cpu_ops code:
mm/percpu.c:2168:20: error: pcpu_fc_names causes a section type conflict
with dt_supported_cpu_ops
const char * const pcpu_fc_names[PCPU_FC_NR] __initconst = {
^
arch/arm64/kernel/cpu_ops.c:34:37: note: ‘dt_supported_cpu_ops’ was declared here
static const struct cpu_operations *dt_supported_cpu_ops[] __initconst = {
Fix it by adding missed const qualifiers.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
bus access read/write events are not supported in A73, based on the
Cortex-A73 TRM r0p2, section 11.9 Events (pages 11-457 to 11-460).
Fixes: 5561b6c5e9 "arm64: perf: add support for Cortex-A73"
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The fpsimd_update_current_state() function is responsible for
loading the FPSIMD state from the user signal frame into the
current task during sigreturn. When implementing support for SVE,
conditional code was added to this function in order to handle the
case where SVE state need to be loaded for the task and merged with
the FPSIMD data from the signal frame; however, the FPSIMD-only
case was unintentionally dropped.
As a result of this, sigreturn does not currently restore the
FPSIMD state of the task, except in the case where the system
supports SVE and the signal frame contains SVE state in addition to
FPSIMD state.
This patch fixes this bug by making the copy-in of the FPSIMD data
from the signal frame to thread_struct unconditional.
This remains a performance regression from v4.14, since the FPSIMD
state is now copied into thread_struct and then loaded back,
instead of _only_ being loaded into the CPU FPSIMD registers.
However, it is essential to call task_fpsimd_load() here anyway in
order to ensure that the SVE enable bit in CPACR_EL1 is set
correctly before returning to userspace. This could use some
refactoring, but since sigreturn is not a fast path I have kept
this patch as a pure fix and left the refactoring for later.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 8cd969d28f ("arm64/sve: Signal handling support")
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When building the arm64 kernel with both CONFIG_ARM64_MODULE_PLTS and
CONFIG_DYNAMIC_FTRACE enabled, the ftrace-mod.o object file is built
with the kernel and contains a trampoline that is linked into each
module, so that modules can be loaded far away from the kernel and
still reach the ftrace entry point in the core kernel with an ordinary
relative branch, as is emitted by the compiler instrumentation code
dynamic ftrace relies on.
In order to be able to build out of tree modules, this object file
needs to be included into the linux-headers or linux-devel packages,
which is undesirable, as it makes arm64 a special case (although a
precedent does exist for 32-bit PPC).
Given that the trampoline essentially consists of a PLT entry, let's
not bother with a source or object file for it, and simply patch it
in whenever the trampoline is being populated, using the existing
PLT support routines.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To allow the ftrace trampoline code to reuse the PLT entry routines,
factor it out and move it into asm/module.h.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Plenty of acronym soup here:
- Initial support for the Scalable Vector Extension (SVE)
- Improved handling for SError interrupts (required to handle RAS events)
- Enable GCC support for 128-bit integer types
- Remove kernel text addresses from backtraces and register dumps
- Use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- Perf PMU driver for the Statistical Profiling Extension (SPE)
- Perf PMU driver for Hisilicon's system PMUs
- Misc cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaCcLqAAoJELescNyEwWM0JREH/2FbmD/khGzEtP8LW+o9D8iV
TBM02uWQxS1bbO1pV2vb+512YQO+iWfeQwJH9Jv2FZcrMvFv7uGRnYgAnJuXNGrl
W+LL6OhN22A24LSawC437RU3Xe7GqrtONIY/yLeJBPablfcDGzPK1eHRA0pUzcyX
VlyDruSHWX44VGBPV6JRd3x0vxpV8syeKOjbRvopRfn3Nwkbd76V3YSfEgwoTG5W
ET1sOnXLmHHdeifn/l1Am5FX1FYstpcd7usUTJ4Oto8y7e09tw3bGJCD0aMJ3vow
v1pCUWohEw7fHqoPc9rTrc1QEnkdML4vjJvMPUzwyTfPrN+7uEuMIEeJierW+qE=
=0qrg
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The big highlight is support for the Scalable Vector Extension (SVE)
which required extensive ABI work to ensure we don't break existing
applications by blowing away their signal stack with the rather large
new vector context (<= 2 kbit per vector register). There's further
work to be done optimising things like exception return, but the ABI
is solid now.
Much of the line count comes from some new PMU drivers we have, but
they're pretty self-contained and I suspect we'll have more of them in
future.
Plenty of acronym soup here:
- initial support for the Scalable Vector Extension (SVE)
- improved handling for SError interrupts (required to handle RAS
events)
- enable GCC support for 128-bit integer types
- remove kernel text addresses from backtraces and register dumps
- use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- perf PMU driver for the Statistical Profiling Extension (SPE)
- perf PMU driver for Hisilicon's system PMUs
- misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
arm64: Make ARMV8_DEPRECATED depend on SYSCTL
arm64: Implement __lshrti3 library function
arm64: support __int128 on gcc 5+
arm64/sve: Add documentation
arm64/sve: Detect SVE and activate runtime support
arm64/sve: KVM: Hide SVE from CPU features exposed to guests
arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
arm64/sve: KVM: Prevent guests from using SVE
arm64/sve: Add sysctl to set the default vector length for new processes
arm64/sve: Add prctl controls for userspace vector length management
arm64/sve: ptrace and ELF coredump support
arm64/sve: Preserve SVE registers around EFI runtime service calls
arm64/sve: Preserve SVE registers around kernel-mode NEON use
arm64/sve: Probe SVE capabilities and usable vector lengths
arm64: cpufeature: Move sys_caps_initialised declarations
arm64/sve: Backend logic for setting the vector length
arm64/sve: Signal handling support
arm64/sve: Support vector length resetting for new processes
arm64/sve: Core task context handling
arm64/sve: Low-level CPU setup
...
This patch enables detection of hardware SVE support via the
cpufeatures framework, and reports its presence to the kernel and
userspace via the new ARM64_SVE cpucap and HWCAP_SVE hwcap
respectively.
Userspace can also detect SVE using ID_AA64PFR0_EL1, using the
cpufeatures MRS emulation.
When running on hardware that supports SVE, this enables runtime
kernel support for SVE, and allows user tasks to execute SVE
instructions and make of the of the SVE-specific user/kernel
interface extensions implemented by this series.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Until KVM has full SVE support, guests must not be allowed to
execute SVE instructions.
This patch enables the necessary traps, and also ensures that the
traps are disabled again on exit from the guest so that the host
can still use SVE if it wants to.
On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect the execution of FPSIMD instructions in
the guest. The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace. This is an
interim measure, in advance of adding full SVE awareness to KVM.
This marking of cached vector state in the CPU as invalid is done
using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c. Due
to the repeated use of this rather obscure operation, it makes
sense to factor it out as a separate helper with a clearer name.
This patch factors it out as fpsimd_flush_cpu_state(), and ports
all callers to use it.
As a side effect of this refactoring, a this_cpu_write() in
fpsimd_cpu_pm_notifier() is changed to __this_cpu_write(). This
should be fine, since cpu_pm_enter() is supposed to be called only
with interrupts disabled.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Because of the effect of SVE on the size of the signal frame, the
default vector length used for new processes involves a tradeoff
between performance of SVE-enabled software on the one hand, and
reliability of non-SVE-aware software on the other hand.
For this reason, the best choice depends on the repertoire of
userspace software in use and is thus best left up to distro
maintainers, sysadmins and developers.
If CONFIG_SYSCTL and CONFIG_PROC_SYSCTL are enabled, this patch
exposes the default vector length in
/proc/sys/abi/sve_default_vector_length, where boot scripts or the
adventurous can poke it.
In common with other arm64 ABI sysctls, this control is currently
global: setting it requires CAP_SYS_ADMIN in the root user
namespace, but the value set is effective for subsequent execs in
all namespaces. The control only affects _new_ processes, however:
changing it does not affect the vector length of any existing
process.
The intended usage model is that if userspace is known to be fully
SVE-tolerant (or a developer is curious to find out) then this
parameter can be cranked up during system startup.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds two arm64-specific prctls, to permit userspace to
control its vector length:
* PR_SVE_SET_VL: set the thread's SVE vector length and vector
length inheritance mode.
* PR_SVE_GET_VL: get the same information.
Although these prctls resemble instruction set features in the SVE
architecture, they provide additional control: the vector length
inheritance mode is Linux-specific and nothing to do with the
architecture, and the architecture does not permit EL0 to set its
own vector length directly. Both can be used in portable tools
without requiring the use of SVE instructions.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: Fixed up prctl constants to avoid clash with PDEATHSIG]
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state. This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.
Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.
Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state. This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.
For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the reset of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.
Signed-off-by: Alan Hayward <alan.hayward@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Okamoto Takayuki <tokamoto@jp.fujitsu.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The EFI runtime services ABI allows EFI to make free use of the
FPSIMD registers during EFI runtime service calls, subject to the
callee-save requirements of the AArch64 procedure call standard.
However, the SVE architecture allows upper bits of the SVE vector
registers to be zeroed as a side-effect of FPSIMD V-register
writes. This means that the SVE vector registers must be saved in
their entirety in order to avoid data loss: non-SVE-aware EFI
implementations cannot restore them correctly.
The non-IRQ case is already handled gracefully by
kernel_neon_begin(). For the IRQ case, this patch allocates a
suitable per-CPU stash buffer for the full SVE register state and
uses it to preserve the affected registers around EFI calls. It is
currently unclear how the EFI runtime services ABI will be
clarified with respect to SVE, so it safest to assume that the
predicate registers and FFR must be saved and restored too.
No attempt is made to restore the restore the vector length after
a call, for now. It is deemed rather insane for EFI to change it,
and contemporary EFI implementations certainly won't.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Kernel-mode NEON will corrupt the SVE vector registers, due to the
way they alias the FPSIMD vector registers in the hardware.
This patch ensures that any live SVE register content for the task
is saved by kernel_neon_begin(). The data will be restored in the
usual way on return to userspace.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch uses the cpufeatures framework to determine common SVE
capabilities and vector lengths, and configures the runtime SVE
support code appropriately.
ZCR_ELx is not really a feature register, but it is convenient to
use it as a template for recording the maximum vector length
supported by a CPU, using the LEN field. This field is similar to
a feature field in that it is a contiguous bitfield for which we
want to determine the minimum system-wide value. This patch adds
ZCR as a pseudo-register in cpuinfo/cpufeatures, with appropriate
custom code to populate it. Finding the minimum supported value of
the LEN field is left to the cpufeatures framework in the usual
way.
The meaning of ID_AA64ZFR0_EL1 is not architecturally defined yet,
so for now we just require it to be zero.
Note that much of this code is dormant and SVE still won't be used
yet, since system_supports_sve() remains hardwired to false.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
update_cpu_features() currently cannot tell whether it is being
called during early or late secondary boot. This doesn't
desperately matter for anything it currently does.
However, SVE will need to know here whether the set of available
vector lengths is known or still to be determined when booting a
CPU, so that it can be updated appropriately.
This patch simply moves the sys_caps_initialised stuff to the top
of the file so that it can be used more widely. There doesn't seem
to be a more obvious place to put it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch implements the core logic for changing a task's vector
length on request from userspace. This will be used by the ptrace
and prctl frontends that are implemented in later patches.
The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two. To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch implements support for saving and restoring the SVE
registers around signals.
A fixed-size header struct sve_context is always included in the
signal frame encoding the thread's vector length at the time of
signal delivery, optionally followed by a variable-layout structure
encoding the SVE registers.
Because of the need to preserve backwards compatibility, the FPSIMD
view of the SVE registers is always dumped as a struct
fpsimd_context in the usual way, in addition to any sve_context.
The SVE vector registers are dumped in full, including bits 127:0
of each register which alias the corresponding FPSIMD vector
registers in the hardware. To avoid any ambiguity about which
alias to restore during sigreturn, the kernel always restores bits
127:0 of each SVE vector register from the fpsimd_context in the
signal frame (which must be present): userspace needs to take this
into account if it wants to modify the SVE vector register contents
on return from a signal.
FPSR and FPCR, which are used by both FPSIMD and SVE, are not
included in sve_context because they are always present in
fpsimd_context anyway.
For signal delivery, a new helper
fpsimd_signal_preserve_current_state() is added to update _both_
the FPSIMD and SVE views in the task struct, to make it easier to
populate this information into the signal frame. Because of the
redundancy between the two views of the state, only one is updated
otherwise.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's desirable to be able to reset the vector length to some sane
default for new processes, since the new binary and its libraries
may or may not be SVE-aware.
This patch tracks the desired post-exec vector length (if any) in a
new thread member sve_vl_onexec, and adds a new thread flag
TIF_SVE_VL_INHERIT to control whether to inherit or reset the
vector length. Currently these are inactive. Subsequent patches
will provide the capability to configure them.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.
Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task. To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.
The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.
When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary. Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.
TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected. This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly. If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.
The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace. For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens. The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel. This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.
TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
* exec
* fork and clone
Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>
To enable the kernel to use SVE, SVE traps from EL1 to EL2 must be
disabled. To take maximum advantage of the hardware, the full
available vector length also needs to be enabled for EL1 by
programming ZCR_EL2.LEN. (The kernel will program ZCR_EL1.LEN as
required, but this cannot override the limit set by ZCR_EL2.)
This patch makes the appropriate changes to the EL2 early setup
code.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Manipulating the SVE architectural state, including the vector and
predicate registers, first-fault register and the vector length,
requires the use of dedicated instructions added by SVE.
This patch adds suitable assembly functions for saving and
restoring the SVE registers and querying the vector length.
Setting of the vector length is done as part of register restore.
Since people building kernels may not all get an SVE-enabled
toolchain for a while, this patch uses macros that generate
explicit opcodes in place of assembler mnemonics.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SVE architecture adds some system registers, ID register fields
and a dedicated ESR exception class.
This patch adds the appropriate definitions that will be needed by
the kernel.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The existing FPSIMD context switch code contains a couple of
instances of {set,clear}_ti_thread(task_thread_info(task)). Since
there are thread flag manipulators that operate directly on
task_struct, this verbosity isn't strictly needed.
For consistency, this patch simplifies the affected calls. This
should have no impact on behaviour.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, armv8_deprected.c takes charge of the "abi" sysctl
directory, which makes life difficult for other code that wants to
register sysctls in the same directory.
There is a "new" [1] sysctl registration interface that removes the
need to define ctl_tables for parent directories explicitly, which
is ideal here.
This patch ports register_insn_emulation_sysctl() over to the
register_sysctl() interface and removes the redundant ctl_table for
"abi".
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[1] fea478d410 (sysctl: Add register_sysctl for normal sysctl
users)
The commit message notes an intent to port users of the
pre-existing interfaces over to register_sysctl(), though the
number of users of the new interface currently appears negligible.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently sys_rt_sigreturn() verifies that the base sigframe is
readable, but no similar check is performed on the extra data to
which an extra_context record points.
This matters because the extra data will be read with the
unprotected user accessors. However, this is not a problem at
present because the extra data base address is required to be
exactly at the end of the base sigframe. So, there would need to
be a non-user-readable kernel address within about 59K
(SIGFRAME_MAXSZ - sizeof(struct rt_sigframe)) of some address for
which access_ok(VERIFY_READ) returns true, in order for sigreturn
to be able to read kernel memory that should be inaccessible to the
user task. This is currently impossible due to the untranslatable
address hole between the TTBR0 and TTBR1 address ranges.
Disappearance of the hole between the TTBR0 and TTBR1 mapping
ranges would require the VA size for TTBR0 and TTBR1 to grow to at
least 55 bits, and either the disabling of tagged pointers for
userspace or enabling of tagged pointers for kernel space; none of
which is currently envisaged.
Even so, it is wrong to use the unprotected user accessors without
an accompanying access_ok() check.
To avoid the potential for future surprises, this patch does an
explicit access_ok() check on the extra data space when parsing an
extra_context record.
Fixes: 33f082614c ("arm64: signal: Allow expansion of the signal frame")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A couple of FPSIMD exception handling functions that are called
from entry.S are currently not annotated as such.
This is not a big deal since asmlinkage does nothing on arm/arm64,
but fixing the annotations is more consistent and may help avoid
future surprises.
This patch adds appropriate asmlinkage annotations for
do_fpsimd_acc() and do_fpsimd_exc().
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Function graph does not work currently when CONFIG_DYNAMIC_TRACE is not
set. This is because ftrace_function_trace is not always set to ftrace_stub
when function_graph is in use.
Do not skip checking of graph tracer functions when ftrace_function_trace
is set.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for a user to deliberately trigger __dump_instr with a
chosen kernel address.
Let's avoid problems resulting from this by using get_user() rather than
__get_user(), ensuring that we don't erroneously access kernel memory.
Where we use __dump_instr() on kernel text, we already switch to
KERNEL_DS, so this shouldn't adversely affect those cases.
Fixes: 60ffc30d56 ("arm64: Exception handling")
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>