/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Low-level CPU initialisation
 * Based on arch/arm/kernel/head.S
 *
 * Copyright (C) 1994-2002 Russell King
 * Copyright (C) 2003-2012 ARM Ltd.
 * Authors:	Catalin Marinas <catalin.marinas@arm.com>
 *		Will Deacon <will.deacon@arm.com>
 */

#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/pgtable.h>

#include <asm/asm_pointer_auth.h>
#include <asm/assembler.h>
#include <asm/boot.h>
#include <asm/bug.h>
#include <asm/ptrace.h>
#include <asm/asm-offsets.h>
#include <asm/cache.h>
#include <asm/cputype.h>
#include <asm/el2_setup.h>
#include <asm/elf.h>
#include <asm/image.h>
#include <asm/kernel-pgtable.h>
#include <asm/kvm_arm.h>
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/page.h>
#include <asm/scs.h>
#include <asm/smp.h>
#include <asm/sysreg.h>
#include <asm/thread_info.h>
#include <asm/virt.h>

#include "efi-header.S"

#if (PAGE_OFFSET & 0x1fffff) != 0
#error PAGE_OFFSET must be at least 2MB aligned
#endif

/*
 * Kernel startup entry point.
 * ---------------------------
 *
 * The requirements are:
 *   MMU = off, D-cache = off, I-cache = on or off,
 *   x0 = physical address to the FDT blob.
 *
 * Note that the callee-saved registers are used for storing variables
 * that are useful before the MMU is enabled. The allocations are described
 * in the entry routines.
 */
	__HEAD
	/*
	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
	 */
	efi_signature_nop			// special NOP to identify as PE/COFF executable
	b	primary_entry			// branch to kernel start, magic
	.quad	0				// Image load offset from start of RAM, little-endian
	le64sym	_kernel_size_le			// Effective size of kernel image, little-endian
	le64sym	_kernel_flags_le		// Informative flags, little-endian
	.quad	0				// reserved
	.quad	0				// reserved
	.quad	0				// reserved
	.ascii	ARM64_IMAGE_MAGIC		// Magic number
	.long	.Lpe_header_offset		// Offset to the PE header.

	__EFI_PE_HEADER

	.section ".idmap.text","a"

	/*
	 * The following callee saved general purpose registers are used on the
	 * primary lowlevel boot path:
	 *
	 *  Register   Scope                      Purpose
	 *  x19        primary_entry() .. start_kernel()        whether we entered with the MMU on
	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
	 */
SYM_CODE_START(primary_entry)
	bl	record_mmu_state
	bl	preserve_boot_args

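	/*
	 * Set up a temporary stack and clear the frame pointer so we can
	 * call into C code: __pi_create_init_idmap() populates
	 * init_idmap_pg_dir and returns the end of the page table region
	 * it used in x0.
	 */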
	adrp	x1, early_init_stack
	mov	sp, x1
	mov	x29, xzr
	adrp	x0, init_idmap_pg_dir
	mov	x1, xzr
	bl	__pi_create_init_idmap

	/*
	 * If the page tables have been populated with non-cacheable
	 * accesses (MMU disabled), invalidate those tables again to
	 * remove any speculatively loaded cache lines.
	 */
	cbnz	x19, 0f
	dmb	sy
	mov	x1, x0				// end of used region
	adrp	x0, init_idmap_pg_dir
	adr_l	x2, dcache_inval_poc
	blr	x2
	b	1f

	/*
	 * If we entered with the MMU and caches on, clean the ID mapped part
	 * of the primary boot code to the PoC so we can safely execute it with
	 * the MMU off.
	 */
0:	adrp	x0, __idmap_text_start
	adr_l	x1, __idmap_text_end
	adr_l	x2, dcache_clean_poc
	blr	x2

1:	mov	x0, x19
	bl	init_kernel_el			// w0=cpu_boot_mode
	mov	x20, x0

	/*
	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
	 * details.
	 * On return, the CPU will be ready for the MMU to be turned on and
	 * the TCR will have been set.
	 */
	bl	__cpu_setup			// initialise processor
	b	__primary_switch
SYM_CODE_END(primary_entry)

	__INIT
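/*
 * Record the state we were booted in: x19 is set to a non-zero value if we
 * entered with the MMU and caches enabled, and to zero otherwise. If the
 * SCTLR endianness does not match the kernel's, it is corrected here and the
 * MMU is disabled before returning.
 */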
SYM_CODE_START_LOCAL(record_mmu_state)
	mrs	x19, CurrentEL
	cmp	x19, #CurrentEL_EL2
	mrs	x19, sctlr_el1
	b.ne	0f
	mrs	x19, sctlr_el2
0:
CPU_LE( tbnz	x19, #SCTLR_ELx_EE_SHIFT, 1f	)
CPU_BE( tbz	x19, #SCTLR_ELx_EE_SHIFT, 1f	)
	tst	x19, #SCTLR_ELx_C		// Z := (C == 0)
	and	x19, x19, #SCTLR_ELx_M		// isolate M bit
	csel	x19, xzr, x19, eq		// clear x19 if Z
	ret

	/*
	 * Set the correct endianness early so all memory accesses issued
	 * before init_kernel_el() occur in the correct byte order. Note that
	 * this means the MMU must be disabled, or the active ID map will end
	 * up getting interpreted with the wrong byte order.
	 */
1:	eor	x19, x19, #SCTLR_ELx_EE
	bic	x19, x19, #SCTLR_ELx_M
	b.ne	2f
	pre_disable_mmu_workaround
	msr	sctlr_el2, x19
	b	3f
2:	pre_disable_mmu_workaround
	msr	sctlr_el1, x19
3:	isb
	mov	x19, xzr
	ret
SYM_CODE_END(record_mmu_state)

/*
 * Preserve the arguments passed by the bootloader in x0 .. x3
 */
SYM_CODE_START_LOCAL(preserve_boot_args)
	mov	x21, x0				// x21=FDT

	adr_l	x0, boot_args			// record the contents of
	stp	x21, x1, [x0]			// x0 .. x3 at kernel entry
	stp	x2, x3, [x0, #16]

	cbnz	x19, 0f				// skip cache invalidation if MMU is on
	dmb	sy				// needed before dc ivac with
						// MMU off

	add	x1, x0, #0x20			// 4 x 8 bytes
	b	dcache_inval_poc		// tail call
0:	str_l	x19, mmu_enabled_at_boot, x0
	ret
SYM_CODE_END(preserve_boot_args)

/*
 * Initialize CPU registers with task-specific and cpu-specific context.
 *
 * Create a final frame record at task_pt_regs(current)->stackframe, so
 * that the unwinder can identify the final frame record of any task by
 * its location in the task stack. We reserve the entire pt_regs space
 * for consistency with user tasks and kthreads.
 */
	.macro	init_cpu_task tsk, tmp1, tmp2
	msr	sp_el0, \tsk

	ldr	\tmp1, [\tsk, #TSK_STACK]
	add	sp, \tmp1, #THREAD_SIZE
	sub	sp, sp, #PT_REGS_SIZE

	stp	xzr, xzr, [sp, #S_STACKFRAME]
	add	x29, sp, #S_STACKFRAME

	scs_load_current

	adr_l	\tmp1, __per_cpu_offset
	ldr	w\tmp2, [\tsk, #TSK_TI_CPU]
	ldr	\tmp1, [\tmp1, \tmp2, lsl #3]
	set_this_cpu_offset \tmp1
	.endm

/*
 * The following fragment of code is executed with the MMU enabled.
 *
 *   x0 = __pa(KERNEL_START)
 */
SYM_FUNC_START_LOCAL(__primary_switched)
	adr_l	x4, init_task
	init_cpu_task x4, x5, x6

	adr_l	x8, vectors			// load VBAR_EL1 with virtual
	msr	vbar_el1, x8			// vector table address
	isb

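	// create a frame record for the C calls below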
	stp	x29, x30, [sp, #-16]!
	mov	x29, sp

	str_l	x21, __fdt_pointer, x5		// Save FDT pointer

	adrp	x4, _text			// Save the offset between
	sub	x4, x4, x0			// the kernel virtual and
	str_l	x4, kimage_voffset, x5		// physical mappings

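	// pass the boot CPU mode recorded in x20 by primary_entry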
	mov	x0, x20
	bl	set_cpu_boot_mode_flag

#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
	bl	kasan_early_init
#endif
	mov	x0, x20
	bl	finalise_el2			// Prefer VHE if possible
	ldp	x29, x30, [sp], #16
	bl	start_kernel
	ASM_BUG()
SYM_FUNC_END(__primary_switched)

/*
 * end early head section, begin head code that is also used for
 * hotplug and needs to have the same protections as the text region
 */
	.section ".idmap.text","a"

/*
 * Starting from EL2 or EL1, configure the CPU to execute at the highest
 * reachable EL supported by the kernel in a chosen default state. If dropping
 * from EL2 to EL1, configure EL2 before configuring EL1.
 *
 * Since we cannot always rely on ERET synchronizing writes to sysregs (e.g. if
 * SCTLR_ELx.EOS is clear), we place an ISB prior to ERET.
 *
 * Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x0 if
 * booted in EL1 or EL2 respectively, with the top 32 bits containing
 * potential context flags. These flags are *not* stored in __boot_cpu_mode.
 *
 * x0: whether we are being called from the primary boot path with the MMU on
 */
SYM_FUNC_START(init_kernel_el)
	mrs	x1, CurrentEL
	cmp	x1, #CurrentEL_EL2
	b.eq	init_el2

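	/* Booted at EL1: set up a sane SCTLR_EL1/PSTATE and return via eret */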
SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
	mov_q	x0, INIT_SCTLR_EL1_MMU_OFF
	pre_disable_mmu_workaround
	msr	sctlr_el1, x0
	isb
	mov_q	x0, INIT_PSTATE_EL1
	msr	spsr_el1, x0
	msr	elr_el1, lr
	mov	w0, #BOOT_CPU_MODE_EL1
	eret

SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
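	// stash the return address; init_kernel_el returns to its caller via eret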
	msr	elr_el2, lr

	// clean all HYP code to the PoC if we booted at EL2 with the MMU on
	cbz	x0, 0f
	adrp	x0, __hyp_idmap_text_start
	adr_l	x1, __hyp_text_end
	adr_l	x2, dcache_clean_poc
	blr	x2

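	// we entered at EL2 with the MMU on: turn it off before reconfiguring EL2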
	mov_q	x0, INIT_SCTLR_EL2_MMU_OFF
	pre_disable_mmu_workaround
	msr	sctlr_el2, x0
	isb
0:
	mov_q	x0, HCR_HOST_NVHE_FLAGS

	/*
	 * Compliant CPUs advertise their VHE-onlyness with
	 * ID_AA64MMFR4_EL1.E2H0 < 0. HCR_EL2.E2H can be
	 * RES1 in that case. Publish the E2H bit early so that
	 * it can be picked up by the init_el2_state macro.
	 *
	 * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but
	 * don't advertise it (they predate this relaxation).
	 */
	mrs_s	x1, SYS_ID_AA64MMFR4_EL1
	tbz	x1, #(ID_AA64MMFR4_EL1_E2H0_SHIFT + ID_AA64MMFR4_EL1_E2H0_WIDTH - 1), 1f

	orr	x0, x0, #HCR_E2H
1:
	msr	hcr_el2, x0
	isb

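	// initialise the remaining EL2 execution state (see <asm/el2_setup.h>)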
	init_el2_state

	/* Hypervisor stub */
	adr_l	x0, __hyp_stub_vectors
	msr	vbar_el2, x0
	isb

	mov_q	x1, INIT_SCTLR_EL1_MMU_OFF

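	// check whether HCR_EL2.E2H stuck as set, i.e. whether we are running with VHE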
	mrs	x0, hcr_el2
	and	x0, x0, #HCR_E2H
	cbz	x0, 2f

	/* Set a sane SCTLR_EL1, the VHE way */
	msr_s	SYS_SCTLR_EL12, x1
	mov	x2, #BOOT_CPU_FLAG_E2H
	b	3f

2:
	msr	sctlr_el1, x1
	mov	x2, xzr
3:
	__init_el2_nvhe_prepare_eret

	mov	w0, #BOOT_CPU_MODE_EL2
	orr	x0, x0, x2
	eret
SYM_FUNC_END(init_kernel_el)

/*
 * This provides a "holding pen" for platforms to hold all secondary
 * cores until we're ready for them to initialise.
 */
SYM_FUNC_START(secondary_holding_pen)
	mov	x0, xzr
	bl	init_kernel_el			// w0=cpu_boot_mode
	mrs	x2, mpidr_el1
	mov_q	x1, MPIDR_HWID_BITMASK
	and	x2, x2, x1
	adr_l	x3, secondary_holding_pen_release
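	// spin until secondary_holding_pen_release matches this CPU's MPIDR hwid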
pen:	ldr	x4, [x3]
	cmp	x4, x2
	b.eq	secondary_startup
	wfe
	b	pen
SYM_FUNC_END(secondary_holding_pen)

/*
|
|
|
|
* Secondary entry point that jumps straight into the kernel. Only to
|
|
|
|
* be used where CPUs are brought online dynamically by the kernel.
|
|
|
|
*/
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START(secondary_entry)
|
2023-01-11 10:22:35 +00:00
|
|
|
mov x0, xzr
|
2020-11-13 12:49:23 +00:00
|
|
|
bl init_kernel_el // w0=cpu_boot_mode
|
2013-10-24 19:30:16 +00:00
|
|
|
b secondary_startup
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(secondary_entry)
|
2012-03-05 11:49:27 +00:00
|
|
|
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START_LOCAL(secondary_startup)
|
2012-03-05 11:49:27 +00:00
|
|
|
/*
|
|
|
|
* Common entry point for secondary CPUs.
|
|
|
|
*/
|
2022-06-24 15:06:48 +00:00
|
|
|
mov x20, x0 // preserve boot mode
|
arm64: mm: Handle LVA support as a CPU feature
Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.
Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.
On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.
Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
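The C expression mentioned above amounts to reading T1SZ back out of TCR_EL1; a minimal sketch, with the bit position taken from the architecture and the macro spelling merely illustrative:

	/* TCR_EL1.T1SZ occupies bits [21:16]; the effective VA width is 64 - T1SZ */
	#define vabits_actual	(64 - ((read_sysreg(tcr_el1) >> 16) & 0x3f))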
2024-02-14 12:29:11 +00:00
|
|
|
|
|
|
|
#ifdef CONFIG_ARM64_VA_BITS_52
|
|
|
|
alternative_if ARM64_HAS_VA52
|
2018-12-06 22:50:40 +00:00
|
|
|
bl __cpu_secondary_check52bitva
|
2024-02-14 12:29:11 +00:00
|
|
|
alternative_else_nop_endif
|
2022-07-01 11:10:45 +00:00
|
|
|
#endif
|
2024-02-14 12:29:11 +00:00
|
|
|
|
2015-03-18 14:55:20 +00:00
|
|
|
bl __cpu_setup // initialise processor
|
2018-09-24 13:51:13 +00:00
|
|
|
adrp x1, swapper_pg_dir
|
2022-06-24 15:06:39 +00:00
|
|
|
adrp x2, idmap_pg_dir
|
2016-08-31 11:05:14 +00:00
|
|
|
bl __enable_mmu
|
|
|
|
ldr x8, =__secondary_switched
|
|
|
|
br x8
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(secondary_startup)
|
2012-03-05 11:49:27 +00:00
|
|
|
|
2023-01-11 10:22:32 +00:00
|
|
|
.text
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START_LOCAL(__secondary_switched)
|
2022-06-24 15:06:48 +00:00
|
|
|
mov x0, x20
|
|
|
|
bl set_cpu_boot_mode_flag
|
2023-01-11 10:22:31 +00:00
|
|
|
|
|
|
|
mov x0, x20
|
|
|
|
bl finalise_el2
|
|
|
|
|
2022-06-24 15:06:48 +00:00
|
|
|
str_l xzr, __early_cpu_boot_status, x3
|
2015-12-26 11:46:40 +00:00
|
|
|
adr_l x5, vectors
|
|
|
|
msr vbar_el1, x5
|
|
|
|
isb
|
|
|
|
|
2016-02-23 10:31:42 +00:00
|
|
|
adr_l x0, secondary_data
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
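For reference, a sketch of how the current task is recovered from sp_el0 once it has been stashed there (mirroring the shape of asm/current.h; shown only to illustrate the scheme):

	static __always_inline struct task_struct *get_current(void)
	{
		unsigned long sp_el0;

		asm ("mrs %0, sp_el0" : "=r" (sp_el0));
		return (struct task_struct *)sp_el0;
	}

	#define current get_current()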
2016-11-03 20:23:13 +00:00
|
|
|
ldr x2, [x0, #CPU_BOOT_TASK]
|
2019-08-27 13:36:38 +00:00
|
|
|
cbz x2, __secondary_too_slow
|
2021-05-20 11:50:29 +00:00
|
|
|
|
2021-05-20 11:50:31 +00:00
|
|
|
init_cpu_task x2, x1, x3
|
2020-04-23 10:16:06 +00:00
|
|
|
|
|
|
|
#ifdef CONFIG_ARM64_PTR_AUTH
|
|
|
|
ptrauth_keys_init_cpu x2, x3, x4, x5
|
|
|
|
#endif
|
|
|
|
|
arm64: Implement stack trace termination record
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already setup upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
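A sketch of the termination test this enables in the unwinder (the helper name is illustrative; the real check lives in the arm64 stacktrace code):

	static bool on_final_frame(struct task_struct *tsk, unsigned long fp)
	{
		/* the frame record embedded in the task's pt_regs ends the chain */
		return fp == (unsigned long)&task_pt_regs(tsk)->stackframe;
	}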
2021-05-10 11:00:26 +00:00
|
|
|
bl secondary_start_kernel
|
|
|
|
ASM_BUG()
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(__secondary_switched)
|
2012-03-05 11:49:27 +00:00
|
|
|
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START_LOCAL(__secondary_too_slow)
|
2019-08-27 13:36:38 +00:00
|
|
|
wfe
|
|
|
|
wfi
|
|
|
|
b __secondary_too_slow
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(__secondary_too_slow)
|
2019-08-27 13:36:38 +00:00
|
|
|
|
2023-01-11 10:22:32 +00:00
|
|
|
/*
|
|
|
|
* Sets the __boot_cpu_mode flag depending on the CPU boot mode passed
|
|
|
|
* in w0. See arch/arm64/include/asm/virt.h for more info.
|
|
|
|
*/
|
|
|
|
SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
|
|
|
|
adr_l x1, __boot_cpu_mode
|
|
|
|
cmp w0, #BOOT_CPU_MODE_EL2
|
|
|
|
b.ne 1f
|
|
|
|
add x1, x1, #4
|
|
|
|
1: str w0, [x1] // Save CPU boot mode
|
|
|
|
ret
|
|
|
|
SYM_FUNC_END(set_cpu_boot_mode_flag)
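For context, a sketch of how the two-word flag is consumed later, close to what asm/virt.h does (simplified): CPUs that entered at EL1 overwrite word 0, CPUs that entered at EL2 overwrite word 1, so EL2 is only considered usable when every CPU booted at EL2:

	extern u32 __boot_cpu_mode[2];

	static inline bool is_hyp_mode_available(void)
	{
		return __boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
		       __boot_cpu_mode[1] == BOOT_CPU_MODE_EL2;
	}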
|
|
|
|
|
2016-02-23 10:31:42 +00:00
|
|
|
/*
|
|
|
|
* The booting CPU updates the failure status in @__early_cpu_boot_status,
|
|
|
|
* with the MMU turned off.
|
|
|
|
*
|
|
|
|
* update_early_cpu_boot_status status, tmp1, tmp2
|
|
|
|
* - Corrupts tmp1, tmp2
|
|
|
|
* - Writes 'status' to __early_cpu_boot_status and makes sure
|
|
|
|
* it is committed to memory.
|
|
|
|
*/
|
|
|
|
|
|
|
|
.macro update_early_cpu_boot_status status, tmp1, tmp2
|
|
|
|
mov \tmp2, #\status
|
arm64: fix invalidation of wrong __early_cpu_boot_status cacheline
In head.S, the str_l macro, which takes a source register, a symbol name
and a temp register, is used to store a status value to the variable
__early_cpu_boot_status. Subsequently, the value of the temp register is
reused to invalidate any cachelines covering this variable.
However, since str_l resolves to
adrp \tmp, \sym
str \src, [\tmp, :lo12:\sym]
the temp register never actually holds the address of the variable but
only of the 4 KB window that covers it, and reusing it leads to the
wrong cacheline being invalidated. So instead, take the address
explicitly before doing the store, and reuse that value to perform
the cache invalidation.
Fixes: bb9052744f4b ("arm64: Handle early CPU boot failures")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-04-15 10:11:21 +00:00
|
|
|
adr_l \tmp1, __early_cpu_boot_status
|
|
|
|
str \tmp2, [\tmp1]
|
2016-02-23 10:31:42 +00:00
|
|
|
dmb sy
|
|
|
|
dc ivac, \tmp1 // Invalidate potentially stale cache line
|
|
|
|
.endm
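On the consuming side, the primary CPU eventually inspects this word to report why a secondary never showed up; roughly (a simplified rendition of the arm64 smp bring-up code):

	long status = READ_ONCE(__early_cpu_boot_status);

	/* low bits carry the state, high bits the CPU_STUCK_REASON_* flags */
	if ((status & CPU_BOOT_STATUS_MASK) == CPU_STUCK_IN_KERNEL) {
		pr_crit("CPU%u: is stuck in kernel\n", cpu);
		if (status & CPU_STUCK_REASON_52_BIT_VA)
			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
		if (status & CPU_STUCK_REASON_NO_GRAN)
			pr_crit("CPU%u: does not support %luK granule\n",
				cpu, PAGE_SIZE / SZ_1K);
	}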
|
|
|
|
|
2012-03-05 11:49:27 +00:00
|
|
|
/*
|
2015-03-17 07:59:53 +00:00
|
|
|
* Enable the MMU.
|
2012-03-05 11:49:27 +00:00
|
|
|
*
|
2015-03-17 07:59:53 +00:00
|
|
|
* x0 = SCTLR_EL1 value for turning on the MMU.
|
2018-09-24 13:51:13 +00:00
|
|
|
* x1 = TTBR1_EL1 value
|
2022-06-24 15:06:39 +00:00
|
|
|
* x2 = ID map root table address
|
2015-03-17 07:59:53 +00:00
|
|
|
*
|
2016-08-31 11:05:14 +00:00
|
|
|
* Returns to the caller via x30/lr. This requires the caller to be covered
|
|
|
|
* by the .idmap.text section.
|
2015-10-19 13:19:35 +00:00
|
|
|
*
|
|
|
|
* Checks if the selected granule size is supported by the CPU.
|
|
|
|
* If it isn't, park the CPU.
|
2012-03-05 11:49:27 +00:00
|
|
|
*/
|
arm64: fix .idmap.text assertion for large kernels
When building a kernel with many debug options enabled (which happens in
test configurations used by myself and syzbot), the kernel can become
large enough that portions of .text can be more than 128M away from
.idmap.text (which is placed inside the .rodata section). Where idmap
code branches into .text, the linker will place veneers in the
.idmap.text section to make those branches possible.
Unfortunately, as Ard reports, GNU LD has been observed to add 4K of
padding when adding such veneers, e.g.
| .idmap.text 0xffffffc01e48e5c0 0x32c arch/arm64/mm/proc.o
| 0xffffffc01e48e5c0 idmap_cpu_replace_ttbr1
| 0xffffffc01e48e600 idmap_kpti_install_ng_mappings
| 0xffffffc01e48e800 __cpu_setup
| *fill* 0xffffffc01e48e8ec 0x4
| .idmap.text.stub
| 0xffffffc01e48e8f0 0x18 linker stubs
| 0xffffffc01e48f8f0 __idmap_text_end = .
| 0xffffffc01e48f000 . = ALIGN (0x1000)
| *fill* 0xffffffc01e48f8f0 0x710
| 0xffffffc01e490000 idmap_pg_dir = .
This makes the __idmap_text_start .. __idmap_text_end region bigger than
the 4K we require it to fit within, and triggers an assertion in arm64's
vmlinux.lds.S, which breaks the build:
| LD .tmp_vmlinux.kallsyms1
| aarch64-linux-gnu-ld: ID map text too big or misaligned
| make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
| make: *** [Makefile:1264: vmlinux] Error 2
Avoid this by using an `ADRP+ADD+BLR` sequence for branches out of
.idmap.text, which avoids the need for veneers. These branches are only
executed once per boot, and only when the MMU is on, so there should be
no noticeable performance penalty in replacing `BL` with `ADRP+ADD+BLR`.
At the same time, remove the "x" and "w" attributes when placing code in
.idmap.text, as these are not necessary, and this will prevent the
linker from assuming that it is safe to place PLTs into .idmap.text,
causing it to warn if and when there are out-of-range branches within
.idmap.text, e.g.
| LD .tmp_vmlinux.kallsyms1
| arch/arm64/kernel/head.o: in function `primary_entry':
| (.idmap.text+0x1c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| arch/arm64/kernel/head.o: in function `init_el2':
| (.idmap.text+0x88): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
| make: *** [Makefile:1252: vmlinux] Error 2
Thus, if future changes add out-of-range branches in .idmap.text, it
should be easy enough to identify those from the resulting linker
errors.
Reported-by: syzbot+f8ac312e31226e23302b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-arm-kernel/00000000000028ea4105f4e2ef54@google.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230220162317.1581208-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-20 16:23:17 +00:00
|
|
|
.section ".idmap.text","a"
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START(__enable_mmu)
|
2022-06-24 15:06:39 +00:00
|
|
|
mrs x3, ID_AA64MMFR0_EL1
|
2022-09-05 22:54:01 +00:00
|
|
|
ubfx x3, x3, #ID_AA64MMFR0_EL1_TGRAN_SHIFT, 4
|
|
|
|
cmp x3, #ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN
|
2021-03-10 05:53:10 +00:00
|
|
|
b.lt __no_granule_support
|
2022-09-05 22:54:01 +00:00
|
|
|
cmp x3, #ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX
|
2021-03-10 05:53:10 +00:00
|
|
|
b.gt __no_granule_support
|
2018-09-24 13:51:13 +00:00
|
|
|
phys_to_ttbr x2, x2
|
|
|
|
msr ttbr0_el1, x2 // load TTBR0
|
2022-06-24 15:06:46 +00:00
|
|
|
load_ttbr1 x1, x1, x3
|
2021-02-08 09:57:12 +00:00
|
|
|
|
|
|
|
set_sctlr_el1 x0
|
|
|
|
|
2016-08-31 11:05:14 +00:00
|
|
|
ret
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(__enable_mmu)
|
2015-10-19 13:19:35 +00:00
|
|
|
|
2024-02-14 12:29:11 +00:00
|
|
|
#ifdef CONFIG_ARM64_VA_BITS_52
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START(__cpu_secondary_check52bitva)
|
arm64: Enable LPA2 at boot if supported by the system
Update the early kernel mapping code to take 52-bit virtual addressing
into account based on the LPA2 feature. This is a bit more involved than
LVA (which is supported with 64k pages only), given that some page table
descriptor bits change meaning in this case.
To keep the handling in asm to a minimum, the initial ID map is still
created with 48-bit virtual addressing, which implies that the kernel
image must be loaded into 48-bit addressable physical memory. This is
currently required by the boot protocol, even though we happen to
support placement outside of that for LVA/64k based configurations.
Enabling LPA2 involves more than setting TCR.T1SZ to a lower value,
there is also a DS bit in TCR that needs to be set, and which changes
the meaning of bits [9:8] in all page table descriptors. Since we cannot
enable DS and every live page table descriptor at the same time, let's
pivot through another temporary mapping. This avoids the need to
reintroduce manipulations of the page tables with the MMU and caches
disabled.
To permit the LPA2 feature to be overridden on the kernel command line,
which may be necessary to work around silicon errata, or to deal with
mismatched features on heterogeneous SoC designs, test for CPU feature
overrides first, and only then enable LPA2.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-78-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
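To make the "bits [9:8] change meaning" point concrete, a sketch under the architected layout (the names below are illustrative, not the kernel's own macros):

	/* TCR_ELx.DS == 0: descriptor bits [9:8] hold the shareability field SH[1:0] */
	#define PTE_SH_BITS	(UL(3) << 8)
	/* TCR_ELx.DS == 1 (LPA2): the same bits instead carry output address bits [51:50] */
	#define PTE_OA_51_50	(UL(3) << 8)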
2024-02-14 12:29:19 +00:00
|
|
|
#ifndef CONFIG_ARM64_LPA2
|
2018-12-06 22:50:40 +00:00
|
|
|
mrs_s x0, SYS_ID_AA64MMFR2_EL1
|
2023-07-11 09:20:55 +00:00
|
|
|
and x0, x0, ID_AA64MMFR2_EL1_VARange_MASK
|
2018-12-06 22:50:40 +00:00
|
|
|
cbnz x0, 2f
|
2024-02-14 12:29:19 +00:00
|
|
|
#else
|
|
|
|
mrs x0, id_aa64mmfr0_el1
|
|
|
|
sbfx x0, x0, #ID_AA64MMFR0_EL1_TGRAN_SHIFT, 4
|
|
|
|
cmp x0, #ID_AA64MMFR0_EL1_TGRAN_LPA2
|
|
|
|
b.ge 2f
|
|
|
|
#endif
|
2018-12-06 22:50:40 +00:00
|
|
|
|
2018-12-10 14:21:13 +00:00
|
|
|
update_early_cpu_boot_status \
|
|
|
|
CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, x1
|
2018-12-06 22:50:40 +00:00
|
|
|
1: wfe
|
|
|
|
wfi
|
|
|
|
b 1b
|
|
|
|
|
|
|
|
2: ret
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(__cpu_secondary_check52bitva)
|
2024-02-14 12:29:11 +00:00
|
|
|
#endif
|
2018-12-06 22:50:40 +00:00
|
|
|
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START_LOCAL(__no_granule_support)
|
2016-02-23 10:31:42 +00:00
|
|
|
/* Indicate that this CPU can't boot and is stuck in the kernel */
|
2018-12-10 14:21:13 +00:00
|
|
|
update_early_cpu_boot_status \
|
|
|
|
CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_NO_GRAN, x1, x2
|
2016-02-23 10:31:42 +00:00
|
|
|
1:
|
2015-10-19 13:19:35 +00:00
|
|
|
wfe
|
2016-02-23 10:31:42 +00:00
|
|
|
wfi
|
2016-08-31 11:05:13 +00:00
|
|
|
b 1b
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(__no_granule_support)
|
2016-04-18 15:09:42 +00:00
|
|
|
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_START_LOCAL(__primary_switch)
|
2022-06-24 15:06:47 +00:00
|
|
|
adrp x1, reserved_pg_dir
|
2022-06-24 15:06:42 +00:00
|
|
|
adrp x2, init_idmap_pg_dir
|
2016-08-31 11:05:14 +00:00
|
|
|
bl __enable_mmu
|
2024-02-14 12:28:52 +00:00
|
|
|
|
2024-02-14 12:28:49 +00:00
|
|
|
adrp x1, early_init_stack
|
arm64: head: avoid relocating the kernel twice for KASLR
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.
All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.
So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal. This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.
This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.
Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
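A sketch of the "bare minimum" device tree parsing this refers to, using the standard libfdt API against the temporary FDT mapping (simplified; the real code also wipes the property after reading it):

	static u64 get_kaslr_seed(void *fdt)
	{
		const fdt64_t *prop;
		int node, len;

		node = fdt_path_offset(fdt, "/chosen");
		if (node < 0)
			return 0;

		prop = fdt_getprop(fdt, node, "kaslr-seed", &len);
		if (!prop || len != sizeof(*prop))
			return 0;

		return fdt64_to_cpu(*prop);
	}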
2022-06-24 15:06:50 +00:00
|
|
|
mov sp, x1
|
|
|
|
mov x29, xzr
|
2024-02-14 12:28:54 +00:00
|
|
|
mov x0, x20 // pass the full boot status
|
arm64: kernel: Create initial ID map from C code
The asm code that creates the initial ID map is rather intricate and
hard to follow. This is problematic because it makes adding support for
things like LPA2 or WXN more difficult than necessary. Also, it is
parameterized like the rest of the MM code to run with a configurable
number of levels, which is rather pointless, given that all AArch64 CPUs
implement support for 48-bit virtual addressing, and that many systems
exist with DRAM located outside of the 39-bit addressable range, which
is the only smaller VA size that is widely used, and we need additional
tricks to make things work in that combination.
So let's bite the bullet, and rip out all the asm macros, and fiddly
code, and replace it with a C implementation based on the newly added
routines for creating the early kernel VA mappings. And while at it,
create the initial ID map based on 48-bit virtual addressing as well,
regardless of the number of configured levels for the kernel proper.
Note that this code may execute with the MMU and caches disabled, and is
therefore not permitted to make unaligned accesses. This shouldn't
generally happen in any case for the algorithm as implemented, but to be
sure, let's pass -mstrict-align to the compiler just in case.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-66-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-14 12:29:07 +00:00
|
|
|
mov x1, x21 // pass the FDT
|
2024-02-14 12:29:04 +00:00
|
|
|
bl __pi_early_map_kernel // Map and relocate the kernel
|
2022-06-24 15:06:47 +00:00
|
|
|
|
2016-04-18 15:09:43 +00:00
|
|
|
ldr x8, =__primary_switched
|
2022-06-29 04:12:07 +00:00
|
|
|
adrp x0, KERNEL_START // __pa(KERNEL_START)
|
2016-04-18 15:09:43 +00:00
|
|
|
br x8
|
2020-02-18 19:58:33 +00:00
|
|
|
SYM_FUNC_END(__primary_switch)
|